BPt.ModelPipeline#
- class BPt.ModelPipeline(loaders=None, imputers='default', scalers='default', transformers=None, feat_selectors=None, model='default', param_search=None, cache_loc=None, verbose=False)
The ModelPipeline class is used to create BPtPipelines. ModelPipeline differs from Pipeline in that it enforces a simplified ordering of pieces, representing the typical order in which they might appear. See Pipeline for a more flexible version of this class which does not enforce any ordering.
The order enforced, which follows the order of the input arguments, is:
loaders
imputers
scalers
transformers
feat_selectors
model
For each parameter, with the exception of model, you may pass either one instance of that piece or a list of pieces. When a list is passed, it is treated as a sequential set of steps / transformations, where the output from each element of the list is passed on to the next as input.
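The sequential-list behavior can be sketched in plain Python. This is a toy illustration of the semantics only, not BPt's actual implementation (real pieces are estimator objects, not functions):

```python
# Toy sketch of the sequential-list semantics: when a list of pieces is
# passed, each step's output is fed to the next step as input.
def apply_sequential(steps, data):
    for step in steps:
        data = step(data)
    return data

# Two toy "scaler" steps applied in order: center, then double.
center = lambda xs: [x - sum(xs) / len(xs) for x in xs]
double = lambda xs: [2 * x for x in xs]

result = apply_sequential([center, double], [1.0, 2.0, 3.0])
print(result)  # [-2.0, 0.0, 2.0]
```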
- Parameters
- loaders : Loader or list of, optional
Each Loader refers to transformations which operate on loaded Data_Files. See Loader.
You may wish to consider using the Pipe input class when creating a single Loader obj.
Passed loaders can also be wrapped in a Select wrapper, e.g., as either

```python
# Just passing select
loaders = Select([Loader(...), Loader(...)])

# Or nested
loaders = [Loader(...), Select([Loader(...), Loader(...)])]
```

In this way, most of the pipeline pieces, not just loaders, can accept lists, or nested lists with input wrappers.
default = None
- imputers : Imputer, list of or None, optional
If any missing data (NaNs) has been kept within the input data, then an imputation strategy should likely be defined. This parameter controls what kind of imputation strategy to use. See Imputer.
You may also pass a value of 'default' here, in which case, if any NaN data is present within the training set, the following set of two imputers will be used:

'default' == [Imputer('mean', scope='float'), Imputer('median', scope='category')]

Otherwise, if no NaN data is present in the input data, no Imputer will be used.
default = 'default'
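In spirit, the default strategy fills missing float values with the column mean and missing categorical values with the column median. A stdlib-only sketch of that idea (an illustration, not BPt's implementation, which delegates to sklearn-style imputer objects):

```python
import math
import statistics

# Toy sketch of mean / median imputation, as used by the 'default' imputers.
def impute(values, strategy='mean'):
    observed = [v for v in values if not math.isnan(v)]
    fill = statistics.mean(observed) if strategy == 'mean' else statistics.median(observed)
    return [fill if math.isnan(v) else v for v in values]

floats = impute([1.0, float('nan'), 3.0], strategy='mean')       # [1.0, 2.0, 3.0]
cats = impute([0.0, 1.0, float('nan'), 1.0], strategy='median')  # [0.0, 1.0, 1.0, 1.0]
```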
- scalers : Scaler, list of or None, optional
Each Scaler refers to any potential data scaling, where a transformation on the data (without access to the target variable) is computed, and the number of features or data points does not change. See Scaler.
With the default value of 'default', standard scaling is applied, which sets each feature to have a standard deviation of 1 and a mean of 0.
default = 'default'
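Standard scaling can be sketched with the stdlib (this mirrors the convention in sklearn's StandardScaler, which uses the population standard deviation):

```python
import statistics

# Sketch of standard scaling: subtract the mean and divide by the std,
# so the feature ends up with mean 0 and std 1.
def standard_scale(xs):
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population std, as StandardScaler uses
    return [(x - mu) / sd for x in xs]

scaled = standard_scale([2.0, 4.0, 6.0])
```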
- transformers : Transformer, list of or None, optional
Each Transformer defines a transformation of the data that changes the number of features in a way that is perhaps non-deterministic, or that is not simply feature removal (i.e., different from feat_selectors). An example is applying a PCA, where the number of features changes and the new features do not correspond 1:1 to the original features. See Transformer for more information.
default = None
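To make the PCA example concrete, here is a minimal sketch (assumes numpy is available; an illustration only, not how the Transformer piece is implemented). Each output feature is a linear mixture of all input columns, so the outputs do not map 1:1 back to the originals, and their count can differ:

```python
import numpy as np

# Minimal PCA sketch: project centered data onto the top principal
# directions, changing the number of features.
def pca_transform(X, n_components):
    Xc = X - X.mean(axis=0)                       # center each feature
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    components = vecs[:, ::-1][:, :n_components]  # largest-variance first
    return Xc @ components

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]])
Z = pca_transform(X, 1)  # 3 samples: 2 features in, 1 feature out
```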
- feat_selectors : FeatSelector, list of or None, optional
Each FeatSelector refers to an optional feature selection stage of the Pipeline. See FeatSelector.
default = None
- model : Model or Ensemble, optional
This parameter accepts one input of type Model or Ensemble. While it cannot accept a list (i.e., no sequential behavior), you may still pass an input type wrapper like Select to perform model selection via param search.
See Model for more information on how to specify a single model to BPt, and Ensemble for information on how to build an ensemble of models.
This parameter cannot be None. In the default case of passing 'default', a ridge regression is used.
default = 'default'
- param_search : ParamSearch or None, optional
This parameter optionally specifies that this object should be nested with a hyper-parameter search. If passed an instance of ParamSearch, the underlying object, or components of the underlying object (if a pipeline), must have at least one valid hyper-parameter distribution to search over.
If left as None, the default, then no hyper-parameter search will be performed.
default = None
- cache_loc : Path str or None, optional
Optional parameter specifying a directory in which fully fitted BPt pipelines should be cached after fitting. This should be either left as None, or passed a str representing a directory in which cached fitted pipelines should be saved.
default = None
- verbose : bool, optional
If True, print statements about the current progress of the pipeline during fitting.
Note: In a multi-processed context, where pipelines are being fit on different threads, verbose output may be messy (i.e., overlapping messages from different threads).
default = False
Attributes
ModelPipeline has a fixed order in which pieces are constructed:
loaders, imputers, scalers, transformers, feat_selectors, model
Methods
- build([dataset, problem_spec]) : This method generates a sklearn compliant estimator version of the current pipeline with respect to a passed Dataset and ProblemSpec.
- copy() : This method returns a deepcopy of the base object.
- get_params([deep]) : Get parameters for this estimator.
- print_all([_print]) : This method can be used to print a formatted representation of this object.
- set_params(**params) : Set the parameters of this estimator.