BPt.Pipeline
- class BPt.Pipeline(steps, param_search=None, cache_loc=None, verbose=False)
This class is used to create flexible BPt style pipelines.
See ModelPipeline for an alternate version of this class which enforces a strict ordering on how pipeline pieces can be set, and also includes a number of useful default behaviors.

Parameters
- steps : list of Pipeline Objects
    Input here is a list of Pipeline Objects or custom valid sklearn-compliant objects / pipeline steps. These can be in any order as long as there is only one Model or Ensemble and it is at the end of the list, i.e., the last step. This constraint excludes any nested models. See below for example usage.
- param_search : ParamSearch or None, optional
    This parameter optionally specifies that this object should be nested within a hyper-parameter search. If passed an instance of ParamSearch, the underlying object, or components of the underlying object (if a pipeline), must have at least one valid hyper-parameter distribution to search over. If left as None, the default, then no hyper-parameter search will be performed.
    default = None
- cache_loc : Path str or None, optional
    Optional parameter specifying a directory in which fully fitted BPt pipelines should be cached. This should either be left as None or passed a str representing a directory in which cached fitted pipelines should be saved.
    default = None
- verbose : bool, optional
    If True, print statements about the current progress of the pipeline during fitting.
    Note: In a multi-processed context, where pipelines are being fit on different threads, verbose output may be messy (i.e., overlapping messages from different threads).
    default = False
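The hyper-parameter search that param_search enables can be sketched generically: given per-parameter distributions, candidate settings are evaluated and the best-scoring combination is kept. A minimal stdlib sketch of that idea using an exhaustive grid and a toy objective — illustrative only, not BPt's search machinery (which is configured through ParamSearch):

```python
import itertools

def grid_search(score, distributions):
    """Score every parameter combination in the grid, return the best params."""
    names = list(distributions)
    best, best_score = None, float('-inf')
    for values in itertools.product(*(distributions[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best, best_score = params, s
    return best

# Toy objective: prefer alpha near 0.1 (stand-in for cross-validated score).
best = grid_search(
    score=lambda p: -abs(p['alpha'] - 0.1),
    distributions={'alpha': [0.01, 0.1, 1.0], 'l1_ratio': [0.2, 0.5]})
print(best)  # {'alpha': 0.1, 'l1_ratio': 0.2}
```

In BPt the distributions themselves come from the pipeline pieces, which is why at least one piece must declare a valid hyper-parameter distribution for the search to have anything to do.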
Notes

This class differs from sklearn.pipeline.Pipeline most drastically in that this class is not itself directly a sklearn-compliant estimator (i.e., an object with fit and predict methods). Instead, this object represents a flexible set of input pieces that can all vary depending on the eventual Dataset and ProblemSpec they are used in the context of. This means that instances of this class can easily be re-used across different data and setups, for example with different underlying problem types (running a binary and then a regression version).

Examples
The base behavior is to use all valid Pipeline objects, for example:
In [1]: pipe = bp.Pipeline(steps=[bp.Imputer('mean'),
   ...:                           bp.Scaler('robust'),
   ...:                           bp.Model('elastic')])

In [2]: pipe
Out[2]: Pipeline(steps=[Imputer(obj='mean'), Scaler(obj='robust'), Model(obj='elastic')])
This creates a pipeline with mean imputation, robust scaling, and an elastic net, all using the BPt style custom objects.

This object can also work with sklearn.pipeline.Pipeline style steps, or a mix of BPt style and sklearn style, for example:

In [3]: from sklearn.linear_model import Ridge

In [4]: pipe = bp.Pipeline(steps=[bp.Imputer('mean'),
   ...:                           bp.Scaler('robust'),
   ...:                           ('ridge regression', Ridge())])

In [5]: pipe
Out[5]: Pipeline(steps=[Imputer(obj='mean'), Scaler(obj='robust'), Custom(step=('ridge regression', Ridge()))])
You may also pass sklearn objects directly instead of as a tuple, i.e., in the sklearn.pipeline.make_pipeline() input style. For example:

In [6]: from sklearn.linear_model import Ridge

In [7]: pipe = bp.Pipeline(steps=[Ridge()])

In [8]: pipe
Out[8]: Pipeline(steps=[Custom(step=Ridge())])
Note
Passing objects in the sklearn style ensures they essentially have a scope of 'all' and no associated hyper-parameter distributions.
Methods

build([dataset, problem_spec])
    This method generates a sklearn compliant estimator version of the current Pipeline with respect to a passed Dataset and ProblemSpec.
copy()
    This method returns a deepcopy of the base object.
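Because copy() is documented as a deepcopy, mutating the returned object never propagates back to the original. A minimal stdlib sketch of the behavior this implies, with a nested dict standing in for a pipeline's mutable step configuration (not actual BPt objects):

```python
import copy

# Stand-in for a pipeline's mutable step configuration (not BPt code).
original = {'steps': [{'obj': 'robust'}, {'obj': 'elastic'}]}

duplicate = copy.deepcopy(original)
duplicate['steps'][0]['obj'] = 'standard'  # mutate only the copy

print(original['steps'][0]['obj'])   # robust  (original unchanged)
print(duplicate['steps'][0]['obj'])  # standard
```

A shallow copy would not give this guarantee: nested pieces like step objects would still be shared between the two pipelines.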
get_params([deep])
    Get parameters for this estimator.
set_params(**params)
    Set the parameters of this estimator.
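get_params and set_params follow the standard sklearn estimator convention: every keyword accepted by __init__ is readable and writable by name, and set_params returns self so calls can be chained. A toy stdlib sketch of that convention (illustrative only, not the BPt implementation):

```python
class MiniEstimator:
    """Toy estimator following the sklearn get_params/set_params convention."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self, deep=True):
        # Report every __init__ keyword as a dict entry.
        return {'alpha': self.alpha, 'fit_intercept': self.fit_intercept}

    def set_params(self, **params):
        # Update matching attributes in place and return self for chaining.
        for key, value in params.items():
            setattr(self, key, value)
        return self

est = MiniEstimator().set_params(alpha=0.5)
print(est.get_params())  # {'alpha': 0.5, 'fit_intercept': True}
```

Following this convention is what lets objects participate in sklearn-style meta-estimators and parameter searches, which address nested parameters by name.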