BPt.Pipeline#

class BPt.Pipeline(steps, param_search=None, cache_loc=None, verbose=False)[source]#

This class is used to create flexible BPt style pipelines.

See ModelPipeline for an alternate version of this class which enforces a strict ordering on how pipeline pieces can be set, and also includes a number of useful default behaviors.

Parameters
steps : list of Pipeline Objects
Input here is a list of Pipeline Objects or custom valid sklearn-compliant objects / pipeline steps. These can be in any order as long as there is only one Model or Ensemble and it is at the end of the list, i.e., the last step. This constraint excludes any nested models.
See below for example usage.
param_search : ParamSearch, None, optional
This parameter optionally specifies that this object should be nested with a hyper-parameter search.
If passed an instance of ParamSearch, then the underlying object, or the components of the underlying object (if a pipeline), must have at least one valid hyper-parameter distribution to search over.
If left as None, the default, then no hyper-parameter search will be performed.
default = None
cache_loc : Path str or None, optional

Optional parameter specifying a directory in which full BPt pipelines should be cached after fitting. This should either be left as None, or passed a str representing a directory in which cached fitted pipelines should be saved.

default = None
verbose : bool, optional

If True, print statements about the current progress of the pipeline during fitting.

Note: If in a multi-processed context, where pipelines are being fit on different threads, verbose output may be messy (i.e., overlapping messages from different threads).

default = False

Notes

This class differs from sklearn.pipeline.Pipeline most drastically in that this class is not itself directly a sklearn-compliant estimator (i.e., an object with fit and predict methods). Instead, this object represents a flexible set of input pieces that can all vary depending on the eventual Dataset and ProblemSpec they are used in the context of. This means that instances of this class can be easily re-used across different data and setups, for example with different underlying problem_types (running a binary and then a regression version).

Examples

The base behavior is to use all valid Pipeline objects, for example:

In [1]: pipe = bp.Pipeline(steps=[bp.Imputer('mean'),
   ...:                           bp.Scaler('robust'),
   ...:                           bp.Model('elastic')])
   ...: 

In [2]: pipe
Out[2]: 
Pipeline(steps=[Imputer(obj='mean'), Scaler(obj='robust'),
                Model(obj='elastic')])
This creates a pipeline with mean imputation, robust scaling and an elastic net, all using the BPt style custom objects.
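For intuition, this pipeline is roughly analogous to the following plain scikit-learn pipeline (a sketch only; the estimator that build() actually produces wraps additional BPt logic, and the exact underlying estimator classes are assumptions here):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import ElasticNet

# Rough sklearn analogue of
# bp.Pipeline([Imputer('mean'), Scaler('robust'), Model('elastic')])
est = Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')),
                      ('scaler', RobustScaler()),
                      ('model', ElasticNet())])

# Unlike the BPt Pipeline object, this is directly an estimator
# with fit / predict methods.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [2.0, 4.0], [3.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
est.fit(X, y)
preds = est.predict(X)
```

The key difference is that the BPt version defers choices like the concrete imputer and model classes until it is combined with a Dataset and ProblemSpec.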
This object can also work with sklearn.pipeline.Pipeline style steps, or a mix of BPt style and sklearn style, for example:
In [3]: from sklearn.linear_model import Ridge

In [4]: pipe = bp.Pipeline(steps=[bp.Imputer('mean'),
   ...:                           bp.Scaler('robust'),
   ...:                           ('ridge regression', Ridge())])
   ...: 

In [5]: pipe
Out[5]: 
Pipeline(steps=[Imputer(obj='mean'), Scaler(obj='robust'),
                Custom(step=('ridge regression', Ridge()))])

You may also pass sklearn objects directly instead of as a tuple, i.e., in the sklearn.pipeline.make_pipeline() input style. For example:

In [6]: from sklearn.linear_model import Ridge

In [7]: pipe = bp.Pipeline(steps=[Ridge()])

In [8]: pipe
Out[8]: Pipeline(steps=[Custom(step=Ridge())])

Note

Passing objects sklearn-style means they are essentially given a scope of 'all' and have no associated hyper-parameter distributions.

Methods

build([dataset, problem_spec])

This method generates a sklearn compliant estimator version of the current Pipeline with respect to a passed Dataset and ProblemSpec.

copy()

This method returns a deepcopy of the base object.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.