Pipeline Objects

The pipeline classes Pipeline and ModelPipeline are made up of base pipeline objects, or pieces. These are based on the scikit-learn concept of a Pipeline.
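To make the scikit-learn Pipeline idea concrete, here is a minimal, standard-library-only sketch of it: named steps are applied in order, each intermediate step is fit and then transforms the data passed to the next step, and the last step is the model. The classes CenterStep, MeanModel and SimplePipeline are hypothetical illustrations, not BPt or scikit-learn code.

```python
class CenterStep:
    """Hypothetical transformer: subtracts the column means learned in fit."""

    def fit(self, rows):
        n = len(rows)
        self.means_ = [sum(col) / n for col in zip(*rows)]
        return self

    def transform(self, rows):
        return [[v - m for v, m in zip(row, self.means_)] for row in rows]


class MeanModel:
    """Hypothetical final estimator: always predicts the mean training target."""

    def fit(self, rows, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, rows):
        return [self.mean_ for _ in rows]


class SimplePipeline:
    """Chain of (name, step) pairs; every step except the last is a transformer."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows, y):
        # Fit each transformer, then feed its output to the next step.
        for _, step in self.steps[:-1]:
            rows = step.fit(rows).transform(rows)
        self.steps[-1][1].fit(rows, y)
        return self

    def predict(self, rows):
        # At prediction time, transformers are only applied, not refit.
        for _, step in self.steps[:-1]:
            rows = step.transform(rows)
        return self.steps[-1][1].predict(rows)


X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 3.0]
pipe = SimplePipeline([("center", CenterStep()), ("model", MeanModel())]).fit(X, y)
print(pipe.predict(X))  # [2.0, 2.0]
```

Each piece only needs to expose fit plus transform (or predict for the final step), which is what lets independent pieces be composed in sequence.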

Every base ModelPipeline piece, e.g., Model or Scaler, accepts an obj parameter when it is initialized. Beyond the choice of obj, pipeline objects share a number of other common parameters that allow for extra customization, and some objects also have their own unique parameters. The shared parameters include params, scope, extra_params and cache_loc.

obj

The ‘obj’ parameter is the core parameter for any pipeline object. It is either a str naming one of the valid pre-defined options for that piece or, depending on the piece, a custom object passed in directly.

For example, if we want to make an instance of RobustScaler from sklearn:

In [1]: import BPt as bp

In [2]: scaler = bp.Scaler('robust', scope='all')

In [3]: scaler
Out[3]: Scaler(obj='robust', scope='all')

# See what this object looks like internally
In [4]: scaler.build()
Out[4]: 
(('robust',
  ScopeTransformer(estimator=RobustScaler(quantile_range=(5, 95)), inds='all')),
 {})

We can do this because ‘robust’ is already available as a default option in BPt (see Scalers). If it were not, we could instead pass a custom object directly:

In [5]: from sklearn.preprocessing import RobustScaler

In [6]: scaler = bp.Scaler(RobustScaler())

In [7]: scaler
Out[7]: Scaler(obj=RobustScaler())
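The two ways of setting obj can be pictured with a small sketch of how such an argument might be resolved: a str is looked up in a registry of pre-defined options, and anything else is used as-is. This is not BPt's internal code; SCALER_REGISTRY, resolve_obj and RobustScalerStandIn are hypothetical names, and the stand-in class only mimics RobustScaler so the example runs without sklearn.

```python
class RobustScalerStandIn:
    """Hypothetical stand-in for sklearn's RobustScaler; stores settings only."""

    def __init__(self, quantile_range=(25.0, 75.0)):
        self.quantile_range = quantile_range

    def __repr__(self):
        return f"RobustScalerStandIn(quantile_range={self.quantile_range})"


# Hypothetical registry of pre-defined options, keyed by name.
# The (5, 95) setting mirrors the build() output shown earlier.
SCALER_REGISTRY = {
    "robust": lambda: RobustScalerStandIn(quantile_range=(5, 95)),
}


def resolve_obj(obj, registry=SCALER_REGISTRY):
    """Turn an obj argument into an object (hypothetical resolution rules)."""
    if isinstance(obj, str):
        try:
            return registry[obj]()  # build a fresh pre-defined option
        except KeyError:
            raise ValueError(f"Unknown pre-defined option: {obj!r}") from None
    return obj  # custom object: passed through unchanged


print(resolve_obj("robust"))  # RobustScalerStandIn(quantile_range=(5, 95))
custom = RobustScalerStandIn(quantile_range=(10, 90))
print(resolve_obj(custom) is custom)  # True
```

Building registry entries through factories means each piece gets its own fresh object rather than sharing one instance.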

params

The params argument is used either to select one of the existing default hyperparameter settings associated with this object, or to specify custom fixed hyperparameter values or distributions. For example, we can associate an existing hyperparameter distribution with the robust scaler from before:

In [8]: scaler = bp.Scaler('robust', params="robust gs")

In [9]: scaler
Out[9]: Scaler(obj='robust', params='robust gs')

We could also set it to a custom distribution using a Parameter (see the api.dists reference):

In [10]: quantile_range = bp.p.Choice([(5, 95), (10, 90), (15, 85)])

In [11]: scaler = bp.Scaler('robust', params={'quantile_range': quantile_range})

In [12]: scaler
Out[12]: 
Scaler(obj='robust',
       params={'quantile_range': Choice([(5, 95), (10, 90), (15, 85)])})
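How a Choice is actually consumed depends on the search strategy BPt uses, which this page does not describe. As a minimal sketch, assuming a plain exhaustive grid search (a swapped-in technique, not necessarily what BPt does), each parameter's candidate values can be combined into every possible setting. expand_grid is a hypothetical helper; with_centering is a real RobustScaler parameter, added here only to show how two choices multiply.

```python
from itertools import product


def expand_grid(param_choices):
    """Yield one dict per combination of candidate values (hypothetical helper)."""
    names = list(param_choices)
    for values in product(*(param_choices[name] for name in names)):
        yield dict(zip(names, values))


# Candidate values mirroring the Choice example, plus a second parameter.
param_choices = {
    "quantile_range": [(5, 95), (10, 90), (15, 85)],
    "with_centering": [True, False],
}

settings = list(expand_grid(param_choices))
print(len(settings))  # 6
print(settings[0])    # {'quantile_range': (5, 95), 'with_centering': True}
```

Exhaustive enumeration grows multiplicatively with each added parameter, which is one reason distribution-based searches are often preferred for larger spaces.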

See Params for more information on how to set Parameters.

scope

Pipeline objects also accept a scope argument, which allows a pipeline object to act on just a subset of columns. Notably, if the scope resolves to no columns for a specific Dataset, the pipeline object in question is silently skipped. This is useful for defining generic pipelines that can be applied to different types of datasets.
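The skip-on-empty behavior can be sketched as follows. BPt's real scope rules are richer than this; here, as a stated assumption, 'all' selects every column, any other str selects columns whose names contain it, and a list selects columns by name. resolve_scope and apply_piece are hypothetical names, and the data is a plain dict of column lists rather than a Dataset.

```python
def resolve_scope(scope, columns):
    """Return the columns a piece should act on (hypothetical rules)."""
    if scope == "all":
        return list(columns)
    if isinstance(scope, str):
        # Assumed rule: a str matches any column whose name contains it.
        return [c for c in columns if scope in c]
    # Otherwise treat scope as an explicit collection of column names.
    return [c for c in columns if c in scope]


def apply_piece(transform, scope, data):
    """Apply transform to the in-scope columns of data (a dict of lists).

    If the scope resolves to no columns, the piece is skipped and the
    original data is returned unchanged.
    """
    cols = resolve_scope(scope, data)
    if not cols:
        return data  # empty scope: silently skip this piece
    out = dict(data)
    for c in cols:
        out[c] = transform(data[c])
    return out


data = {"age": [10, 20], "score_1": [1, 2], "score_2": [3, 4]}
double = lambda col: [2 * v for v in col]
print(apply_piece(double, "score", data))
# {'age': [10, 20], 'score_1': [2, 4], 'score_2': [6, 8]}
print(apply_piece(double, "volume", data) is data)  # True
```

Because an empty scope is a no-op rather than an error, the same piece can be left in a pipeline that is reused on datasets lacking the matching columns.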