Pipeline Objects#
The pipeline classes Pipeline and ModelPipeline are made up of base pipeline objects / pieces. These are based on the scikit-learn concept of a Pipeline.
All base ModelPipeline pieces, e.g., Model or Scaler, take an obj parameter when they are initialized.
Beyond the choice of obj, pipeline objects share a number of other common
parameters that allow for extra customization, and some objects also have parameters unique to them.
Shared parameters include params, scope, extra_params and cache_loc.
obj#
The ‘obj’ parameter is the core parameter for any pipeline object. It can be a str naming a valid pre-defined object for that piece or, depending on the piece, a custom object passed directly.
For example, if we want to make an instance of RobustScaler from sklearn:
In [1]: import BPt as bp
In [2]: scaler = bp.Scaler('robust', scope='all')
In [3]: scaler
Out[3]: Scaler(obj='robust', scope='all')
# See what this object looks like internally
In [4]: scaler.build()
Out[4]:
(('robust',
ScopeTransformer(estimator=RobustScaler(quantile_range=(5, 95)), inds='all')),
{})
We can do this because ‘robust’ exists as a default option already available in BPt (see Scalers). That said, if it did not, we could pass it as a custom object instead:
In [5]: from sklearn.preprocessing import RobustScaler
In [6]: scaler = bp.Scaler(RobustScaler())
In [7]: scaler
Out[7]: Scaler(obj=RobustScaler())
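The two ways of passing obj shown above can be pictured with a small standalone sketch. This is not BPt's implementation: the names RobustScalerStub, PREDEFINED_SCALERS and resolve_obj are hypothetical and exist only for illustration. The one detail taken from the output above is that the ‘robust’ default uses quantile_range=(5, 95).

```python
# Hypothetical sketch of str-or-object resolution; NOT BPt's actual code.


class RobustScalerStub:
    """Stand-in for an estimator such as sklearn's RobustScaler."""

    def __init__(self, quantile_range=(25, 75)):
        self.quantile_range = quantile_range


# Hypothetical registry: maps a str name to a factory for a pre-defined object.
PREDEFINED_SCALERS = {
    "robust": lambda: RobustScalerStub(quantile_range=(5, 95)),
}


def resolve_obj(obj):
    """Return an estimator from a registered name or from a custom object."""
    if isinstance(obj, str):
        if obj not in PREDEFINED_SCALERS:
            raise KeyError(f"Unknown pre-defined obj: {obj!r}")
        return PREDEFINED_SCALERS[obj]()
    # Anything that is not a str is treated as a ready-made custom object.
    return obj


from_name = resolve_obj("robust")
custom = RobustScalerStub(quantile_range=(10, 90))
from_object = resolve_obj(custom)
```

In this sketch a str is looked up and built with its pre-defined settings, while a custom object is returned untouched, so its own settings are preserved.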
params#
The params parameter either selects a pre-defined set of hyper-parameter values for this object or specifies them directly. In both cases the values may be fixed or given as distributions. For example, we can associate an existing hyper-parameter distribution with the robust scaler from before:
In [8]: scaler = bp.Scaler('robust', params="robust gs")
In [9]: scaler
Out[9]: Scaler(obj='robust', params='robust gs')
We could also set it to a custom distribution using a Parameter:
In [10]: quantile_range = bp.p.Choice([(5, 95), (10, 90), (15, 85)])
In [11]: scaler = bp.Scaler('robust', params={'quantile_range': quantile_range})
In [12]: scaler
Out[12]:
Scaler(obj='robust',
params={'quantile_range': Choice([(5, 95), (10, 90), (15, 85)])})
See Params for more information on how to set Parameters.
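As a rough illustration of what a params dictionary mixing distributions and fixed values means during a hyper-parameter search, the sketch below draws concrete settings from it. It is an assumption-laden toy, not how BPt or bp.p.Choice works internally: ChoiceSketch and sample_params are hypothetical names, and with_centering is added only as an example of a fixed value.

```python
import random

# Hypothetical sketch; NOT BPt's bp.p.Choice or its search machinery.


class ChoiceSketch:
    """Stand-in for a categorical hyper-parameter distribution."""

    def __init__(self, options):
        self.options = list(options)

    def sample(self, rng):
        # Pick one of the listed options uniformly at random.
        return rng.choice(self.options)


def sample_params(params, rng):
    """Draw a value for each distribution; pass fixed values through as-is."""
    return {
        name: value.sample(rng) if isinstance(value, ChoiceSketch) else value
        for name, value in params.items()
    }


params = {
    "quantile_range": ChoiceSketch([(5, 95), (10, 90), (15, 85)]),
    "with_centering": True,  # fixed value, never re-sampled
}

rng = random.Random(0)
draws = [sample_params(params, rng) for _ in range(5)]
```

Each entry of draws is one candidate configuration: quantile_range varies across the three listed options, while the fixed value stays the same.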
scope#
Pipeline objects also take a scope argument (see Scope).
This argument allows a pipeline object to operate on just a subset of columns.
Notably, if the scope resolves to no columns for a specific Dataset, then
the pipeline object in question is silently skipped. This is useful for defining
generic pipelines that can be applied to different types of datasets.
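The skipping behaviour can be sketched in plain Python. This is a simplified model under stated assumptions, not BPt's ScopeTransformer: here a scope is just a list of column names, a dataset is a dict of columns, and apply_with_scope and center are hypothetical helpers.

```python
# Hypothetical sketch of scope handling; NOT BPt's ScopeTransformer.


def apply_with_scope(data, scope_columns, transform):
    """Apply transform to the scoped columns only.

    If none of the scoped columns exist in data, the step is skipped
    and an unchanged copy of data is returned.
    """
    in_scope = [col for col in scope_columns if col in data]
    out = dict(data)
    if not in_scope:
        return out  # empty scope: silently skip this step
    for col in in_scope:
        out[col] = transform(data[col])
    return out


def center(values):
    """Subtract the column mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


data = {"age": [20.0, 30.0, 40.0], "site": [1, 2, 1]}

scaled = apply_with_scope(data, ["age"], center)         # 'age' is transformed
skipped = apply_with_scope(data, ["thickness"], center)  # no match: skipped
```

Because the second call finds no matching columns, the same step definition can be reused on a dataset that lacks those columns without raising an error.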