BPt.ModelPipeline#

class BPt.ModelPipeline(loaders=None, imputers='default', scalers='default', transformers=None, feat_selectors=None, model='default', param_search=None, cache_loc=None, verbose=False)[source]#

The ModelPipeline class is used to create BPtPipelines. ModelPipeline differs from Pipeline in that it enforces a simplification on the ordering of pieces, representing the typical order in which they might appear. See Pipeline for a more flexible version of this class that does not enforce any ordering.

The order enforced, which follows the order of the input arguments, is:

  1. loaders

  2. imputers

  3. scalers

  4. transformers

  5. feat_selectors

  6. model

For each parameter, with the exception of model, you may pass either one instance of that piece or a list of that piece. In the case that a list is passed, it will be treated as a sequential set of steps / transformations, where the output from each element of the list is passed on to the next as input.

Parameters
loaders : Loader or list of, optional

Each Loader refers to transformations which operate on loaded Data_Files. See Loader.

You may wish to consider using the Pipe input class when creating a single Loader object.

Passed loaders can also be wrapped in a Select wrapper, e.g., as either

# Just passing select
loaders = Select([Loader(...), Loader(...)])

# Or nested
loaders = [Loader(...), Select([Loader(...), Loader(...)])]

In this way, most of the pipeline pieces, not just loaders, can accept lists, or nested lists wrapped in input classes!

default = None
imputers : Imputer, list of or None, optional

If any missing data (NaNs) has been kept within the input data, then an imputation strategy should likely be defined. This parameter controls which kind of imputation strategy to use.

See Imputer.

You may also pass the value 'default' here. In that case, if any NaN data is present within the training set, the following set of two imputers will be used:

'default' == [Imputer('mean', scope='float'),
              Imputer('median', scope='category')]

Otherwise, if no NaNs are present in the input data, no Imputer will be used.

default = 'default'
scalers : Scaler, list of or None, optional

Each Scaler refers to any potential data scaling where a transformation on the data (without access to the target variable) is computed, and the number of features or data points does not change.

See Scaler.

With the default value, 'default', standard scaling is applied, which scales each feature to have a standard deviation of 1 and a mean of 0.

default = 'default'
transformers : Transformer, list of or None, optional

Each Transformer defines a type of transformation to the data that changes the number of features in a way that is potentially non-deterministic and not simply removal (i.e., different from feat_selectors). For example, applying a PCA changes the number of features, and the new features do not correspond 1:1 to the original features. See Transformer for more information.

default = None
feat_selectors : FeatSelector, list of or None, optional

Each FeatSelector refers to an optional feature selection stage of the Pipeline.

See FeatSelector.

default = None
model : Model, Ensemble, optional

This parameter accepts one input of type Model or Ensemble. While it cannot accept a list (i.e., no sequential behavior), you may still pass an input type wrapper like Select to perform model selection via param search.

See Model for more information on how to specify a single model to BPt, and Ensemble for information on how to build an ensemble of models.

This parameter cannot be None. In the default case of passing ‘default’, a ridge regression is used.

default = 'default'
param_search : ParamSearch, None, optional

This parameter optionally specifies that this object should be nested with a hyper-parameter search.

If passed an instance of ParamSearch, the underlying object, or components of the underlying object (if a pipeline), must have at least one valid hyper-parameter distribution to search over.

If left as None, the default, then no hyper-parameter search will be performed.

default = None
cache_loc : Path str or None, optional

Optional parameter specifying a directory in which full BPt pipelines should be cached after fitting. This should either be left as None or passed a str representing a directory in which cached fitted pipelines should be saved.

default = None
verbose : bool, optional

If True, print statements about the current progress of the pipeline will be made during fitting.

Note: In a multi-processed context, where pipelines are being fit on different threads, verbose output may be messy (i.e., overlapping messages from different threads).

default = False

Attributes

fixed_piece_order

ModelPipeline has a fixed order in which pieces are constructed.

Methods

build([dataset, problem_spec])

This method generates a sklearn compliant estimator version of the current Pipeline with respect to a passed Dataset and ProblemSpec.

copy()

This method returns a deepcopy of the base object.

get_params([deep])

Get parameters for this estimator.

print_all([_print])

This method can be used to print a formatted representation of this object.

set_params(**params)

Set the parameters of this estimator.