BPt.Imputer#

class BPt.Imputer(obj, params=0, scope='all', cache_loc=None, base_model=None, base_model_type='default', **extra_params)[source]#

This input object is used to specify imputation steps for a Pipeline.

If there is any missing data (NaNs), then an imputation strategy is likely necessary (with some exceptions, e.g., a final model which can accept NaN values directly). This object allows for defining an imputation strategy. In general, you should need at most two Imputers: one for all float-type data and one for all categorical data. If there is no missing data, this piece will be skipped.
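For context, simple imputation options such as 'mean' correspond to scikit-learn's SimpleImputer under the hood. A minimal sketch of what that underlying strategy does to float-type data (illustrative only; in BPt you would just pass the option name to obj):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Float-type data with missing values (NaNs).
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

# A 'mean' strategy fills each NaN with its column mean.
imp = SimpleImputer(strategy='mean')
X_filled = imp.fit_transform(X)

# Column means are (1 + 3) / 2 = 2.0 and (2 + 4) / 2 = 3.0.
print(X_filled)
```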

Parameters
objstr

obj selects the base imputation strategy to use. See Imputers for all available options. Notably, if ‘iterative’ is passed, then a base model must also be passed!

See Pipeline Objects to read more about pipeline objects in general.

paramsint, str or dict of params, optional
The parameter params can be used to set an associated distribution of hyper-parameters, fixed parameters, or a combination of the two.
Preset parameter distributions are listed for each choice of obj at Pipeline Options.
More information on how this parameter works can be found at Params.
default = 0
scopeScope, optional
The scope parameter determines the subset of features / columns on which this object should operate within the created pipeline. For example, by specifying scope = 'float', this object will only operate on columns with scope float.
See Scope for more information on how scopes can be specified.
The main options that make sense for an imputer are one for float data and one for category data types, though in some cases other choices may make sense.
Note: If using iterative imputation, you may want to carefully consider the scope passed. For example, while it may be beneficial to impute categorical and float features separately, i.e., with different base_model_type's ('categorical' for categorical and 'regression' for float), you must also consider that in predicting the missing values under this setup, the categorical imputer would not have access to the float features and vice versa. Given this, you may want to either just treat all features as float, or, instead of imputing categorical features, load missing values as a separate category - and then set the scope here to 'all', such that the iterative imputer has access to all features.
default = 'all'
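The trade-off described in the note above can be seen with scikit-learn's IterativeImputer, which the 'iterative' option is based on: each feature's missing values are predicted from the other features available within the imputer's scope. A hedged sketch, assuming a regression-type base estimator and a scope covering all columns:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# With scope 'all', every column is available to predict every other.
X = np.array([[1.0, 10.0, 100.0],
              [2.0, np.nan, 200.0],
              [3.0, 30.0, np.nan],
              [4.0, 40.0, 400.0]])

# The base estimator (here BayesianRidge, a regression-type model)
# predicts each column's missing values from the remaining columns.
imp = IterativeImputer(estimator=BayesianRidge(), random_state=0)
X_filled = imp.fit_transform(X)

# All NaNs are filled; restricting the scope would have removed
# columns from each prediction instead.
assert not np.isnan(X_filled).any()
```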
cache_locstr, Path or None, optional
This parameter can optionally be set to a str or path representing the location in which this object will be cached after fitting. To skip this option, keep as the default argument of None.
If set, the python library joblib is used to cache a copy after fitting, and in the case that a cached copy already exists, it will be loaded instead of re-fitting the base object.
default = None
base_modelModel, Ensemble or None, optional

If ‘iterative’ is passed to obj, then a base_model is required in order to perform iterative imputation! The base model can be any valid Model or Ensemble.

default = None
base_model_type‘default’ or Problem Type, optional

In setting a base imputer model, it may be desirable for this model to have a different ‘problem type’ than your overarching problem. For example, if performing iterative imputation on categorical features only, you will likely want to use a categorical predictor - but for imputing float-type features, you will want a ‘regression’ type base model.

Choices are ‘binary’, ‘regression’, ‘categorical’ or ‘default’. If ‘default’, then the following behavior will be applied: If all columns within the passed scope of this Imputer object have scope / data type ‘category’, then the problem type for the base model will be set to ‘categorical’. In all other cases, the problem type will be set to ‘regression’.

default = 'default'
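The 'default' rule above can be expressed as a small helper. This function is purely illustrative (it is not part of the BPt API) and assumes the data types of the columns within the Imputer's scope are available as a list of strings:

```python
def default_base_model_type(scoped_dtypes):
    """Illustrative only: mirrors the documented 'default' behavior.

    If every column within the passed scope has data type 'category',
    the base model's problem type is 'categorical'; in all other
    cases it is 'regression'.
    """
    if scoped_dtypes and all(d == 'category' for d in scoped_dtypes):
        return 'categorical'
    return 'regression'


print(default_base_model_type(['category', 'category']))  # categorical
print(default_base_model_type(['category', 'float']))     # regression
```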
extra_paramsExtra Params
You may pass additional kwargs-style arguments for this piece as Extra Params. Any values passed here will be used to try to set that value in the requested obj.
Any parameter-value pairs specified here will take priority over any set via params. For example, let's say the object we are initializing, 'fake obj', has a parameter called size, and we want it fixed at 10; we can specify that with:
(obj='fake obj', ..., size=10)

See Extra Params for more information.
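The priority of extra params over params can be sketched as a dictionary merge. The variable names below (params, extra_params) are illustrative, not BPt internals:

```python
# Values coming from params (e.g., a preset distribution's
# fixed values) ...
params = {'size': 5, 'strategy': 'mean'}

# ... are overridden by any kwargs-style extra params.
extra_params = {'size': 10}

# Merging with extra_params last gives them priority: size
# resolves to 10, while untouched keys pass through.
resolved = {**params, **extra_params}
print(resolved)
```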

Methods

build([dataset, problem_spec])

This method is used to convert a single pipeline piece into the base sklearn style object used in the pipeline.

copy()

This method returns a deepcopy of the base object.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.