BPt.get_estimator

BPt.get_estimator(pipeline, dataset='default', problem_spec='default', **extra_params)

Get a sklearn compatible estimator from a Pipeline, Dataset and ProblemSpec.

Parameters

pipeline : Pipeline

A BPt input class Pipeline to be initialized according to the passed dataset and problem_spec. This parameter can be an instance of Pipeline or ModelPipeline, or one of the cases below.

If a single str is passed, it is assumed to be a model indicator str, and the pipeline used will be:

pipeline = Pipeline(Model(pipeline))

Likewise, if just a Model is passed, the input will be cast as:

pipeline = Pipeline(pipeline)
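The casting rules above can be pictured with a small plain-Python sketch. The Model and Pipeline classes below are hypothetical stand-ins defined only for this illustration (they are not the BPt classes), and coerce_pipeline is an assumed helper name; how BPt implements the cast internally may differ.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Model:
    """Hypothetical stand-in for a BPt Model (illustration only)."""
    obj: str


@dataclass
class Pipeline:
    """Hypothetical stand-in for a BPt Pipeline (illustration only)."""
    steps: List[object]


def coerce_pipeline(pipeline: Union[str, Model, Pipeline]) -> Pipeline:
    """Mimic the documented casting rules for the pipeline argument."""
    if isinstance(pipeline, str):
        # A single str is treated as a model indicator.
        pipeline = Model(pipeline)
    if isinstance(pipeline, Model):
        # A bare Model is wrapped in a Pipeline (the stand-in stores a list).
        pipeline = Pipeline([pipeline])
    # An existing Pipeline is returned unchanged.
    return pipeline


from_str = coerce_pipeline("ridge")
from_model = coerce_pipeline(Model("linear"))
```

In this sketch both calls end up as a Pipeline holding a single Model, which mirrors the two shortcut forms described above.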

dataset : Dataset or ‘default’, optional

The Dataset in whose context this function should be evaluated. In other words, the dataset is used as the data source for this operation.

If left as ‘default’, an instance of the FakeDataset class is initialized and used. This works fine for initializing pipeline objects with a scope of ‘all’, but should be used with caution when elements of the pipeline use non-‘all’ scopes. In those cases a warning will be issued.

It is typically advised to pass the actual Dataset of interest here.

default = 'default'

problem_spec : ProblemSpec or ‘default’, optional

This parameter accepts an instance of the params class ProblemSpec. The ProblemSpec is essentially a wrapper around the commonly used parameters needed to define the context in which the model pipeline should be evaluated. It includes parameters such as problem_type, scorer, n_jobs, random_state, etc.

See ProblemSpec for more information and for how to create an instance of this object.

If left as ‘default’, a ProblemSpec with default params will be initialized.

default = "default"

extra_params : problem_spec or pipeline params, optional

You may pass any pipeline or problem_spec argument to this function as an extra Python keyword argument.

For example:

target=1

This would override the value of the target parameter in the passed problem_spec. Another example:

model=Model('ridge')
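One way to picture how such keyword arguments could be split between the problem_spec and the pipeline is sketched below. This is a minimal illustration under stated assumptions: the ProblemSpec dataclass is a hypothetical stand-in with only two fields, split_extra_params is an assumed helper name, and the source does not say whether BPt copies or modifies the passed problem_spec.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class ProblemSpec:
    """Hypothetical stand-in with two of the parameters named in the text."""
    target: object = 0
    n_jobs: int = 1


def split_extra_params(problem_spec, **extra_params):
    """Apply matching kwargs to the spec and return the rest separately."""
    spec_names = {f.name for f in fields(problem_spec)}
    spec_kwargs = {k: v for k, v in extra_params.items() if k in spec_names}
    other_kwargs = {k: v for k, v in extra_params.items() if k not in spec_names}
    # replace() builds a new object, so the caller's spec is left untouched
    # (an assumption of this sketch, not a documented BPt behavior).
    return replace(problem_spec, **spec_kwargs), other_kwargs


original = ProblemSpec()
spec, pipeline_kwargs = split_extra_params(original, target=1, model="ridge")
```

Here target=1 lands on the spec copy, while model is left over as a pipeline-level argument.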

Returns

estimator : sklearn Estimator

The returned object is a sklearn-compatible estimator. It will be either of type BPtPipeline or a BPtPipeline wrapped in a search CV object.

Examples

This example shows how this function can be used with the Dataset method get_Xy.

First we will set up a dataset and a pipeline.

In [1]: import BPt as bp

In [2]: data = bp.Dataset([[1, 2, 3], [2, 3, 4], [5, 6, 7]],
   ...:                    columns=['1', '2', '3'],
   ...:                    targets='3')
   ...: 

In [3]: data
Out[3]: 
   1  2  3
0  1  2  3
1  2  3  4
2  5  6  7

In [4]: pipeline = bp.Pipeline([bp.Scaler('standard'), bp.Model('linear')])

In [5]: pipeline
Out[5]: Pipeline(steps=[Scaler(obj='standard'), Model(obj='linear')])

Next we can use get_estimator and also convert the dataset into a traditional sklearn-style X, y input. Note that we are using just the default values for problem spec.

In [6]: X, y = data.get_Xy()

In [7]: X.shape, y.shape
Out[7]: ((3, 2), (3,))

In [8]: estimator = bp.get_estimator(pipeline, data)

Now we can use this estimator as we would any other sklearn style estimator.

In [9]: estimator.fit(X, y)
Out[9]: 
BPtPipeline(steps=[('standard float',
                    ScopeTransformer(estimator=StandardScaler(), inds=Ellipsis)),
                   ('linear regressor',
                    BPtModel(estimator=LinearRegression(n_jobs=1), inds=Ellipsis))])

In [10]: estimator.score(X, y)
Out[10]: 0.9999999999999672
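The near-perfect score is expected for this toy data: every target value equals the first feature plus 2, so a linear model can fit it exactly, and the small gap from 1.0 is floating-point error. The plain-Python sketch below checks that relationship and computes R² by hand. It assumes the score reported above is R², as is the default for sklearn regressors; r2 is a local helper, not a BPt or sklearn function.

```python
# Same values as the example Dataset: columns '1' and '2' are features,
# column '3' is the target.
X = [[1, 2], [2, 3], [5, 6]]
y = [3, 4, 7]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# An exact linear rule for this data: target = first feature + 2.
y_pred = [row[0] + 2 for row in X]
score = r2(y, y_pred)
```

Because the data are perfectly linear and the model is scored on the same three rows it was fit on, this score says nothing about generalization.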