BPt.get_estimator
- BPt.get_estimator(pipeline, dataset='default', problem_spec='default', **extra_params)
Get a sklearn-compatible estimator from a Pipeline, Dataset and ProblemSpec.
- Parameters
- pipeline : Pipeline
A BPt input class Pipeline to be initialized according to the passed dataset and problem_spec. This parameter can be either an instance of Pipeline, ModelPipeline or one of the cases below.
In the case that a single str is passed, it will be assumed to be a model indicator str, and the pipeline used will be:
pipeline = Pipeline(Model(pipeline))
Likewise, if just a Model is passed, then the input will be cast as:
pipeline = Pipeline(pipeline)
- dataset : Dataset or 'default', optional
The Dataset in whose context this function should be evaluated. In other words, the dataset is used as the data source for this operation.
If left as default, an instance of a FakeDataset class will be initialized and used. This works fine for initializing pipeline objects with a scope of 'all', but should be used with caution when elements of the pipeline use non-'all' scopes. In these cases a warning will be issued.
It is typically advised to pass the actual Dataset of interest here.
default = 'default'
- problem_spec : ProblemSpec or 'default', optional
This parameter accepts an instance of the params class ProblemSpec. The ProblemSpec is essentially a wrapper around commonly used parameters needed to define the context in which the model pipeline should be evaluated. It includes parameters like problem_type, scorer, n_jobs, random_state, etc.
See ProblemSpec for more information and for how to create an instance of this object.
If left as 'default', a ProblemSpec with default params will be initialized.
default = 'default'
- extra_params : problem_spec or pipeline params, optional
You may pass as extra arguments to this function any pipeline or problem_spec argument as Python kwargs-style key-value pairs.
For example:
target=1
would override the value of the target parameter in the passed problem_spec. Or, for example:
model=Model('ridge')
- Returns
- estimator : sklearn Estimator
The returned object is a sklearn-compatible estimator. It will be either of type BPtPipeline or a BPtPipeline wrapped in a search CV object.
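The casting rules described for the pipeline parameter can be sketched in plain Python. This is only an illustration of the documented behavior, not BPt's actual implementation: the Model and Pipeline classes below are hypothetical stand-ins defined for this sketch, and normalize_pipeline is an assumed helper name that does not exist in BPt.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Model:
    """Hypothetical stand-in for BPt's Model input class."""
    obj: str


@dataclass
class Pipeline:
    """Hypothetical stand-in for BPt's Pipeline input class."""
    steps: List[object] = field(default_factory=list)


def normalize_pipeline(pipeline: Union[str, Model, Pipeline]) -> Pipeline:
    """Mimic the documented casting rules (assumed helper, not a BPt function)."""
    if isinstance(pipeline, str):
        # A single str is treated as a model indicator str.
        pipeline = Model(pipeline)
    if isinstance(pipeline, Model):
        # A bare Model is wrapped in a Pipeline; this stand-in stores
        # the steps as a list.
        pipeline = Pipeline(steps=[pipeline])
    # Anything that is already a Pipeline is returned unchanged.
    return pipeline


from_str = normalize_pipeline('ridge')
from_model = normalize_pipeline(Model('ridge'))
print(from_str == from_model)  # True: both inputs end up as the same pipeline
```

Under these assumptions, passing 'ridge' and passing Model('ridge') lead to the same pipeline, which is the equivalence the parameter description states.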
Examples
This example shows how this function can be used with the Dataset method get_Xy. First we will set up a dataset and a pipeline.
In [1]: import BPt as bp

In [2]: data = bp.Dataset([[1, 2, 3], [2, 3, 4], [5, 6, 7]],
   ...:                   columns=['1', '2', '3'],
   ...:                   targets='3')
   ...:

In [3]: data
Out[3]:
   1  2  3
0  1  2  3
1  2  3  4
2  5  6  7

In [4]: pipeline = bp.Pipeline([bp.Scaler('standard'), bp.Model('linear')])

In [5]: pipeline
Out[5]: Pipeline(steps=[Scaler(obj='standard'), Model(obj='linear')])
Next we can use get_estimator and also convert the dataset into a traditional sklearn-style X, y input. Note that we are using just the default values for problem spec.
In [6]: X, y = data.get_Xy()

In [7]: X.shape, y.shape
Out[7]: ((3, 2), (3,))

In [8]: estimator = bp.get_estimator(pipeline, data)
Now we can use this estimator as we would any other sklearn style estimator.
In [9]: estimator.fit(X, y)
Out[9]:
BPtPipeline(steps=[('standard float',
                    ScopeTransformer(estimator=StandardScaler(),
                                     inds=Ellipsis)),
                   ('linear regressor',
                    BPtModel(estimator=LinearRegression(n_jobs=1),
                             inds=Ellipsis))])

In [10]: estimator.score(X, y)
Out[10]: 0.9999999999999672
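A score this close to 1 is what one would expect for this toy dataset. In every row the target column '3' equals column '1' plus 2, so a linear model can reproduce the target exactly. The plain-Python check below does not use BPt or sklearn. It assumes that score reports R², as sklearn regressors do by default, and the linear rule it tests was chosen by inspection rather than read from the fitted estimator.

```python
# Rows of the toy dataset: columns '1' and '2' are features, '3' is the target.
rows = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

X = [row[:2] for row in rows]
y = [row[2] for row in rows]

# Candidate linear rule, found by inspection (not taken from the fitted model):
# target = first feature + 2.
predictions = [x[0] + 2 for x in X]

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
y_mean = sum(y) / len(y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(r2)  # 1.0
```

The small gap between this exact 1.0 and the 0.9999999999999672 shown above is presumably floating-point rounding in the scaling and least-squares steps.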