BPt.Ensemble.build

Ensemble.build(dataset='default', problem_spec='default', **problem_spec_params)

This method converts a single pipeline piece into the base sklearn-style object used in the pipeline. It is mostly intended for investigating pieces and is not necessarily designed to produce independently usable pieces.

For now this method will not work when the base obj is a custom object.

Parameters

dataset : Dataset or ‘default’, optional

The Dataset according to which the pipeline piece should be initialized. For example, pipelines can include Scopes, which need a reference Dataset.

If left as ‘default’, an instance of a FakeDataset class is initialized and used. This works fine for initializing pipeline objects with a scope of ‘all’, but should be used with caution when elements of the pipeline use non-‘all’ scopes. In these cases a warning will be issued.

It is advisable to use this build function only for viewing objects. If the build function is instead used for eventual modelling, it is important to pass the correct Dataset whenever any of the pipeline pieces depend on the structure of the input data.

Note: If the problem type is not defined in problem_spec and dataset is left as ‘default’, then a problem type of ‘regression’ will be used.

default = 'default'

problem_spec : ProblemSpec or ‘default’, optional

This parameter accepts an instance of the params class ProblemSpec. The ProblemSpec is essentially a wrapper around the commonly used parameters needed to define the context in which the model pipeline should be evaluated. It includes parameters like problem_type, scorer, n_jobs, random_state, etc.

See ProblemSpec for more information and for how to create an instance of this object.

If left as ‘default’, a ProblemSpec with default params will be initialized.

default = "default"

problem_spec_params : ProblemSpec params, optional

You may also pass any valid problem spec argument-value pairs here, in order to override a value in the passed ProblemSpec. Overriding params should be passed in kwargs style, for example:

func(..., problem_type='binary')
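
The override behaviour described above can be illustrated without BPt installed. The following is only a minimal sketch: `SpecSketch` and `apply_overrides` are hypothetical stand-ins (not part of BPt), the default values are placeholders, and returning a modified copy rather than changing the passed object is an assumption made here for illustration.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for ProblemSpec. Only the field names are taken
# from the parameter description above; the defaults are placeholders.
@dataclass(frozen=True)
class SpecSketch:
    problem_type: str = "default"
    scorer: str = "default"
    n_jobs: int = 1
    random_state: int = 1


def apply_overrides(spec, **problem_spec_params):
    """Return a copy of spec with the given keyword values replaced.

    Assumption: overrides produce a new object and leave the passed
    spec untouched. An unknown field name raises TypeError.
    """
    return replace(spec, **problem_spec_params)


base = SpecSketch(problem_type="regression")
overridden = apply_overrides(base, problem_type="binary")

print(base.problem_type)        # value in the spec that was passed in
print(overridden.problem_type)  # value after the keyword override
```

Only the keyword that is passed changes; every other field keeps the value from the spec that was supplied.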

Returns

estimator : sklearn compatible estimator

Returns the BPt-style sklearn compatible estimator version of this piece, as converted to internally when building the pipeline.

params : dict

Returns a dictionary with any parameter distributions associated with this object. For example, this can be used to check exactly what pre-existing parameter distributions point to.

Examples

Given a dataset and a pipeline piece (this can be any of the valid Pipeline Pieces, not just Model as used here):

In [1]: import BPt as bp

In [2]: dataset = bp.Dataset()

In [3]: dataset['col1'] = [1, 2, 3]

In [4]: dataset['col2'] = [3, 4, 5]

In [5]: dataset.set_role('col2', 'target', inplace=True)

In [6]: dataset
Out[6]: 
   col1  col2
0     1     3
1     2     4
2     3     5

In [7]: piece = bp.Model('ridge', params=1)

In [8]: piece
Out[8]: Model(obj='ridge', params=1)

We can call build from the piece:

In [9]: estimator, params = piece.build(dataset)

In [10]: estimator
Out[10]: 
('ridge regressor',
 BPtModel(estimator=Ridge(max_iter=1000, random_state=1, solver='lsqr'), inds='all'))

In [11]: params
Out[11]: {'ridge regressor__estimator__alpha': Log(lower=0.001, upper=1000000.0)}
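
The key in the returned params dictionary appears to follow the sklearn convention of joining nested parameter names with a double underscore: the step name from the estimator tuple, then the wrapper's estimator attribute, then the parameter of the wrapped model. As a rough sketch of how such a key can be taken apart, the helper below (`split_param_key`, a hypothetical name, not part of BPt) splits it on that separator. The distribution is kept as a plain string placeholder so the snippet runs without BPt.

```python
def split_param_key(key, sep="__"):
    """Split a nested parameter key into (step_name, attribute_path)."""
    step_name, *path = key.split(sep)
    return step_name, path


# Placeholder copy of the dictionary shown above; the value is a plain
# string here rather than the actual distribution object.
params = {
    "ridge regressor__estimator__alpha": "Log(lower=0.001, upper=1000000.0)",
}

for key, dist in params.items():
    step, path = split_param_key(key)
    print(f"step={step!r} path={path} distribution={dist}")
```

Reading the key this way shows that the distribution targets the alpha parameter of the model held by the 'ridge regressor' step.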