BPt.FeatSelector.build
- FeatSelector.build(dataset='default', problem_spec='default', **problem_spec_params)
This method converts a single pipeline piece into the base sklearn-style object used in the pipeline. It is intended mostly for inspecting pieces, and is not necessarily designed to produce independently usable objects.
For now, this method will not work when the base obj is a custom object.
- Parameters
- dataset : Dataset or 'default', optional
The Dataset that the pipeline piece should be initialized according to. For example, pipelines can include Scopes, which need a reference Dataset.
If left as 'default', an instance of a FakeDataset class will be initialized and used. This works fine for initializing pipeline objects with a scope of 'all', but should be used with caution when elements of the pipeline use non-'all' scopes. In these cases a warning will be issued.
It is advisable to use this build function only for viewing objects. If instead using it for eventual modelling, it is important to pass the correct Dataset, in case any of the pipeline pieces are at all dependent on the structure of the input data.
Note: If problem_type is not defined in problem_spec and dataset is left as 'default', then a problem_type of 'regression' will be used.
default = 'default'
- problem_spec : ProblemSpec or 'default', optional
This parameter accepts an instance of the params class ProblemSpec. The ProblemSpec is essentially a wrapper around commonly used parameters needed to define the context the model pipeline should be evaluated in. It includes parameters like problem_type, scorer, n_jobs, random_state, etc.
See ProblemSpec for more information and for how to create an instance of this object.
If left as 'default', a ProblemSpec with default params will be initialized.
default = "default"
- problem_spec_params : ProblemSpec params, optional
You may also pass any valid problem spec argument-value pairs here, in order to override a value in the passed ProblemSpec. Overriding params should be passed in kwargs style, for example: func(..., problem_type='binary')
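The override behavior described above can be sketched in plain Python. The names below (SimpleSpec, build_sketch) are hypothetical stand-ins, not BPt API; they only illustrate how kwargs-style arguments typically override fields of a passed spec object.

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for ProblemSpec -- not the real BPt class.
@dataclass
class SimpleSpec:
    problem_type: str = "regression"
    n_jobs: int = 1
    random_state: int = 1

def build_sketch(spec=None, **spec_params):
    # Any keyword arguments override the corresponding field on the
    # passed spec, mirroring how **problem_spec_params behaves.
    if spec is None:
        spec = SimpleSpec()  # analogous to problem_spec='default'
    return replace(spec, **spec_params)

spec = SimpleSpec(n_jobs=4)
overridden = build_sketch(spec, problem_type="binary")
print(overridden.problem_type)  # binary (overridden via kwargs)
print(overridden.n_jobs)        # 4 (kept from the passed spec)
```

Fields not named in the kwargs keep their values from the passed spec, so overrides compose with a pre-built ProblemSpec rather than replacing it wholesale.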
- Returns
- estimator : sklearn compatible estimator
Returns the BPt-style, sklearn compatible estimator version of this piece, as converted to internally when building the pipeline.
- params : dict
Returns a dictionary with any parameter distributions associated with this object. For example, this can be used to check what exactly pre-existing parameter distributions point to.
Examples
Given a dataset and pipeline piece (this can be any of the valid Pipeline Pieces, not just Model as used here):

In [1]: import BPt as bp

In [2]: dataset = bp.Dataset()

In [3]: dataset['col1'] = [1, 2, 3]

In [4]: dataset['col2'] = [3, 4, 5]

In [5]: dataset.set_role('col2', 'target', inplace=True)

In [6]: dataset
Out[6]:
   col1  col2
0     1     3
1     2     4
2     3     5

In [7]: piece = bp.Model('ridge', params=1)

In [8]: piece
Out[8]: Model(obj='ridge', params=1)
We can call build from the piece:

In [9]: estimator, params = piece.build(dataset)

In [10]: estimator
Out[10]:
('ridge regressor',
 BPtModel(estimator=Ridge(max_iter=1000, random_state=1, solver='lsqr'),
          inds='all'))

In [11]: params
Out[11]: {'ridge regressor__estimator__alpha': Log(lower=0.001, upper=1000000.0)}
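The keys of the returned params dict follow sklearn's double-underscore convention: the text before the first '__' names the pipeline step, and each further '__' descends one level into a nested estimator's parameters. A minimal sketch of that naming scheme (split_param_key is a hypothetical helper, not part of BPt or sklearn):

```python
# Hypothetical helper showing how a params key such as
# 'ridge regressor__estimator__alpha' addresses a nested parameter.
def split_param_key(key):
    # Everything before the first '__' is the pipeline step name;
    # the remainder is a path through nested estimator parameters.
    step, _, rest = key.partition("__")
    return step, rest.split("__")

step, path = split_param_key("ridge regressor__estimator__alpha")
print(step)  # ridge regressor
print(path)  # ['estimator', 'alpha']
```

Under this convention, the distribution Log(lower=0.001, upper=1000000.0) above points at the alpha parameter of the Ridge estimator wrapped inside the 'ridge regressor' step.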