BPt.Compare

class BPt.Compare(options)
This is a special BPt input class which can be used as a helper to more easily run comparison analyses between a few different choices of parameters.
Parameters

options : list of values or Option
    This parameter should be a list of options, each of which will be tried. You may pass these options either directly as values, for example:

        options = ['option1', 'option2']

    Or as instances of Option. This second strategy is recommended when the underlying options are objects or something more complex than strings, for example when comparing two different Pipeline objects:

        pipe1 = bp.Pipeline([bp.Model('elastic')])
        pipe2 = bp.Pipeline([bp.Model('ridge')])
        options = [bp.Option(pipe1, name='pipe1'),
                   bp.Option(pipe2, name='pipe2')]
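Putting the pieces above together, a minimal sketch of the intended flow (the import name bp and the data variable, assumed to be an already-loaded BPt Dataset, are assumptions for illustration):

    import BPt as bp

    # Two candidate pipelines to compare
    pipe1 = bp.Pipeline([bp.Model('elastic')])
    pipe2 = bp.Pipeline([bp.Model('ridge')])

    # Wrap each pipeline in a named Option, then in a Compare
    options = bp.Compare([bp.Option(pipe1, name='pipe1'),
                          bp.Option(pipe2, name='pipe2')])

    # evaluate() is then run once per option, keeping results separate
    # (assumes `data` is an existing BPt Dataset)
    results = bp.evaluate(options, data)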
Notes
This object is designed to be passed as input to evaluate(). Only the evaluate() parameters pipeline and problem_spec (or their associated extra params) can be passed a Compare. That said, some options, while valid, may still make downstream interpretation more difficult; e.g., passing problem_type with two different Compare values will work, but will yield results with different metrics.

When to use Compare? It may be tempting to use Compare to evaluate different configurations of hyper-parameters, but in most cases this type of fine-grained usage is discouraged. Conceptually, Compare should be used to compare the actual underlying topic of interest. For example, if the comparison is itself of interest to the underlying research question, then Compare can be used between two different Pipeline objects. If instead this choice is not the key point of interest, but you still wish to try two different, say, Model objects, then you would be better off nesting this choice as a hyper-parameter to optimize (in this case see Select, sketched below).
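For reference, a rough sketch of that Select-based alternative, where the choice of model is nested inside the pipeline's hyper-parameter search rather than compared explicitly. The exact ParamSearch arguments used here are illustrative assumptions, not a prescribed configuration:

    # The choice between models becomes a hyper-parameter via Select,
    # rather than a separate evaluation via Compare.
    model_choice = bp.Select([bp.Model('elastic'), bp.Model('ridge')])

    # Select requires a hyper-parameter search to pick between options;
    # the search_type and n_iter values here are only assumptions.
    pipe = bp.Pipeline([model_choice],
                       param_search=bp.ParamSearch(search_type='RandomSearch',
                                                   n_iter=10))

    results = bp.evaluate(pipe, data)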
Examples
Compare is used with evaluate(), for example:

    In [1]: import BPt as bp

    In [2]: data = bp.datasets.load_cali()
    Performing test split on: 20640 subjects.
    random_state: 3
    Test split size: 0.2
    Performed train/test split
    Train size: 16512
    Test size: 4128

    In [3]: data.shape
    Out[3]: (20640, 9)

    In [4]: pipe_options = bp.Compare([bp.Option(bp.Model('elastic'),
       ...:                                      name='elastic'),
       ...:                            bp.Option(bp.Model('ridge'),
       ...:                                      name='ridge')])

    In [5]: bp.evaluate(pipe_options, data, progress_bar=False).summary()
    Running Compare: Options(pipeline=elastic)
    Predicting target = MedHouseVal
    Using problem_type = regression
    Using scope = all (defining a total of 8 features).
    Evaluating 16512 total data points.

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4303
    neg_mean_squared_error: -0.7532

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4201
    neg_mean_squared_error: -0.7908

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4187
    neg_mean_squared_error: -0.7790

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4148
    neg_mean_squared_error: -0.8074

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4207
    neg_mean_squared_error: -0.7310

    Running Compare: Options(pipeline=ridge)
    Predicting target = MedHouseVal
    Using problem_type = regression
    Using scope = all (defining a total of 8 features).
    Evaluating 16512 total data points.

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4794
    neg_mean_squared_error: -0.6883

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: -0.0470
    neg_mean_squared_error: -1.43

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4688
    neg_mean_squared_error: -0.7119

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4705
    neg_mean_squared_error: -0.7306

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4627
    neg_mean_squared_error: -0.6780

    Out[5]:
              mean_scores_r2  ...  std_scores_neg_mean_squared_error
    pipeline                  ...
    elastic         0.420920  ...                            0.027155
    ridge           0.366886  ...                            0.290747

    [2 rows x 4 columns]