BPt.Compare

class BPt.Compare(options)
This is a special BPt input class which can be used as a helper to more easily run comparison analyses between a few different choices of parameters.
Parameters

options : list of values or Option
    This parameter should be a list of options, each of which will be tried. You may pass these options either directly as values, for example:

        options = ['option1', 'option2']

    Or as instances of Option. This second strategy is recommended when the underlying options are objects or something more complex than strings, for example when comparing two different Pipeline objects:

        pipe1 = bp.Pipeline([bp.Model('elastic')])
        pipe2 = bp.Pipeline([bp.Model('ridge')])
        options = [bp.Option(pipe1, name='pipe1'),
                   bp.Option(pipe2, name='pipe2')]
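Putting the pieces above together, a minimal sketch of the intended flow (the import name bp and the data variable, assumed to be an already-loaded BPt Dataset, are assumptions for illustration):

    import BPt as bp

    # Two candidate pipelines to compare
    pipe1 = bp.Pipeline([bp.Model('elastic')])
    pipe2 = bp.Pipeline([bp.Model('ridge')])

    # Wrap each pipeline in a named Option, then in a Compare
    options = bp.Compare([bp.Option(pipe1, name='pipe1'),
                          bp.Option(pipe2, name='pipe2')])

    # evaluate() is then run once per option, keeping results separate
    # (assumes `data` is an existing BPt Dataset)
    results = bp.evaluate(options, data)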
Notes
This object is designed to be passed as input to evaluate(). Only the evaluate() parameters pipeline and problem_spec (or their associated extra params) can be passed a Compare. That said, some options, while valid, may still make downstream interpretation more difficult; e.g., passing problem_type with two different Compare values will work, but will yield results with different metrics.

When to use Compare? It may be tempting to use Compare to evaluate different configurations of hyper-parameters, but in most cases this type of fine-grained usage is discouraged. Conceptually, Compare should be used to compare the actual underlying topic of interest. For example, if the comparison is itself of interest to the underlying research question, then Compare can be used between two different Pipeline objects. If instead this choice is not the key point of interest, but you still wish to try two different, say, Model objects, then you would be better off nesting this choice as a hyper-parameter to optimize (in this case see Select, sketched below).
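For reference, a rough sketch of that Select-based alternative, where the choice of model is nested inside the pipeline's hyper-parameter search rather than compared explicitly. The exact ParamSearch arguments used here are illustrative assumptions, not a prescribed configuration:

    # The choice between models becomes a hyper-parameter via Select,
    # rather than a separate evaluation via Compare.
    model_choice = bp.Select([bp.Model('elastic'), bp.Model('ridge')])

    # Select requires a hyper-parameter search to pick between options;
    # the search_type and n_iter values here are only assumptions.
    pipe = bp.Pipeline([model_choice],
                       param_search=bp.ParamSearch(search_type='RandomSearch',
                                                   n_iter=10))

    results = bp.evaluate(pipe, data)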
Examples
Compare is used with evaluate(), for example:

    In [1]: import BPt as bp

    In [2]: data = bp.datasets.load_cali()
    Performing test split on: 20640 subjects.
    random_state: 3
    Test split size: 0.2
    Performed train/test split
    Train size: 16512
    Test size: 4128

    In [3]: data.shape
    Out[3]: (20640, 9)

    In [4]: pipe_options = bp.Compare([bp.Option(bp.Model('elastic'),
       ...:                                      name='elastic'),
       ...:                            bp.Option(bp.Model('ridge'),
       ...:                                      name='ridge')])

    In [5]: bp.evaluate(pipe_options, data, progress_bar=False).summary()
    Running Compare: Options(pipeline=elastic)
    Predicting target = MedHouseVal
    Using problem_type = regression
    Using scope = all (defining a total of 8 features).
    Evaluating 16512 total data points.

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4303
    neg_mean_squared_error: -0.7532

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4201
    neg_mean_squared_error: -0.7908

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4187
    neg_mean_squared_error: -0.7790

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4148
    neg_mean_squared_error: -0.8074

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4207
    neg_mean_squared_error: -0.7310

    Running Compare: Options(pipeline=ridge)
    Predicting target = MedHouseVal
    Using problem_type = regression
    Using scope = all (defining a total of 8 features).
    Evaluating 16512 total data points.

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4794
    neg_mean_squared_error: -0.6883

    Training Set: (13209, 8)
    Validation Set: (3303, 8)
    Fit fold in 0.0 seconds.
    r2: -0.0470
    neg_mean_squared_error: -1.43

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4688
    neg_mean_squared_error: -0.7119

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4705
    neg_mean_squared_error: -0.7306

    Training Set: (13210, 8)
    Validation Set: (3302, 8)
    Fit fold in 0.0 seconds.
    r2: 0.4627
    neg_mean_squared_error: -0.6780

    Out[5]:
              mean_scores_r2  ...  std_scores_neg_mean_squared_error
    pipeline                  ...
    elastic         0.420920  ...                            0.027155
    ridge           0.366886  ...                            0.290747

    [2 rows x 4 columns]