BPt.CompareDict.pairwise_t_stats

CompareDict.pairwise_t_stats(metric='first')

This method performs pairwise t-test comparisons between every pair of options, assuming this object holds instances of EvalResults. The procedure used to generate the t-test comparisons is based on the example code from: https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_stats.html

Note

In the case that the sizes of the training and validation sets vary dramatically across folds, it is unclear whether these statistics remain valid. In that case, the mean train size and mean validation size are used when computing the statistics.
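For reference, the correction used in the linked scikit-learn example is the corrected resampled t-test of Nadeau and Bengio, which inflates the variance of the per-fold score differences by the ratio of validation to train size. The following is a minimal sketch of that statistic using NumPy and SciPy; it illustrates the idea and is not necessarily the exact internal implementation:

    import numpy as np
    from scipy import stats

    def corrected_ttest(score_diffs, n_train, n_test):
        # score_diffs: per-fold differences in scores between two options
        # n_train / n_test: (mean) train and validation set sizes
        k = len(score_diffs)
        # Variance inflated to account for the non-independence of CV folds
        corrected_var = np.var(score_diffs, ddof=1) * (1 / k + n_test / n_train)
        t_stat = np.mean(score_diffs) / np.sqrt(corrected_var)
        # Two-sided p-value with k - 1 degrees of freedom
        p_val = 2 * stats.t.sf(np.abs(t_stat), df=k - 1)
        return t_stat, p_val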

Parameters
metric : str, optional

This method compares the scores produced for only one valid metric / scorer; notably, all EvalResults must have been evaluated with respect to this scorer. By default, the reserved key 'first' indicates that whichever scorer appears first should be used to produce the pairwise t statistics.

default = 'first'
Returns
stats_df : pandas DataFrame

A DataFrame comparing all pairwise combinations of the original Compare options. 't_stat' and 'p_val' columns are generated for each comparison, holding the t statistic corrected for the non-independence of folds and the corresponding Bonferroni-corrected p-value (correcting for the multiple comparisons implied by testing all pairwise combinations). See the referenced scikit-learn example for more information.
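
Examples

A usage illustration only: results below is assumed to be a CompareDict of EvalResults, e.g. as returned by an evaluation run with a Compare over model options, and the scorer name 'r2' is a placeholder.

    # Uses whichever scorer comes first
    stats_df = results.pairwise_t_stats()

    # Or name a specific scorer that all EvalResults were evaluated with
    stats_df = results.pairwise_t_stats(metric='r2')

    # One row per pairwise combination of the original Compare options
    print(stats_df[['t_stat', 'p_val']])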