BPt.EvalResultsSubset.permutation_importance
- EvalResultsSubset.permutation_importance(dataset=None, n_repeats=10, scorer='default', just_model=True, nested_model=True, return_as='dfs', n_jobs=1, random_state='default')
Compute permutation feature importances, using the base scikit-learn function sklearn.inspection.permutation_importance().
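For example, a minimal sketch of a typical call, assuming a standard BPt workflow in which evaluate() is run with store_data_ref=True so the dataset reference is stored, and assuming the returned results object (or an EvalResultsSubset derived from it) exposes this method. The DataFrame, column names and pipeline below are hypothetical placeholders:

    import BPt as bp
    import numpy as np
    import pandas as pd

    # Hypothetical data: replace with your own DataFrame and target column.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(100, 3)),
                      columns=['f1', 'f2', 'my_target'])

    data = bp.Dataset(df)
    data = data.set_role('my_target', 'target')

    # store_data_ref=True lets permutation_importance re-use the dataset later.
    results = bp.evaluate(pipeline=bp.Pipeline([bp.Model('ridge')]),
                          dataset=data,
                          store_data_ref=True)

    # No dataset needs to be passed here, as a shallow copy of the
    # stored reference is used.
    fis = results.permutation_importance(n_repeats=10)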
- Parameters
- dataset : Dataset, optional
The instance of Dataset originally passed to evaluate().
Note: if a different dataset is passed, then unexpected behavior may occur.
If left as the default of None, a shallow copy of the dataset passed to the original evaluate call will be used (assuming evaluate was run with store_data_ref=True).
default = None
- n_repeats : int, optional
The number of times to randomly permute each feature.
default = 10
- scorer : sklearn-style scoring, optional
The scorer to use. It can be a single sklearn-style scorer str or a callable (a sketch after this parameter list shows passing a custom scorer).
If left as 'default', the first scorer defined when evaluating the underlying estimator will be used.
default = 'default'
- just_model : bool, optional
When set to True, the permutation feature importances will be computed using the final set of transformed features as passed when fitting the base model. This is the recommended behavior, because it means the features do not need to be re-transformed through the full pipeline in order to evaluate each feature. If set to False, the features will be permuted in the original feature space (which may be useful in some contexts).
default = True
- nested_model : bool, optional
When just_model is set to True, there is in some cases the further option of using an even more transformed set of features. For example, when the final estimator of the main pipeline is itself another pipeline, this second pipeline may apply further static transformations. If nested_model is set to True, these further nested transformations will be applied in the same way as with just_model, eventually feeding an even further transformed set of features and a more specific final estimator into the permutation feature importance calculation.
By default, this value is True, so the calculated feature importances here will correspond to the saved self.feat_names in this object.
default = True
- return_as : ['dfs', 'raw'], optional
This parameter controls whether the calculated permutation feature importances are returned as DataFrames, with column names set to the corresponding feature names, or as a list with the raw output from each fold, i.e., sklearn Bunch objects with attributes 'importances_mean', 'importances_std' and 'importances' (a sketch after this parameter list shows both formats).
If returning as DataFrames is requested, then 'importances_mean' and 'importances_std' will be returned, but not the raw 'importances'.
default = 'dfs'
- n_jobs : int, optional
The number of jobs to use for this function. Note that if the underlying estimator supports multiple jobs during inference (predicting), and the original problem_spec was set with multiple n_jobs, then that original behavior will still hold, and you may wish to keep this parameter as 1. On the other hand, if the base estimator does not use multiple jobs, passing a higher value here could greatly speed up computation.
default = 1
- random_state : int, 'default' or None, optional
Pseudo-random number generator to control the permutations of each feature. If left as ‘default’ then use the random state defined during the initial evaluation of the model. Otherwise, you may pass an int for a different fixed random state or None for explicitly no random state.
default = 'default'
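As referenced in the return_as description above, a short sketch of handling the two return formats, where `results` is the object from the first sketch. The exact container returned for 'dfs' (assumed here to be indexable by 'importances_mean' and 'importances_std') is an assumption:

    # As DataFrames: columns are the feature names.
    fis = results.permutation_importance(return_as='dfs')
    mean_df = fis['importances_mean']   # assumed access pattern
    std_df = fis['importances_std']

    # As raw per-fold output: a list of sklearn Bunch objects, which also
    # keep the full 'importances' array of shape (n_features, n_repeats),
    # unlike the 'dfs' format.
    raw = results.permutation_importance(return_as='raw')
    for fold_result in raw:
        print(fold_result.importances_mean, fold_result.importances.shape)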
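Finally, a sketch of overriding the scorer, the parallelism and the random state. 'neg_mean_squared_error' is a standard sklearn scorer string; the particular values are illustrative only:

    fis = results.permutation_importance(
        scorer='neg_mean_squared_error',  # any sklearn-style str or callable
        n_repeats=25,                     # more repeats -> more stable estimates
        n_jobs=4,           # helpful when the base estimator is single-job
        random_state=42,    # fixed seed instead of the one used at evaluate time
    )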