BPt.cross_validate
- BPt.cross_validate(pipeline, dataset, problem_spec='default', cv=5, sk_n_jobs=1, verbose=0, fit_params=None, return_train_score=False, return_estimator=False, error_score=nan, **extra_params)
This function is a BPt-compatible wrapper around sklearn.model_selection.cross_validate().
- Parameters
- pipeline
Pipeline
- A BPt input class Pipeline to be initialized according to the passed dataset and problem_spec. This parameter can be an instance of Pipeline or ModelPipeline, or one of the cases below.
If a single str is passed, it is assumed to be a model indicator str, and the pipeline used will be:
pipeline = Pipeline(Model(pipeline))
Likewise, if just a Model is passed, the input will be cast as:
pipeline = Pipeline(pipeline)
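The casting rules above can be sketched in plain Python. This is only an illustration: the `Model` and `Pipeline` classes below are hypothetical stand-ins defined for the example (not BPt's own classes), and `coerce_pipeline` is an assumed helper name.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; these are NOT BPt's classes.
@dataclass
class Model:
    obj: str


@dataclass
class Pipeline:
    steps: object


def coerce_pipeline(pipeline):
    """Mirror the documented casting of the `pipeline` argument."""
    if isinstance(pipeline, str):
        # A bare str is treated as a model indicator str.
        return Pipeline(Model(pipeline))
    if isinstance(pipeline, Model):
        # A bare Model is wrapped in a Pipeline.
        return Pipeline(pipeline)
    # Anything else is assumed to already be a pipeline and is passed through.
    return pipeline


from_str = coerce_pipeline("ridge")
from_model = coerce_pipeline(Model("ridge"))
print(from_str == from_model)
```

Under these assumptions, the str and Model forms end up as the same wrapped pipeline, which is why the shorter forms are convenient for quick experiments.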
- dataset
Dataset
- The Dataset that serves as the data source for this evaluation. Arguments within problem_spec can be used to select subsets of the data. For example, the scope parameter can select only some columns, and the subjects parameter can select a subset of subjects.
- problem_spec
ProblemSpec or ‘default’, optional
- This parameter accepts an instance of the params class ProblemSpec. The ProblemSpec is essentially a wrapper around the commonly used parameters needed to define the context in which the model pipeline should be evaluated. It includes parameters such as problem_type, scorer, n_jobs, random_state, etc.
See ProblemSpec for more information and for how to create an instance of this object.
If left as ‘default’, a ProblemSpec with default params will be initialized.
default = "default"
- cv
CV or sklearn CV, optional
- This parameter controls what type of cross-validation splitting strategy is used. You may pass one of the following options:
An instance of CV representing a custom strategy as defined by the BPt style CV.
The custom str ‘test’, which specifies that the whole train set should be used to train the pipeline and the full test set used to validate it (assuming that a train test split has been defined in the underlying dataset).
Any valid scikit-learn style option, which includes an int to specify the number of folds in a (Stratified) KFold, a sklearn CV splitter, or an iterable yielding (train, test) splits as arrays of indices.
default = 5
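To make the difference between these cv options concrete, here is a minimal pure-Python sketch of how each one could map to (train, test) index splits. The helper `resolve_cv` and its arguments are hypothetical and are not part of BPt or scikit-learn. The int case uses plain unshuffled K-fold sizes rather than the stratified variant, and the BPt CV instance case is left out.

```python
def resolve_cv(cv, subjects, train=None, test=None):
    """Turn a cv option into a list of (train, test) index lists.

    Hypothetical helper for illustration; not part of BPt or scikit-learn.
    """
    if isinstance(cv, str):
        if cv != "test":
            raise ValueError(f"unknown cv string: {cv!r}")
        if train is None or test is None:
            raise ValueError("cv='test' needs a predefined train/test split")
        # One split: fit on the whole train set, validate on the whole test set.
        return [(list(train), list(test))]

    if isinstance(cv, int):
        if cv < 2:
            raise ValueError("an int cv must be at least 2")
        idx = list(subjects)
        n, k = len(idx), cv
        # Unshuffled K-fold sizes: the first n % k folds get one extra item.
        sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
        splits, start = [], 0
        for size in sizes:
            held_out = idx[start:start + size]
            kept = idx[:start] + idx[start + size:]
            splits.append((kept, held_out))
            start += size
        return splits

    # Otherwise assume an iterable of (train, test) pairs and materialize it.
    return [(list(tr), list(te)) for tr, te in cv]


splits = resolve_cv(5, subjects=range(10))
print(len(splits), splits[0])
```

In this sketch, cv='test' always yields exactly one split, while an int yields that many folds over whatever indices it is given.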
- sk_n_jobs int, optional
The number of jobs as passed to the base sklearn cross_validate. Typically this value should be kept at 1, with n_jobs as defined through the passed problem_spec used to set the number of jobs.
For added flexibility, though, this parameter can be used either together with the n_jobs parameter in problem_spec or instead of it.
default = 1
- verbose int, optional
The verbosity level as passed to the sklearn function.
default = 0
- fit_params dict, optional
Parameters to pass to the fit method of the estimator.
default = None
- return_train_score bool, optional
Whether to include train scores.
default = False
- return_estimator bool, optional
Whether to return the estimators fitted on each split.
default = False
- error_score ‘raise’ or numeric, optional
The value to assign to the score if an error occurs while fitting the estimator, as in the base sklearn function. If set to ‘raise’, the error is raised instead.
default = np.nan
- extra_params problem_spec or pipeline params, optional
Any pipeline or problem_spec argument may be passed to this function as an extra Python keyword argument.
For example:
target=1
would override the value of the target parameter in the passed problem_spec. Or, for example:
model=Model('ridge')
would override the model parameter of the passed pipeline.
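The override behaviour of extra keyword arguments can be sketched as follows. Everything here is an assumption made for illustration: `SpecSketch` and `PipelineSketch` are hypothetical stand-ins with made-up defaults (not BPt's ProblemSpec or Pipeline), and raising on unknown names is a choice of this sketch rather than documented BPt behaviour.

```python
from dataclasses import dataclass, fields, replace


# Hypothetical stand-ins; field names and defaults are illustrative only.
@dataclass
class SpecSketch:
    target: object = 0
    problem_type: str = "default"
    n_jobs: int = 1


@dataclass
class PipelineSketch:
    model: str = "default"


def apply_extra_params(spec, pipeline, **extra_params):
    """Route each extra keyword to the object that owns a field of that name."""
    spec_keys = {f.name for f in fields(spec)}
    pipe_keys = {f.name for f in fields(pipeline)}

    unknown = set(extra_params) - spec_keys - pipe_keys
    if unknown:
        # Assumption of this sketch: unknown names are rejected.
        raise TypeError(f"unexpected extra params: {sorted(unknown)}")

    spec_kwargs = {k: v for k, v in extra_params.items() if k in spec_keys}
    pipe_kwargs = {k: v for k, v in extra_params.items() if k in pipe_keys}
    # replace() returns modified copies, so the passed objects stay untouched.
    return replace(spec, **spec_kwargs), replace(pipeline, **pipe_kwargs)


new_spec, new_pipe = apply_extra_params(
    SpecSketch(), PipelineSketch(), target=1, model="ridge"
)
print(new_spec.target, new_pipe.model)
```

Returning copies rather than mutating the inputs keeps a shared spec or pipeline object safe to reuse across several calls with different overrides.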
See also
cross_val_score
Simplified version of this function.
evaluate
A similar BPt-style function with extra options.