
BPt.cross_validate(pipeline, dataset, problem_spec='default', cv=5, sk_n_jobs=1, verbose=0, fit_params=None, return_train_score=False, return_estimator=False, error_score=nan, **extra_params)

This function is a BPt-compatible wrapper around sklearn.model_selection.cross_validate().

pipeline : Pipeline

A BPt input class Pipeline to be initialized according to the passed dataset and problem_spec. This parameter can be an instance of Pipeline or ModelPipeline, or one of the cases below.
If a single str is passed, it is assumed to be a model indicator str, and the pipeline used will be:
pipeline = Pipeline(Model(pipeline))

Likewise, if just a Model is passed, then the input will be cast as:

pipeline = Pipeline(pipeline)
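
The casting rules above can be sketched with stand-in classes. This is only an illustration under stated assumptions: StubModel and StubPipeline are hypothetical placeholders rather than the real BPt classes, as_pipeline is a made-up helper name, and 'ridge' is an assumed model indicator str.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubModel:
    """Hypothetical stand-in for BPt's Model (not the real class)."""
    obj: str


@dataclass(frozen=True)
class StubPipeline:
    """Hypothetical stand-in for BPt's Pipeline (not the real class)."""
    steps: object


def as_pipeline(pipeline):
    """Cast the accepted input forms to a pipeline, following the text above."""
    if isinstance(pipeline, str):
        # A bare str is treated as a model indicator: Pipeline(Model(str)).
        return StubPipeline(StubModel(pipeline))
    if isinstance(pipeline, StubModel):
        # A bare model is wrapped directly: Pipeline(model).
        return StubPipeline(pipeline)
    if isinstance(pipeline, StubPipeline):
        # Already a pipeline: used unchanged.
        return pipeline
    raise TypeError(f"Unsupported pipeline input: {type(pipeline).__name__}")


# 'ridge' is an assumed indicator str, used only for illustration.
from_str = as_pipeline("ridge")
from_model = as_pipeline(StubModel("ridge"))
```

Under these assumptions, the str form and the Model form end up as the same wrapped pipeline.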
dataset : Dataset

The Dataset in whose context this function should be evaluated. In other words, the dataset is used as the data source for this operation.
Arguments within problem_spec can be used to select subsets of the data. For example, the scope parameter can be used to select only some columns, and the subjects parameter to select a subset of subjects.
problem_spec : ProblemSpec or ‘default’, optional

This parameter accepts an instance of the params class ProblemSpec. The ProblemSpec is essentially a wrapper around the commonly used parameters needed to define the context in which the model pipeline should be evaluated. It includes parameters like problem_type, scorer, n_jobs, random_state, etc.

See ProblemSpec for more information and for how to create an instance of this object.

If left as ‘default’, a ProblemSpec is initialized with default params.

default = "default"
cv : CV or sklearn CV, optional

This parameter controls what type of cross-validation splitting strategy is used. You may pass a number of options here.

  • An instance of CV representing a custom strategy as defined by the BPt style CV.

  • The custom str ‘test’, which specifies that the whole train set should be used to train the pipeline and the full test set used to validate it (assuming that a train-test split has been defined in the underlying dataset).

  • Any valid scikit-learn style option, which includes an int to specify the number of folds in a (Stratified) KFold, a sklearn CV splitter, or an iterable yielding (train, test) splits as arrays of indices.

default = 5
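
To make the last option in the list above concrete, here is a minimal sketch of an iterable yielding (train, test) index splits. It is an assumption-laden illustration, not how BPt or scikit-learn build their folds: kfold_index_splits is a hypothetical helper, the folds are contiguous and unshuffled, and plain lists stand in for arrays of indices to avoid any dependency.

```python
def kfold_index_splits(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


splits = list(kfold_index_splits(n_samples=10, n_folds=5))
first_train, first_test = splits[0]
```

Each sample index appears in exactly one test fold, and the matching train list holds every other index. In practice the lists would typically be converted to index arrays before being passed as cv.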
sk_n_jobs : int, optional

The number of jobs as passed to the base sklearn cross_validate. Typically this value should be kept at 1 (the default), with n_jobs as defined through the passed problem_spec used to set the number of jobs.

For added flexibility, though, this parameter can be used either together with the n_jobs parameter in problem_spec or instead of it.

default = 1
verbose : int, optional

The verbosity level as passed to the sklearn function.

default = 0
fit_params : dict, optional

Parameters to pass to the fit method of the estimator.

default = None
return_train_score : bool, optional

Whether to include train scores.

default = False
return_estimator : bool, optional

Whether to return the estimators fitted on each split.

default = False
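error_score : ‘raise’ or numeric, optional

Passed through to the base sklearn function: the value to assign to the score if an error occurs while fitting the estimator. If set to ‘raise’, the error is raised instead.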

default = np.nan
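
The two modes of error_score can be sketched in plain Python. This is a simplified illustration of the documented scikit-learn behaviour, not the library's implementation: score_folds and flaky_fit_and_score are hypothetical names, and math.nan stands in for np.nan.

```python
import math


def score_folds(fit_and_score, folds, error_score=math.nan):
    """Score each fold; on failure either re-raise or record error_score."""
    scores = []
    for fold in folds:
        try:
            scores.append(fit_and_score(fold))
        except Exception:
            if error_score == "raise":
                # 'raise' mode: propagate the original error.
                raise
            # Numeric mode: record the placeholder score and keep going.
            scores.append(error_score)
    return scores


def flaky_fit_and_score(fold):
    """Hypothetical scorer that fails on fold 1 to simulate a fitting error."""
    if fold == 1:
        raise ValueError("fit failed")
    return 0.5 + 0.1 * fold


scores = score_folds(flaky_fit_and_score, folds=[0, 1, 2])
```

With a numeric error_score, the failed fold contributes the placeholder value and the remaining folds are still scored; with ‘raise’, the first failure stops the run.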
extra_params : problem_spec or pipeline params, optional

You may pass any pipeline or problem_spec argument to this function as extra Python keyword arguments.

For example, passing target as a keyword argument would override the value of the target parameter in the passed problem_spec. Pipeline arguments can be overridden in the same way.
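
As one illustration, the override semantics can be sketched with a stand-in dataclass. Everything here is an assumption made for the sketch: StubProblemSpec is a hypothetical placeholder for ProblemSpec with made-up defaults, apply_extra_params is an invented helper, and 'my_target' and model='ridge' are illustrative values. It does not show how BPt routes the arguments internally.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class StubProblemSpec:
    """Hypothetical stand-in for ProblemSpec; defaults are placeholders."""
    target: object = 0
    scope: str = "all"
    subjects: str = "all"
    n_jobs: int = 1


def apply_extra_params(problem_spec, **extra_params):
    """Return an overridden copy of problem_spec plus the unmatched kwargs."""
    names = {f.name for f in fields(problem_spec)}
    known = {k: v for k, v in extra_params.items() if k in names}
    # Anything not recognised here would be a candidate pipeline parameter.
    leftover = {k: v for k, v in extra_params.items() if k not in names}
    return replace(problem_spec, **known), leftover


spec = StubProblemSpec()
new_spec, leftover = apply_extra_params(spec, target="my_target", model="ridge")
```

In this sketch the passed spec is left unchanged and only the copy carries the new target, while the unmatched model argument is set aside for the pipeline.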


See also


Simplified version of this function.


The BPt style similar function with extra options.