BPt.cross_validate

BPt.cross_validate(pipeline, dataset, problem_spec='default', cv=5, sk_n_jobs=1, verbose=0, fit_params=None, return_train_score=False, return_estimator=False, error_score=nan, **extra_params)

This function is a BPt-compatible wrapper around sklearn.model_selection.cross_validate().
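As a quick orientation, a minimal usage sketch follows. The toy Dataset, its column names, and the use of set_role to mark the target are illustrative assumptions from memory and should be checked against the Dataset documentation; 'ridge' is used here simply as an example model indicator str.

import BPt as bp

# A tiny illustrative Dataset; in practice this would be your real data.
data = bp.Dataset({'x1': [1, 2, 3, 4, 5, 6],
                   'x2': [0, 1, 0, 1, 0, 1],
                   'y':  [1.2, 2.3, 3.1, 4.8, 5.0, 6.1]})
data = data.set_role('y', 'target')   # mark 'y' as the target column

results = bp.cross_validate(pipeline='ridge',   # shorthand for Pipeline(Model('ridge'))
                            dataset=data,
                            cv=3)

# The return value mirrors sklearn.model_selection.cross_validate,
# e.g. per-fold test scores are stored under 'test_score'.
print(results['test_score'])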

Parameters
pipeline : Pipeline
A BPt input class Pipeline to be initialized according to the passed dataset and problem_spec. This parameter can be either an instance of Pipeline, ModelPipeline or one of the cases below.
In the case that a single str is passed, it will be assumed to be a model indicator str and the pipeline used will be:
pipeline = Pipeline(Model(pipeline))

Likewise, if just a Model is passed, then the input will be cast as:

pipeline = Pipeline(pipeline)
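For illustration, the sketch below shows the three equivalent ways of specifying the same pipeline (assuming 'ridge' is a valid model indicator str):

from BPt import Pipeline, Model

p1 = 'ridge'                    # str shorthand
p2 = Model('ridge')             # cast internally to Pipeline(Model('ridge'))
p3 = Pipeline(Model('ridge'))   # fully explicit form

Any of these may be passed as the pipeline argument.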
dataset : Dataset
The Dataset in the context of which this function should be evaluated. In other words, the dataset is used as the data source for this operation.
Arguments within problem_spec can be used to select just subsets of the data. For example, the parameter scope can be used to select only some columns, or the parameter subjects to select a subset of subjects.
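As a hedged sketch, restricting the evaluation in this way might look like the following; the scope and subjects values used here are hypothetical examples, not values taken from this page:

from BPt import ProblemSpec

# Evaluate using only columns within the scope 'brain' and only
# the subjects in the train split; both values are examples.
ps = ProblemSpec(scope='brain', subjects='train')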
problem_spec : ProblemSpec or ‘default’, optional

This parameter accepts an instance of the params class ProblemSpec. The ProblemSpec is essentially a wrapper around the commonly used parameters needed to define the context in which the model pipeline should be evaluated. It includes parameters like problem_type, scorer, n_jobs, random_state, etc.

See ProblemSpec for more information and for how to create an instance of this object.

If left as ‘default’, then a ProblemSpec with default params will be initialized.

default = "default"
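For instance, a ProblemSpec might be constructed along these lines; the specific parameter values, including the target column name, are illustrative assumptions:

from BPt import ProblemSpec

# Define the evaluation context: target, problem type, scorer and parallelism.
ps = ProblemSpec(target='my_target',        # hypothetical target column name
                 problem_type='regression',
                 scorer='r2',
                 n_jobs=4,
                 random_state=5)

This instance can then be passed directly as problem_spec.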
cv : CV or sklearn CV, optional

This parameter controls what type of cross-validation splitting strategy is used. You may pass a number of options here.

  • An instance of CV representing a custom strategy as defined by the BPt style CV.

  • The custom str ‘test’, which specifies that the whole train set should be used to train the pipeline and the full test set used to validate it (assuming that a train-test split has been defined in the underlying dataset).

  • Any valid scikit-learn style option: these include an int to specify the number of folds in a (Stratified) KFold, a sklearn CV splitter, or an iterable yielding (train, test) splits as arrays of indices.

default = 5
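The sketch below shows one example of each accepted style; the specific CV and KFold parameter values are illustrative assumptions:

from BPt import CV
from sklearn.model_selection import KFold

cv_int = 5                                                # (Stratified) KFold with 5 folds
cv_sk = KFold(n_splits=5, shuffle=True, random_state=0)   # sklearn splitter
cv_bpt = CV(splits=5, n_repeats=2)                        # BPt-style CV object
cv_test = 'test'                                          # train on the train set, score on the test set

Any of these values may be passed as cv.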
sk_n_jobs : int, optional

The number of jobs as passed to the underlying sklearn cross_validate. Typically this value should be kept at its default of 1, with the number of jobs instead set through the n_jobs parameter of the passed problem_spec.

For added flexibility though, this parameter can be used either in combination with, or instead of, the n_jobs parameter in problem_spec.

default = 1
verbose : int, optional

The verbosity level as passed to the sklearn function.

default = 0
fit_params : dict, optional

Parameters to pass to the fit method of the estimator.

default = None
return_train_score : bool, optional

Whether to include train scores.

default = False
return_estimator : bool, optional

Whether to return the estimators fitted on each split.

default = False
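When return_train_score or return_estimator are set, the returned dictionary should contain the corresponding extra keys, mirroring sklearn's cross_validate output; a hedged sketch:

import BPt as bp

# Assumes `data` is an existing bp.Dataset, e.g. as constructed
# in the first sketch above (illustrative only).
results = bp.cross_validate(pipeline='ridge', dataset=data,
                            return_train_score=True,
                            return_estimator=True)

print(results['train_score'])   # per-fold train scores
print(results['estimator'])     # pipelines fitted on each split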
error_score : ‘raise’ or numeric, optional

Parameter passed to the underlying sklearn function: the value to assign to the score if an error occurs while fitting the estimator, or ‘raise’ to raise the error.

default = np.nan
extra_params : problem_spec or pipeline params, optional

You may pass any pipeline or problem_spec argument to this function as extra python kwargs style key-value pairs.

For example:

target=1

Would override the value of the target parameter in the passed problem_spec. Or for example:

model=Model('ridge')

Would override the value of the model parameter within the passed pipeline.
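Put together, overriding both a problem_spec parameter and a pipeline parameter via kwargs might look like this sketch; the dataset and pipeline are assumed to already exist and are illustrative only:

import BPt as bp
from BPt import Model

# `data` and `pipe` are a pre-existing Dataset and pipeline (illustrative).
results = bp.cross_validate(pipeline=pipe, dataset=data,
                            target=1,               # overrides problem_spec's target
                            model=Model('ridge'))   # overrides the pipeline's model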

See also

cross_val_score

Simplified version of this function.

evaluate

The similar BPt-style function, with extra options.