BPt.Dataset.get_Xy

Dataset.get_Xy(problem_spec='default', **problem_spec_params)

This method returns a sklearn-style pair of input data (X) and target data (y) from the Dataset, according to the passed problem_spec.

Note: X and y are returned as plain pandas objects (a DataFrame and a Series, see Returns), not as Datasets, so none of the Dataset metadata is accessible through the returned X and y.

Parameters

problem_spec : ProblemSpec or ‘default’, optional

This argument accepts an instance of the params class ProblemSpec. This object is essentially a wrapper around the commonly used parameters needed to define the context in which the model pipeline is evaluated. It includes parameters such as problem_type, scorer, n_jobs and random_state.

If left as ‘default’, a ProblemSpec is initialized with default params.

See ProblemSpec for more information and for how to create an instance of this object.

default = 'default'

problem_spec_params : ProblemSpec params, optional

You may also pass any valid parameter value pairs here, e.g.

X, y = get_Xy(problem_spec=problem_spec, problem_type='binary')

Any parameters passed here override the corresponding values in problem_spec. This is useful when you want the default values for everything except one parameter, e.g., to change only random_state:

X, y = get_Xy(problem_spec='default', random_state=5)
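The override behaviour described above can be pictured with a small standalone sketch. It does not use BPt: SpecSketch and resolve_spec are hypothetical stand-ins, the field defaults are placeholders, and copying the base spec (rather than modifying it) is an assumption of the sketch, not a statement about how BPt handles the passed object.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for ProblemSpec; not BPt's class."""
    problem_type: str = "default"   # placeholder default
    scorer: str = "default"         # placeholder default
    n_jobs: int = 1                 # placeholder default
    random_state: int = 1           # placeholder default


def resolve_spec(problem_spec="default", **problem_spec_params):
    """Return a spec with keyword overrides applied on top of the base spec."""
    # 'default' means: start from a spec built entirely from default values.
    base = SpecSketch() if problem_spec == "default" else problem_spec
    # replace() builds a new object, so the base spec itself is left unchanged.
    return replace(base, **problem_spec_params)


base = SpecSketch(problem_type="regression", random_state=1)

# Override one field of an existing spec.
binary = resolve_spec(base, problem_type="binary")

# Start from all defaults and change only random_state.
seeded = resolve_spec("default", random_state=5)

print(binary)
print(seeded)
```

In this sketch every field not named in the keyword arguments keeps the value it had in the base spec, which mirrors the two calls shown above.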

Returns

X : pandas DataFrame

DataFrame with the input data and columns as specified by the passed problem_spec. Note: the index is sorted identically in X and y.

y : pandas Series

Series with the target values as requested by the passed problem_spec. Note: the index is sorted identically in X and y.
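To make the shape of the return values concrete, here is a minimal pandas-only sketch of an X/y pair with an identically sorted index. The data, the column names and the subject labels are invented for illustration, and the hand-written split stands in for what get_Xy returns; it is not BPt's implementation.

```python
import pandas as pd

# Hypothetical data standing in for a Dataset; rows are deliberately unsorted.
df = pd.DataFrame(
    {"age": [31, 25, 47], "score": [0.2, 0.9, 0.5], "label": [1, 0, 1]},
    index=pd.Index(["s3", "s1", "s2"], name="subject"),
)

# Sort once, then split, so X and y share exactly the same row order.
ordered = df.sort_index()
X = ordered[["age", "score"]]  # DataFrame of input columns
y = ordered["label"]           # Series of target values

# With a shared index, row i of X corresponds to element i of y.
assert X.index.equals(y.index)
print(type(X).__name__, type(y).__name__)
print(list(X.index))
```

A check such as X.index.equals(y.index) is a cheap safeguard before passing the pair to any sklearn-style fit(X, y) call, since those calls match rows by position rather than by index label.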