BPt.Loader#

class BPt.Loader(obj, behav='single', params=0, scope='data file', cache_loc=None, fix_n_wrapper_jobs=False, skip_y_cache=False, **extra_params)[source]#

Loader refers to transformations which operate on Data Files. In essence, a Loader takes in saved file locations and, after applying a series of specified transformations, passes on compatible features.

The Loader object can operate in two ways: either the Loader can define operations which are computed on single files independently, or it can load and pass data on to the defined obj as a list, where each element of the list is one subject’s data. See parameter behav.

Parameters
objstr, custom obj or Pipe
obj selects the base loader object to use. This can be a str corresponding to one of the preset loaders found at Loaders. Beyond the pre-defined loaders, users can pass in custom objects, as long as they have functions corresponding to the correct behavior.
obj can also be passed as a Pipe. See Pipe’s documentation to learn more on how this works, and why you might want to use it. See Pipeline Objects to read more about pipeline objects in general.
For example, the ‘identity’ loader will load in saved data at the stored file location, say 2d numpy arrays, and will return a flattened version of the saved arrays, with each data point as a feature. A more practical example might be loading in 3D neuroimaging data and passing on features as extracted by ROI.
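The shape of a compatible custom obj can be sketched in plain Python. The class below is hypothetical (not part of BPt): it mimics the ‘identity’ behavior by flattening one subject’s loaded array, assuming that with behav='single' the object receives a single subject’s loaded Data File at a time.

```python
import numpy as np

class FlattenLoader:
    """Hypothetical minimal custom loader object (sketch, not BPt source).

    With behav='single', fit_transform is assumed to be applied to each
    subject's loaded Data File independently, so this object only needs
    to turn one subject's array into a flat 1d feature vector.
    """

    def fit(self, X, y=None):
        # Nothing to learn for a pure flattening transform.
        return self

    def transform(self, X):
        # X is assumed to be one subject's loaded array, e.g. a 2d
        # numpy array; return it flattened, one feature per data point.
        return np.asarray(X).reshape(-1)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

# One subject's 2d "saved" data becomes 4 flat features.
subject_data = np.arange(4).reshape(2, 2)
features = FlattenLoader().fit_transform(subject_data)
print(features.shape)  # (4,)
```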
behav‘single’ or ‘all’, optional
The Loader object can operate under two different behaviors, corresponding to operations which can be done for each subject’s Data File independently (‘single’) and operations which must be done using information from all train subject’s Data Files (‘all’).
‘single’ is the default behavior. If requested, the Loader will load each subject’s Data File separately and apply the passed obj’s fit_transform. The benefit of this method in contrast to ‘all’ is that only one subject’s full raw data needs to be loaded at once, whereas with ‘all’ you must have enough available memory to load all of the current training or validation subjects’ raw data at once. Likewise, ‘single’ allows for caching fit_transform operations for each individual subject (which can then be more flexibly re-used).
Behavior ‘all’ is designed to work with base objects that accept a list of length n_subjects to their fit_transform function, where each element of the list will be that subject’s loaded Data File. This behavior requires loading all data into memory, but allows for using information from the rest of the group split. For example, we would need to set behav to ‘all’ if we wanted to use nilearn.connectome.ConnectivityMeasure with parameter kind = “tangent”, as this transformer requires information from the rest of the loaded subjects when training. On the other hand, if we used kind = “correlation”, then we could use either behavior ‘all’ or ‘single’, since “correlation” can be computed for each subject individually.
default = 'single'
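The two calling conventions can be contrasted with a small sketch. Both classes below are hypothetical stand-ins (not BPt internals): the ‘single’-style object is assumed to see one subject’s loaded array per call, while the ‘all’-style object is assumed to see a list with one entry per subject, which is what lets it use group-level information.

```python
import numpy as np

class SingleBehav:
    # behav='single': called once per subject, X is one subject's array.
    def fit_transform(self, X, y=None):
        # A per-subject summary needs no information from other subjects.
        return np.asarray(X).mean(axis=0)

class AllBehav:
    # behav='all': called once per fold, X is a list of length n_subjects.
    def fit_transform(self, X, y=None):
        stacked = np.stack([np.asarray(x) for x in X])
        # A group-level statistic, e.g. centering each subject by the
        # mean across all loaded subjects (like "tangent" needs a group
        # reference, this cannot be computed from one subject alone).
        group_mean = stacked.mean(axis=0)
        return stacked - group_mean

subjects = [np.ones((2, 3)) * i for i in range(4)]

single_out = [SingleBehav().fit_transform(s) for s in subjects]
all_out = AllBehav().fit_transform(subjects)
print(len(single_out), all_out.shape)  # 4 (4, 2, 3)
```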
paramsint, str or dict of params, optional
The parameter params can be used to set an associated distribution of hyper-parameters, fixed parameters, or a combination of the two.
Preset parameter distributions are listed for each choice of params with the corresponding obj at Pipeline Options.
More information on how this parameter works can be found at Params.
default = 0
scopeScope, optional
The scope parameter determines the subset of features / columns on which this object should operate within the created pipeline. For example, by specifying scope = ‘float’, this object will only operate on columns with scope float.
See Scope for more information on how scopes can be specified.
default = 'data file'
cache_locstr, Path or None, optional
If set, this gives an optional location in which the Loader transform function will be cached for each subject when behav is ‘single’. These cached transformations can then be loaded for each subject when they appear again in later folds.

Warning

If behav = ‘all’, then this parameter is currently skipped!

Set to None to skip caching.

default = None
fix_n_wrapper_jobsint or False, optional

Typically this parameter is left as its default, but in special cases you may want to fix the number of jobs used by the internal Loading Wrapper; this parameter sets that value.

default = False
extra_paramsExtra Params
You may pass additional kwargs style arguments for this piece as Extra Params. Any values passed here will be used to try to set the corresponding value in the requested obj.
Any parameter value pairs specified here will take priority over any set via params. For example, let’s say the object we are initializing, ‘fake obj’, has a parameter called size, and we want it fixed as 10. We can specify that with:
(obj='fake obj', ..., size=10)

See Extra Params for more information.
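The precedence rule above can be pictured as a dictionary merge. The snippet below is purely illustrative (it is not BPt source code): it shows why a value fixed via a keyword argument wins over the same key set via params.

```python
# Illustrative sketch of the precedence rule, not BPt internals:
# values passed as extra_params override anything set via params.
params_search = {'size': 5, 'alpha': 0.1}   # hypothetical params preset
extra_params = {'size': 10}                 # fixed via keyword argument

# Later entries win in a dict merge, so extra_params takes priority.
resolved = {**params_search, **extra_params}
print(resolved)  # {'size': 10, 'alpha': 0.1}
```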

See also

Dataset.add_data_files

For adding data files to Dataset.

Pipe

An input helper class used with this object.

Notes

If obj is passed as Pipe, see Pipe for an example on how different corresponding params can be passed to each piece individually.

Examples

A basic example is shown below:

In [1]: import BPt as bp

In [2]: loader = bp.Loader(obj='identity')

In [3]: loader
Out[3]: Loader(obj='identity')

This specifies that the BPt.extensions.Identity loader be used (which just loads and flattens files).

Methods

build([dataset, problem_spec])

This method is used to convert a single pipeline piece into the base sklearn style object used in the pipeline.

copy()

This method returns a deepcopy of the base object.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.