BPt.Loader#
- class BPt.Loader(obj, behav='single', params=0, scope='data file', cache_loc=None, fix_n_wrapper_jobs=False, skip_y_cache=False, **extra_params)[source]#
Loader refers to transformations which operate on Data Files. In essence, these objects take in saved file locations and, after a series of specified transformations, pass on compatible features.
The Loader object can operate in two ways: either it defines operations which are computed on single files independently, or it loads and passes data to the defined obj as a list, where each element of the list is one subject’s data. See parameter behav.
- Parameters
- obj : str, custom obj or Pipe
- obj selects the base loader object to use. This can be a str corresponding to one of the preset loaders found at Loaders. Beyond pre-defined loaders, users can pass in custom objects, as long as they have functions corresponding to the correct behavior. obj can also be passed as a Pipe. See Pipe’s documentation to learn more about how this works and why you might want to use it, and see Pipeline Objects to read more about pipeline objects in general.
For example, the ‘identity’ loader will load in saved data at the stored file location, say 2D numpy arrays, and return a flattened version of the saved arrays, with each data point as a feature. A more practical example might be loading 3D neuroimaging data and passing on features as extracted by ROI.
- behav : ‘single’ or ‘all’, optional
- The Loader object can operate under two different behaviors, corresponding to operations which can be done for each subject’s Data File independently (‘single’) and operations which must be done using information from all train subjects’ Data Files (‘all’).
‘single’ is the default behavior. If requested, the Loader will load each subject’s Data File separately and apply the passed obj’s fit_transform. The benefit of this method, in contrast to ‘all’, is that only one subject’s full raw data needs to be loaded at once, whereas with ‘all’ you must have enough available memory to load all of the current training or validation subjects’ raw data at once. Likewise, ‘single’ allows for caching fit_transform operations for each individual subject (which can then be more flexibly re-used).
Behavior ‘all’ is designed to work with base objects that accept a list of length n_subjects to their fit_transform function, where each element of the list will be that subject’s loaded Data File. This behavior requires loading all data into memory, but allows for using information from the rest of the group split. For example, we would need to set behav to ‘all’ if we wanted to use
nilearn.connectome.ConnectivityMeasure
with parameter kind = “tangent”, as this transformer requires information from the rest of the loaded subjects when training. On the other hand, if we used kind = “correlation”, then we could use either behavior ‘all’ or ‘single’, since “correlation” can be computed for each subject individually.
default = 'single'
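To make the distinction concrete, the two behaviors expect obj’s fit_transform to receive different inputs. The classes below are a hypothetical sketch (not part of BPt or nilearn), written against the sklearn-style transformer convention the docs describe:

```python
import numpy as np

class FlattenSubject:
    """Sketch of a 'single'-behavior obj: fit_transform receives
    ONE subject's loaded Data File at a time."""
    def fit_transform(self, X, y=None):
        # X is a single subject's array, e.g. a 2D numpy array
        return np.asarray(X).reshape(-1)

class GroupDemean:
    """Sketch of an 'all'-behavior obj: fit_transform receives a
    LIST of length n_subjects, one loaded Data File per subject."""
    def fit_transform(self, X, y=None):
        # Stack each subject's flattened data into (n_subjects, n_features)
        stacked = np.stack([np.asarray(x).reshape(-1) for x in X])
        # Uses information across all subjects (the group mean)
        return stacked - stacked.mean(axis=0)

single = FlattenSubject().fit_transform(np.ones((2, 3)))
# single.shape == (6,)
group = GroupDemean().fit_transform([np.ones((2, 3)), np.zeros((2, 3))])
# group.shape == (2, 6)
```

With behav=‘single’, BPt can call FlattenSubject once per subject and cache the result; GroupDemean would only be valid with behav=‘all’, since it needs every subject’s data at once.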
- params : int, str or dict of params, optional
- The parameter params can be used to set an associated distribution of hyper-parameters, fixed parameters, or a combination of both. Preset parameter distributions are listed for each choice of params with the corresponding obj at Pipeline Options. More information on how this parameter works can be found at Params.
default = 0
- scope : Scope, optional
- The scope parameter determines the subset of features / columns which this object should operate on within the created pipeline. For example, by specifying scope = ‘float’, this object will only operate on columns with scope float. See Scope for more information on how scopes can be specified.
default = 'data file'
- cache_loc : str, Path or None, optional
- Optional location in which, if set, the Loader transform function will be cached for each subject. These cached transformations can then be loaded for each subject when they appear again in later folds.
Warning
If behav = ‘all’, then this parameter is currently skipped!
Set to None, to ignore.
default = None
- fix_n_wrapper_jobs : int or False, optional
- Typically this parameter is left as its default, but in special cases you may want to set it. It fixes the number of jobs used by the Loading Wrapper.
default = False
- extra_params : Extra Params
- You may pass additional kwargs style arguments for this piece as Extra Params. Any values passed here will be used to try and set that value in the requested obj. Any parameter value pairs specified here will take priority over any set via params. For example, let’s say the object we are initializing, ‘fake obj’, has a parameter called size, and we want it fixed as 10. We can specify that with:
(obj='fake obj', ..., size=10)
See Extra Params for more information.
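The priority rule described above can be sketched as a simple dict merge. This is a conceptual illustration only, not BPt’s actual implementation; the parameter names are taken from the ‘fake obj’ example:

```python
# Conceptual sketch: extra_params-style keyword arguments take
# priority over values set via params when both name the same key.
params = {'size': 5, 'kind': 'correlation'}   # e.g. set via params
extra_params = {'size': 10}                   # e.g. passed as size=10
resolved = {**params, **extra_params}         # later dict wins on conflict
# resolved == {'size': 10, 'kind': 'correlation'}
```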
See also
Dataset.add_data_files
For adding data files to Dataset.
Pipe
An input helper class used with this object.
Notes
If obj is passed as Pipe, see Pipe for an example of how different corresponding params can be passed to each piece individually.
Examples
A basic example is shown below:
In [1]: import BPt as bp

In [2]: loader = bp.Loader(obj='identity')

In [3]: loader
Out[3]: Loader(obj='identity')
This specifies that the BPt.extensions.Identity loader be used (which just loads and flattens files).
Methods
build([dataset, problem_spec])
This method is used to convert a single pipeline piece into the base sklearn style object used in the pipeline.
copy()
This method returns a deepcopy of the base object.
get_params([deep])
Get parameters for this estimator.
set_params(**params)
Set the parameters of this estimator.