BPt.Dataset.consolidate_data_files
- Dataset.consolidate_data_files(save_dr, replace_with=None, scope='data file', cast_to=None, clear_existing='fail', n_jobs=-1)
This function is designed as a helper to consolidate all or a subset of the loaded data files into one column. While this removes information, it can provide a speed-up in downstream loading and reduce the number of files cached when using Loader.
This method assumes that the underlying data files can be stacked with np.stack(data, axis=-1) after they have been loaded. If this is not the case, this method will fail.
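For example, two loaded files of the same shape stack along a new trailing axis. A minimal sketch of the assumed stacking behavior, where the arrays below are hypothetical stand-ins for loaded data files:

>>> import numpy as np
>>> a, b = np.zeros((2, 3)), np.ones((2, 3))
>>> # Files that stack like this satisfy the method's assumption
>>> np.stack([a, b], axis=-1).shape
(2, 3, 2)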
- Parameters
- save_dr : str or Path
The file directory in which to save the consolidated files. If it doesn’t exist, then it will be created.
- replace_with : str or None, optional
By default, if replace_with is left as None, then just a saved, consolidated version of the files will be made. If a column name is instead passed as a str, then the original data files which were consolidated will be deleted, and the new consolidated column will be loaded in their place.
default = None
- scope : Scope, optional
The scope used to select which of the loaded data file columns to consolidate. The default scope of 'data file' selects all loaded data files.
default = 'data file'
- cast_to : None or numpy dtype, optional
If not None, the numpy dtype that the stacked data will be cast to before saving.
default = None
- clear_existing : bool or 'fail', optional
If True, then any files already in the save directory will be deleted. If False, existing files will simply be overwritten.
If ‘fail’ then if there are already files in the save directory, raise an error.
default = 'fail'
- n_jobs : int, optional
The number of jobs to use while stacking and saving each file.
If -1, then all available CPUs will be used.
default = -1
See also
to_data_file
Convert existing column to data file.
add_data_files
Method for adding new data files.
update_data_file_paths
Update saved data file paths.
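Examples
A hedged usage sketch, not taken from the BPt documentation: it assumes a Dataset named data with data file columns already loaded (e.g., via add_data_files); the save directory 'consolidated/' and column name 'stacked_data' are hypothetical.

>>> # Hypothetical sketch: `data` is assumed to be a BPt.Dataset
>>> # with data file columns already loaded.
>>> import numpy as np
>>> data.consolidate_data_files(save_dr='consolidated/',
...                             replace_with='stacked_data',
...                             cast_to=np.float32,
...                             clear_existing=True,
...                             n_jobs=1)

With replace_with set, the consolidated column replaces the original data file columns; leaving it as None only writes the stacked files to save_dr.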