BPt.Dataset.consolidate_data_files
- Dataset.consolidate_data_files(save_dr, replace_with=None, scope='data file', cast_to=None, clear_existing='fail', n_jobs=-1)
This function is designed as a helper to consolidate all or a subset of the loaded data files into one column. While this removes information, it can provide a speed-up in downstream loading and reduce the number of files cached when using Loader.
This method assumes that the underlying data files can be stacked with np.stack(data, axis=-1) after they have been loaded. If this is not the case, this method will fail.
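For example, two loaded files of the same shape stack along a new trailing axis. A minimal sketch of the assumed stacking behavior, where the arrays below are hypothetical stand-ins for loaded data files:

>>> import numpy as np
>>> a, b = np.zeros((2, 3)), np.ones((2, 3))
>>> # Files that stack like this satisfy the method's assumption
>>> np.stack([a, b], axis=-1).shape
(2, 3, 2)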
- Parameters
- save_dr : str or Path
The file directory in which to save the consolidated files. If it doesn’t exist, then it will be created.
- replace_with : str or None, optional
By default, if replace_with is left as None, then just a saved, consolidated version of the files will be made. If a column name is instead passed as a str, then the original data files which were consolidated will be deleted, and the new consolidated column will be loaded in their place.
default = None
- scope : Scope, optional
The scope used to select which of the loaded data file columns to consolidate. The default scope of 'data file' selects all loaded data files.
default = 'data file'
- cast_to : None or numpy dtype, optional
If not None, the numpy dtype that the stacked data will be cast to before saving.
default = None
- clear_existing : bool or 'fail', optional
If True, then any files already in the save directory will be deleted. If False, existing files will simply be overwritten.
If ‘fail’ then if there are already files in the save directory, raise an error.
default = 'fail'
- n_jobs : int, optional
The number of jobs to use while stacking and saving each file.
If -1, then all available CPUs will be used.
default = -1
See also
to_data_file
Convert existing column to data file.
add_data_files
Method for adding new data files.
update_data_file_paths
Update saved data file paths.
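Examples
A hedged usage sketch, not taken from the BPt documentation: it assumes a Dataset named data with data file columns already loaded (e.g., via add_data_files); the save directory 'consolidated/' and column name 'stacked_data' are hypothetical.

>>> # Hypothetical sketch: `data` is assumed to be a BPt.Dataset
>>> # with data file columns already loaded.
>>> import numpy as np
>>> data.consolidate_data_files(save_dr='consolidated/',
...                             replace_with='stacked_data',
...                             cast_to=np.float32,
...                             clear_existing=True,
...                             n_jobs=1)

With replace_with set, the consolidated column replaces the original data file columns; leaving it as None only writes the stacked files to save_dr.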