BPt.Dataset.to_data_file
- Dataset.to_data_file(scope, load_func=<function load>, inplace=False)
This method casts existing columns whose values are file paths to type ‘data file’.
- Parameters
- scope : Scope
A BPt style Scope used to select the subset of column(s) to which this function is applied. See Scope for more information on how this can be applied.
- load_func : python function, optional
Columns of type ‘data file’ store a path to a saved file, so you must also specify how to load that file. Pass the loading function here. The passed load_func is called on each file individually, and its output is passed on to the different loading functions.

In some cases you may need to pass a custom, user-defined function, e.g., if you want to use numpy.load() and then also numpy.stack(). Just wrap the two functions in one and pass the new function.

    def my_wrapper(x):
        return np.stack(np.load(x))

When you define a custom function like this, it is recommended that you define it in a separate file from the one where the main script is run, and then import it.

By default, data files are assumed to be saved numpy arrays, and numpy.load() is used when nothing else is specified.

default = np.load
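As a minimal sketch of the wrapper idea described for load_func, the snippet below writes a small array to a temporary file and loads it back through my_wrapper. It uses only numpy and the standard library; the temporary file name is hypothetical, and the final commented-out call simply follows the signature shown above rather than being executed here.

```python
import os
import tempfile

import numpy as np


def my_wrapper(path):
    """Load a saved array, then stack its rows along the first axis."""
    return np.stack(np.load(path))


# Write a small array to a temporary .npy file so the wrapper can be tried
# without any real data (the file name is hypothetical).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "loc1.npy")
np.save(path, np.arange(6).reshape(2, 3))

loaded = my_wrapper(path)
print(loaded.shape)

# With BPt installed, the wrapper would be passed via the load_func parameter:
# data = data.to_data_file('files', load_func=my_wrapper)
```

Keeping my_wrapper in its own importable module, as recommended above, only requires moving the function definition to a separate file and importing it in the main script.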
- inplace : bool, optional
If True, perform the operation in place and return None.
default = False
See also
add_data_files
Method for adding new data files
consolidate_data_files
Merge existing data files into one column.
Examples
This method can be used as the primary way to prepare data files. We will walk through a simple example here.
In [1]: import BPt as bp

In [2]: data = bp.Dataset()

In [3]: data['files'] = ['data/loc1.npy', 'data/loc2.npy']

In [4]: data
Out[4]:
           files
0  data/loc1.npy
1  data/loc2.npy
We now have a Dataset, but our column ‘files’ is not quite ready, as by default it won’t know what to do with str values. To get it to treat the column as a data file, we will cast it.

In [5]: data = data.to_data_file('files')

In [6]: data
Out[6]:
   files
0      0
1      1
What’s happened here? The column no longer shows paths, but integers instead. That is actually the desired behavior, and we can see why by checking file_mapping.
In [7]: data.file_mapping
Out[7]:
{0: DataFile(loc='/home/runner/work/BPt/BPt/doc/data/loc1.npy'),
 1: DataFile(loc='/home/runner/work/BPt/BPt/doc/data/loc2.npy')}
The file_mapping is then used internally with Loader to load objects on the fly.
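To illustrate the idea of integer keys resolved through a mapping, here is a rough, library-independent sketch. It is not BPt's actual implementation: the names file_mapping, column, and load_key are used here purely for illustration, plain path strings stand in for DataFile objects, and the temporary files are hypothetical.

```python
import os
import tempfile

import numpy as np

# Create two small .npy files to stand in for real data files.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp_dir, f"loc{i + 1}.npy")
    np.save(p, np.full(3, i))
    paths.append(p)

# The column stores integer keys; a separate mapping resolves each key to a path.
file_mapping = {key: path for key, path in enumerate(paths)}
column = list(file_mapping)


def load_key(key, load_func=np.load):
    """Resolve an integer key to its path and load it only when requested."""
    return load_func(file_mapping[key])


# Files are read only at this point, one key at a time.
arrays = [load_key(k) for k in column]
print(column)
print(arrays[1])
```

Storing compact integer keys in the column and deferring the actual read until a value is needed is one plausible reason the cast column displays integers rather than paths.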