Dataset#
Constructor#
|
The BPt Dataset class is the main class used for preparing data
|
Base#
|
This method is the main internal and external facing way of getting the names of columns which match a passed scope from the Dataset. |
|
Method to get a set of subjects, from a set of already loaded ones, or from a saved location. |
|
This method returns the values of a passed column: either the normally loaded and stored values or, for a data file column, the loaded data file proxy values. |
|
This method is designed as a helper for adding a new scope value to a number of columns at once, using the existing scope system. |
|
This method is used for removing scopes from an existing column or subset of columns, as selected by the scope parameter. |
|
This method is used to set a role for either a single column or multiple columns, as selected through the scope parameter. |
|
This method is used to set multiple roles across multiple scopes as specified by a passed dictionary with keys as scopes and values as the role to set for all columns corresponding to that scope. |
This function can be used to get a dictionary with the currently loaded roles. See Role for more information on how roles are defined and used within BPt. |
|
|
Calls method according to: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html |
|
Creates and returns a copy of this dataset, either deep or shallow. |
|
This function will attempt to automatically add scope "category" to any loaded categorical variables. |
|
This function is used to get a sklearn-style grouping of input data (X) and target data (y) from the Dataset, according to a passed problem_spec. |
|
This method is otherwise identical to |
|
This method allows splitting the dataset into sub datasets by the different unique values of a passed scope. |
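The scope-based selection described in this section can be illustrated with a minimal sketch. This is not BPt's implementation: the function `match_scope`, the two dictionaries, and the matching rules (the literal "all", a column name, a role, or a scope tag) are assumptions made only for illustration.

```python
from typing import Dict, List, Set


def match_scope(
    scopes: Dict[str, Set[str]],
    roles: Dict[str, str],
    scope: str,
) -> List[str]:
    """Return the sorted column names that match `scope`.

    Hypothetical matching rules (assumed for this sketch): a column
    matches if `scope` is the literal "all", the column's own name,
    the column's role, or one of the column's scope tags.
    """
    if scope == "all":
        return sorted(scopes)
    return sorted(
        col
        for col, tags in scopes.items()
        if scope == col or scope == roles.get(col) or scope in tags
    )


# Toy metadata: each column has a set of scope tags and a single role.
example_scopes = {"age": {"demo"}, "sex": {"demo", "category"}, "iq": set()}
example_roles = {"age": "data", "sex": "data", "iq": "target"}

demo_cols = match_scope(example_scopes, example_roles, "demo")
target_cols = match_scope(example_scopes, example_roles, "target")
```

Keeping tags and roles as plain per-column metadata means that adding or removing a scope is a set operation on one entry, which is the kind of bulk update the scope helpers above describe.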
Encoding#
|
This method works by setting all columns within scope to just two binary categories. |
|
This method contains utilities for binarizing a variable. |
|
This method is used to apply k binning to a column, or columns. |
|
This method is used to ordinalize a group of columns. |
|
This method will cast any columns that were not categorical that are passed here to categorical. |
|
This method is used for making a copy of an existing column, ordinalizing it and then setting it to have role = non input. |
|
This function is designed to add a new column
|
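As a rough illustration of two of the encodings listed above, the sketch below ordinalizes a list of values and applies equal-width k binning. It is a plain-Python approximation under stated assumptions, not BPt's code: the function names are hypothetical, the ordinal codes follow sorted order, and the bins are assumed to be equal width (other binning strategies exist).

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def ordinalize(values: Sequence[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each unique value to an integer code 0..n-1 in sorted order."""
    mapping = {value: code for code, value in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values], mapping


def k_bin_uniform(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of `n_bins` equal-width bins (0-indexed)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    low, high = min(values), max(values)
    if high == low:
        # A constant column has zero width, so everything goes in bin 0.
        return [0] * len(values)
    width = (high - low) / n_bins
    # The maximum value would land in bin `n_bins`, so clamp it into the last bin.
    return [min(int((value - low) / width), n_bins - 1) for value in values]


codes, code_map = ordinalize(["b", "a", "c", "a"])
bins = k_bin_uniform([0.0, 2.5, 5.0, 7.5, 10.0], n_bins=4)
```

Returning the mapping alongside the codes keeps the encoding reversible, which matters if the original category labels are needed later for plotting or reporting.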
Data Files#
|
This method allows adding columns of type 'data file' to the Dataset class. |
|
This method can be used to cast any existing columns where the values are file paths, to a data file. |
|
This function is designed as a helper to consolidate all or a subset of the loaded data files into one column. |
|
Go through and update saved file paths within the Dataset's file mapping attribute. |
|
This function is used to access the up to date file mapping. |
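The idea of updating saved paths inside a file mapping can be sketched as follows. The structure assumed here (a dictionary from integer proxy keys to path strings) and the function `remap_paths` are hypothetical and used only for illustration; the real file mapping attribute may store richer objects.

```python
from typing import Dict


def remap_paths(file_mapping: Dict[int, str], old: str, new: str) -> Dict[int, str]:
    """Return a new mapping with `old` replaced by `new` in every path.

    The integer keys (the proxy values a column would hold) are left
    unchanged, so only the mapping needs updating when files move.
    """
    return {key: path.replace(old, new) for key, path in file_mapping.items()}


# Assumed layout: column values are integer keys into this mapping.
mapping = {0: "/old/data/sub1.npy", 1: "/old/data/sub2.npy"}
updated = remap_paths(mapping, "/old/data", "/new/store")
```

Storing small proxy keys in the column and resolving them through a mapping is what makes this kind of relocation cheap: the tabular data itself never has to be rewritten.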
Filtering & Drop#
|
This method is designed to allow dropping outliers from the requested columns based on comparisons with that column's standard deviation. |
|
This method is designed to allow dropping a fixed percent of outliers from the requested columns. |
This method is designed to allow performing outlier filtering on categorical type variables. |
|
|
This method is designed to allow dropping columns based on some flexible arguments. |
|
This method is used for dropping all of the subjects which have NaN values for a given scope / column. |
|
This method is used for dropping subjects based on the amount of missing values found across a subset of columns as selected by scope. |
|
This method will drop any columns with less than or equal to a passed threshold number of unique values. |
|
This method is used for dropping columns based on the amount of missing values per column, dropping any which exceed a user defined threshold. |
|
This method will drop any str-type / object type columns where the number of unique values is equal to the length of the dataframe. |
|
This method is used for checking to see if there are any columns loaded with duplicate values. |
|
This method will drop all subjects that do not overlap with the passed subjects to this function. |
|
This method will drop all subjects that overlap with the passed subjects to this function. |
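The standard-deviation-based outlier filter described in this section can be sketched in plain Python. This is an illustrative approximation, not the library's implementation: the function name `keep_within_std`, the use of the population standard deviation, and the inclusive bounds are all assumptions.

```python
import statistics
from typing import List, Sequence


def keep_within_std(values: Sequence[float], n_std: float) -> List[int]:
    """Return the indices of values within `n_std` standard deviations of the mean.

    Assumes the population standard deviation and inclusive bounds;
    a real implementation may use the sample standard deviation or
    allow separate lower and upper thresholds.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    lower, upper = mean - n_std * std, mean + n_std * std
    return [index for index, value in enumerate(values) if lower <= value <= upper]


# One extreme value (100) among otherwise similar measurements.
measurements = [10, 11, 9, 10, 12, 8, 10, 100]
kept = keep_within_std(measurements, n_std=2)
```

Note that the mean and standard deviation are computed with the outlier still included, so a single extreme value widens the bounds; this is why such filters are sometimes applied more than once or replaced by percentile-based cut-offs.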
Plotting / Viewing#
|
This function creates plots for each of the passed columns (as specified by scope) separately. |
|
This function creates a multi-figure plot containing all of the passed columns (as specified by scope) in their own axes. |
|
This method can be used to plot the relationship between two variables. |
|
|
|
This method is used to generate a summary across some data. |
Display an HTML representation of the Dataset, as split by scope, instead of the default repr html as split by role. |
Train / Test Split#
|
Defines a set of subjects to be reserved as test subjects. This
|
|
Defines a set of subjects to be reserved as train subjects. This
|
|
This method defines and returns a Train and Test Dataset
|
|
This method defines and returns a Train and Test Dataset
|
Saves the currently defined test subjects in a text file with one subject / index per line. |
|
Saves the currently defined train subjects in a text file with one subject / index per line. |
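A minimal sketch of reserving test subjects and saving them with one subject per line is shown below. The helper names (`split_subjects`, `save_subjects`, `load_subjects`), the random shuffling strategy, and the plain-text layout are assumptions for illustration and do not reproduce BPt's own methods.

```python
import random
from pathlib import Path
from typing import List, Sequence, Tuple


def split_subjects(
    subjects: Sequence[str], test_size: float, seed: int
) -> Tuple[List[str], List[str]]:
    """Randomly reserve a `test_size` fraction of subjects as the test set."""
    shuffled = sorted(subjects)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


def save_subjects(subjects: Sequence[str], path: str) -> None:
    """Write one subject per line to a text file."""
    Path(path).write_text("\n".join(str(s) for s in subjects) + "\n")


def load_subjects(path: str) -> List[str]:
    """Read subjects back from a file with one subject per line."""
    return [line for line in Path(path).read_text().splitlines() if line]


all_subjects = [f"s{i}" for i in range(10)]
train, test = split_subjects(all_subjects, test_size=0.2, seed=0)
```

Saving the reserved subjects to a file makes the split reproducible across sessions: the same test set can be reloaded later instead of being re-drawn at random.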