BPt.Dataset.auto_detect_categorical

Dataset.auto_detect_categorical(scope='all', obj_thresh=30, all_thresh=None, inplace=False)

This function will attempt to automatically add scope “category” to any loaded categorical variables. Note that any columns with pandas data type category should already be detected without calling this function.

Heuristic thresholds are applied with default settings, but they can be changed.

Note: a column is set to categorical if any one of the conditions is met. Failing a single condition does not by itself mean the column won’t be categorical.

Any column with only two unique non-NaN values is considered binary, and therefore categorical. This is a fixed behavior of this function.

Any data file columns within scope are always treated as not categorical. Furthermore, if such a column has for some reason been set as categorical, this function makes sure it is no longer categorical.
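The decision rule described above, together with the obj_thresh and all_thresh parameters documented under Parameters, can be sketched in plain Python. This is only an illustration of the documented conditions, not BPt's actual implementation: the helper names is_missing and looks_categorical, and the is_object_dtype and is_data_file flags, are hypothetical and stand in for information that BPt would read from the Dataset itself.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def looks_categorical(values, is_object_dtype, obj_thresh=30,
                      all_thresh=None, is_data_file=False):
    """Sketch of the documented heuristic for a single column.

    Hypothetical function: it mirrors the conditions in the text,
    not the library's source code.
    """
    # Data file columns are never treated as categorical.
    if is_data_file:
        return False

    # Count unique values, ignoring missing entries.
    n_unique = len({v for v in values if not is_missing(v)})

    # Fixed rule: two unique non-NaN values means binary, so categorical.
    if n_unique == 2:
        return True

    # Condition for object-dtype columns only; None disables it.
    if is_object_dtype and obj_thresh is not None and n_unique < obj_thresh:
        return True

    # Condition for columns of any dtype; None disables it.
    if all_thresh is not None and n_unique < all_thresh:
        return True

    # No condition was met.
    return False


# Toy columns: name -> (values, whether the column has object dtype).
columns = {
    "sex": (["M", "F", "M", None], True),
    "site": (["a", "b", "c", "a"], True),
    "age": ([21.0, 34.5, float("nan"), 47.2, 58.1], False),
    "smoker": ([0, 1, 1, 0], False),
}

detected = [
    name
    for name, (vals, is_obj) in columns.items()
    if looks_categorical(vals, is_obj)
]
print(detected)
```

With the default settings, the numeric "age" column is left alone because all_thresh is None, while "smoker" is picked up by the fixed binary rule even though it is not an object column. Passing all_thresh=5 would also flag "age", since it has only four unique non-NaN values.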

Parameters
scope : Scope, optional

Any valid BPt style scope used to select which columns this function should operate on. E.g., if it is known that only a subset of columns might be categorical, you could specify only that subset.

By default this is set to ‘all’, which will check all columns.

default = 'all'
obj_thresh : int or None, optional

This threshold is used for any column of pandas datatype object. If the number of unique non-NaN values in such a column is less than this threshold, the column will be set to categorical.

To ignore this condition, you may pass None.

default = 30
all_thresh : int or None, optional

Similar to obj_thresh, except that this condition applies to any column regardless of datatype. If the number of unique non-NaN values in a column is less than the passed value, that column will be set to categorical.

To ignore this condition, you may pass None.

default = None
inplace : bool, optional

If True, do operation inplace and return None.

default = False