BPt.Dataset.ordinalize
- Dataset.ordinalize(scope, nan_to_class=False, inplace=False)
This method ordinalizes a group of columns. Ordinalization maps the n unique categories present in each column to the integer values 0 to n-1.
The LabelEncoder (from scikit-learn) is used on the backend for this operation.
- Parameters
- scope : Scope
A BPt-style Scope used to select the subset of column(s) to which the current function is applied. See Scope for more information on how this can be applied.
- nan_to_class : bool, optional
If set to True, NaN values are treated as a unique class. If False, ordinalization is applied only to the non-NaN values, and any NaN values remain NaN. See nan_to_class for more generally adding NaN values as a new category to any arbitrary categorical column.
default = False
- inplace : bool, optional
If True, perform the current function inplace and return None.
default = False
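As a rough illustration of the mapping described above, the sketch below ordinalizes a single pandas Series with scikit-learn's LabelEncoder, which assigns codes in sorted order of the unique values. The helper name `ordinalize_column` is hypothetical and not part of BPt, and the handling of `nan_to_class=True` (NaN becomes one extra class with the next code, n) is an assumption; BPt's own implementation may assign that code differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def ordinalize_column(col, nan_to_class=False):
    """Hypothetical helper: map the n unique non-NaN values of col to 0..n-1.

    Returns (encoded Series, {code: original value}).
    """
    mask = col.notna()
    le = LabelEncoder()
    # Fit only on non-NaN values; LabelEncoder sorts the unique values,
    # so code i corresponds to le.classes_[i].
    codes = le.fit_transform(col[mask].to_numpy())

    # Start from an all-NaN float Series so NaN rows stay NaN by default.
    out = pd.Series(np.nan, index=col.index)
    out.loc[mask] = codes
    mapping = dict(enumerate(le.classes_.tolist()))

    if nan_to_class and (~mask).any():
        # Assumption: NaN is treated as one more class with code n.
        nan_code = len(le.classes_)
        out.loc[~mask] = nan_code
        mapping[nan_code] = np.nan
    return out, mapping


animals = pd.Series(["cat", "cat", "dog", "dog", "elk"])
numbers = pd.Series([1.0, 2.0, 1.0, 2.0, np.nan])
print(ordinalize_column(animals)[0].tolist())  # [0.0, 0.0, 1.0, 1.0, 2.0]
print(ordinalize_column(numbers)[0].tolist())  # [0.0, 1.0, 0.0, 1.0, nan]
print(ordinalize_column(numbers, nan_to_class=True)[0].tolist())  # [0.0, 1.0, 0.0, 1.0, 2.0]
```

The codes come back as floats here only because the output Series starts as all-NaN; that is a simplification of this sketch.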
Examples
In [1]: data = bp.read_csv('data/example1.csv')

In [2]: data
Out[2]:
   animals  numbers
0    'cat'      1.0
1    'cat'      2.0
2    'dog'      1.0
3    'dog'      2.0
4    'elk'      NaN

In [3]: data = data.ordinalize('all')

In [4]: data
Out[4]:
   animals  numbers
0        0        0
1        0        1
2        1        0
3        1        1
4        2      NaN
Note that the original values are still saved when using this and similar encoding functions, and can be inspected through the encoders attribute.
In [5]: data.encoders
Out[5]:
{'animals': {0: "'cat'", 1: "'dog'", 2: "'elk'"},
 'numbers': {0: 1.0, 1: 2.0}}
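Because each column's mapping is stored as {code: original value}, the original values can be recovered from the codes. The sketch below does this with plain pandas rather than any BPt method; the helper `decode` is hypothetical, and the `encoders` dict is typed out by hand, assuming it has the same shape as the encoders output shown above.

```python
import numpy as np
import pandas as pd


def decode(df, encoders):
    """Hypothetical helper: map ordinal codes back to their original values.

    encoders is assumed to look like {column: {code: original_value}}.
    Columns not in df are skipped; NaN and unmapped codes pass through.
    """
    out = df.copy()
    for col, mapping in encoders.items():
        if col not in out.columns:
            continue
        # dict.get falls back to the value itself, so NaN stays NaN, and a
        # float code such as 0.0 still matches the int key 0.
        out[col] = out[col].map(lambda v: mapping.get(v, v))
    return out


codes = pd.DataFrame({"animals": [0, 0, 1, 1, 2],
                      "numbers": [0, 1, 0, 1, np.nan]})
encoders = {"animals": {0: "'cat'", 1: "'dog'", 2: "'elk'"},
            "numbers": {0: 1.0, 1: 2.0}}
decoded = decode(codes, encoders)
print(decoded["animals"].tolist())  # ["'cat'", "'cat'", "'dog'", "'dog'", "'elk'"]
```

Falling back to the input value, rather than mapping unknown codes to NaN, is a design choice of this sketch so that existing NaN values survive the round trip unchanged.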