BPt.Dataset.ordinalize

Dataset.ordinalize(scope, nan_to_class=False, inplace=False)

Ordinalize a group of columns. Within each selected column, the n unique categories present are mapped to the integer values 0 to n-1.

scikit-learn's LabelEncoder is used on the backend for this operation.
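As a rough illustration of the mapping described above, the pure-Python sketch below gives each distinct non-NaN value its position in the sorted set of values. This mirrors how a label encoder orders its classes. The helper name `ordinalize_values` is hypothetical and not part of BPt. It is a sketch of the idea, not the library's implementation, and it leaves NaN entries untouched, matching the default `nan_to_class=False`.

```python
import math


def ordinalize_values(values):
    """Map the n unique non-NaN values to the integers 0..n-1.

    Hypothetical helper for illustration only; not BPt's implementation.
    """
    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    # Sorted unique categories, ignoring NaN
    classes = sorted({v for v in values if not is_nan(v)})
    codes = {c: i for i, c in enumerate(classes)}

    # NaN stays NaN; every other value is replaced by its integer code
    encoded = [v if is_nan(v) else codes[v] for v in values]

    # Reverse mapping (code -> original value)
    decoder = {i: c for c, i in codes.items()}
    return encoded, decoder


encoded, decoder = ordinalize_values([1.0, 2.0, 1.0, 2.0, float("nan")])
print(encoded[:4])  # [0, 1, 0, 1]
print(decoder)      # {0: 1.0, 1: 2.0}
```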

Parameters

scope : Scope

A BPt style Scope used to select the subset of column(s) to which the current function is applied. See Scope for more information on how this can be applied.

nan_to_class : bool, optional

If set to True, treat NaN values as a unique class. If False, ordinalization is applied to just the non-NaN values, and any NaN values remain NaN.

See nan_to_class for adding NaN values as a new category to any arbitrary categorical column more generally.

default = False
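The difference between the two settings can be sketched in plain Python. The function `ordinalize_with_nan` is a hypothetical stand-in, not a BPt function. The choice to give NaN the next unused code n is an assumption made only for this illustration, since the code BPt actually assigns to the NaN class is not stated here.

```python
import math


def ordinalize_with_nan(values, nan_to_class=False):
    """Hypothetical illustration of the nan_to_class switch (not BPt code)."""
    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    # Codes 0..n-1 for the sorted non-NaN categories
    classes = sorted({v for v in values if not is_nan(v)})
    codes = {c: i for i, c in enumerate(classes)}

    # Assumption for this sketch: NaN is given the next unused code, n
    nan_code = len(classes)

    encoded = []
    for v in values:
        if is_nan(v):
            # Either promote NaN to its own class or leave it as NaN
            encoded.append(nan_code if nan_to_class else v)
        else:
            encoded.append(codes[v])
    return encoded


numbers = [1.0, 2.0, 1.0, 2.0, float("nan")]
print(ordinalize_with_nan(numbers, nan_to_class=True))  # [0, 1, 0, 1, 2]
```

With `nan_to_class=False`, the last entry of the result stays NaN, as in the `numbers` column of the example further down.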

inplace : bool, optional

If True, perform the operation in place and return None.

default = False

Examples

In [1]: data = bp.read_csv('data/example1.csv')

In [2]: data
Out[2]: 
  animals  numbers
0   'cat'      1.0
1   'cat'      2.0
2   'dog'      1.0
3   'dog'      2.0
4   'elk'      NaN

In [3]: data = data.ordinalize('all')

In [4]: data
Out[4]: 
  animals numbers
0       0       0
1       0       1
2       1       0
3       1       1
4       2     NaN

Note that the original names are still saved, in the encoders attribute, when using this and similar encoding functions.

In [5]: data.encoders
Out[5]: {'animals': {0: "'cat'", 1: "'dog'", 2: "'elk'"}, 'numbers': {0: 1.0, 1: 2.0}}
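Because the saved mapping runs from integer code to original value, it can be used to translate an ordinalized column back. The sketch below copies the dictionary shown in the output above into a plain variable and uses a hypothetical helper, `decode_column`, which is not a BPt function. NaN entries are passed through unchanged.

```python
import math

# Mapping copied from the data.encoders output shown above
encoders = {
    "animals": {0: "'cat'", 1: "'dog'", 2: "'elk'"},
    "numbers": {0: 1.0, 1: 2.0},
}


def decode_column(codes, mapping):
    """Translate integer codes back to original values (hypothetical helper)."""
    decoded = []
    for code in codes:
        if isinstance(code, float) and math.isnan(code):
            decoded.append(code)  # NaN was never encoded, so keep it as NaN
        else:
            decoded.append(mapping[int(code)])
    return decoded


print(decode_column([0, 0, 1, 1, 2], encoders["animals"]))
# ["'cat'", "'cat'", "'dog'", "'dog'", "'elk'"]
```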