BPt.Dataset.binarize

Dataset.binarize(scope, threshold, replace=True, drop=True, inplace=False)

This method provides utilities for binarizing a variable: dichotomizing an existing variable with a single threshold, or applying binarization via two thresholds (essentially cutting out the middle of the distribution).

Parameters
scope : Scope

A BPt-style Scope used to select the subset of column(s) to which the current function is applied. See Scope for more information on how this can be applied.

threshold : float or (float, float)

If a single value is passed, it sets one threshold: any values less than (<) the threshold will be set to 0, and any values greater than or equal to (>=) the threshold will be set to 1. This matches the first example in the Examples section, where 0.5 maps to 1 with threshold=.5.

Alternatively, if a tuple with two values is passed, e.g., (5, 10), a lower and an upper threshold are set, and any values in the middle are either dropped or set to NaN (depending on the drop parameter). The first element of the tuple is the lower threshold: any values less than it will be set to 0. The second element is the upper threshold: any values greater than it will be set to 1. Note that these comparisons are strict (less than or greater than, not less than or equal to), so values equal to either threshold fall in the middle.

replace : bool, optional

This parameter controls whether the original columns are replaced with their binary versions. If True, each original column is replaced. If False, a new binary column is added and the original column is kept. The new columns share the names of the original columns, with ‘_binary’ appended.

default = True
drop : bool, optional

If set to True, then any values between lower and upper will be dropped. If False, they will be set to NaN.

Note: This parameter is only relevant when threshold is passed as a tuple of lower and upper values.

default = True
inplace : bool, optional

If True, perform the current function in place and return None.

default = False

See also

to_binary

Convert from categorical to binary.

Notes

This function will not work on columns of type Data Files.

Examples

In [1]: data = bp.read_csv('data/example2.csv')

In [2]: data
Out[2]: 
   col1
0   0.1
1   0.2
2   0.3
3   0.4
4   0.5
5   0.6
6   0.7
7   0.8
8   0.9
9   1.0

In [3]: data.binarize(scope='all', threshold=.5)
Out[3]: 
  col1
0    0
1    0
2    0
3    0
4    1
5    1
6    1
7    1
8    1
9    1

In [4]: data.binarize(scope='all', threshold=(.3, .6), drop=False)
Out[4]: 
  col1
0    0
1    0
2  NaN
3  NaN
4  NaN
5  NaN
6    1
7    1
8    1
9    1
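
The two-threshold rule described under threshold and drop can also be sketched in plain Python. This is only an illustration of the documented semantics, not BPt's implementation: the helper name binarize_two_thresholds is hypothetical, and it assumes that "dropped" means the row is removed entirely. It uses the same ten values as col1 above.

```python
import math


def binarize_two_thresholds(values, lower, upper, drop=True):
    """Hypothetical helper mimicking the documented two-threshold rule.

    Returns a dict mapping each kept row index to 0, 1, or NaN.
    """
    result = {}
    for idx, value in enumerate(values):
        if value < lower:
            # Strictly below the lower threshold -> 0
            result[idx] = 0
        elif value > upper:
            # Strictly above the upper threshold -> 1
            result[idx] = 1
        elif not drop:
            # Middle values (including the thresholds themselves) become NaN
            result[idx] = math.nan
        # With drop=True, middle rows are simply left out (assumed meaning
        # of "dropped").
    return result


values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

kept = binarize_two_thresholds(values, 0.3, 0.6, drop=True)
marked = binarize_two_thresholds(values, 0.3, 0.6, drop=False)

print(sorted(kept))  # [0, 1, 6, 7, 8, 9]
```

With drop=False, the sketch keeps all ten rows and marks rows 2 through 5 as NaN, which lines up with the Out[4] output above. Rows 2 (0.3) and 5 (0.6) fall in the middle because the comparisons are strict.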