BPt.Dataset.add_unique_overlap

Dataset.add_unique_overlap(cols, new_col, decode_values=True, inplace=False)
This function adds a new column containing the overlapped unique values from two or more passed columns.
By default, the new column is added with role 'data'. The exception is when all of the passed cols share the same role, in which case the new column takes that role. Similar to role, the scope of the new column will be the overlap of the scopes shared by all of the passed cols. If there is no overlap, the new column has no scope.
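
The role and scope rule described above can be sketched in plain Python. This is only an illustration of the stated rule, not BPt's implementation: the helper name `resolve_role_and_scope` and its arguments are hypothetical, and scopes are modeled here as simple sets of strings.

```python
def resolve_role_and_scope(roles, scopes):
    """Hypothetical helper illustrating the documented rule.

    roles  : list of str, the role of each passed column
    scopes : list of set of str, the scopes of each passed column
    """
    # Shared role if every column agrees, otherwise fall back to 'data'.
    role = roles[0] if len(set(roles)) == 1 else "data"

    # The new column's scope is the intersection of all columns' scopes;
    # an empty set means "no scope".
    shared = set.intersection(*(set(s) for s in scopes)) if scopes else set()
    return role, shared


same = resolve_role_and_scope(["target", "target"], [{"a", "b"}, {"b", "c"}])
mixed = resolve_role_and_scope(["data", "target"], [{"a"}, {"c"}])
```

Here `same` carries the shared role and the one scope both columns have in common, while `mixed` falls back to role 'data' with an empty scope.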
Parameters
cols : list of str

The names of the columns to compute the overlap with. E.g., cols = ['A', 'B'].

Note: You must pass at least two columns here.

new_col : str

The name of the new column where these values will be stored.

decode_values : bool, optional

If True, then when creating the overlapping values, any encoded values are replaced with their original, decoded values (if an encoding exists). For example, if a variable being added was originally encoded from the values 'cat' and 'dog', then the labels used before ordinalization would be col_name=cat and col_name=dog. If set to False, the labels would instead be col_name=0 and col_name=1.

default = True
inplace : bool, optional

If True, perform the current function inplace and return None.

default = False
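
To make the effect of decode_values concrete, here is a minimal sketch of how a per-row label might be built with and without decoding. The function `build_label` and the shape of the `encoders` mapping (column name to a dict of code to original value) are assumptions made for illustration, not BPt's internal code.

```python
def build_label(row, cols, encoders=None, decode_values=True):
    """Hypothetical helper: join 'col=value ' pieces for one row."""
    parts = []
    for col in cols:
        value = row[col]
        # When decoding, swap an ordinal code back to its original value,
        # but only if an encoder for this column knows that code.
        if decode_values and encoders and col in encoders:
            value = encoders[col].get(value, value)
        parts.append(f"{col}={value} ")
    return "".join(parts)


# Assumed encoder layout: {column: {code: original_value}}
encoders = {"col_name": {0: "cat", 1: "dog"}}
row = {"col_name": 1, "other": 5}

decoded = build_label(row, ["col_name", "other"], encoders, decode_values=True)
raw = build_label(row, ["col_name", "other"], encoders, decode_values=False)
```

With decoding, the label contains `col_name=dog`; without it, the label contains `col_name=1`. Columns with no encoder are left unchanged in both cases.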

Examples

In [1]: data = bp.read_csv('data/example1.csv')

In [2]: data
Out[2]: 
  animals  numbers
0   'cat'      1.0
1   'cat'      2.0
2   'dog'      1.0
3   'dog'      2.0
4   'elk'      NaN

In [3]: data.add_unique_overlap(cols=['animals', 'numbers'],
   ...:                         new_col='combo', inplace=True)
   ...: 

In [4]: data
Out[4]: 
  animals  numbers combo
0   'cat'      1.0     0
1   'cat'      2.0     1
2   'dog'      1.0     2
3   'dog'      2.0     3
4   'elk'      NaN     4

In [5]: data.encoders['combo']
Out[5]: 
{0: "animals='cat' numbers=1.0 ",
 1: "animals='cat' numbers=2.0 ",
 2: "animals='dog' numbers=1.0 ",
 3: "animals='dog' numbers=2.0 ",
 4: "animals='elk' numbers=nan "}

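In that example, every row was a unique combination of values. Let's try again, this time ordinalizing the columns first. Since decode_values is True by default, the encoders for the new column still show the original values.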

In [6]: data = bp.read_csv('data/example1.csv')

In [7]: data = data.ordinalize('all')

In [8]: data.add_unique_overlap(cols=['animals', 'numbers'],
   ...:                         new_col='combo', inplace=True)
   ...: 

In [9]: data
Out[9]: 
  animals numbers combo
0       0       0     0
1       0       1     1
2       1       0     2
3       1       1     3
4       2     NaN     4

In [10]: data.encoders['combo']
Out[10]: 
{0: "animals='cat' numbers=1.0 ",
 1: "animals='cat' numbers=2.0 ",
 2: "animals='dog' numbers=1.0 ",
 3: "animals='dog' numbers=2.0 ",
 4: "animals='elk' numbers=nan "}
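
The idea behind the examples above can be sketched without BPt: each row's values across the chosen columns are joined into a label, and every distinct label is assigned an integer code. This is a rough illustration only. The function `unique_overlap` is hypothetical, and assigning codes in order of first appearance is an assumption that happens to match the output shown above.

```python
import math

# The same rows as data/example1.csv in the examples above.
rows = [
    {"animals": "'cat'", "numbers": 1.0},
    {"animals": "'cat'", "numbers": 2.0},
    {"animals": "'dog'", "numbers": 1.0},
    {"animals": "'dog'", "numbers": 2.0},
    {"animals": "'elk'", "numbers": math.nan},
]


def unique_overlap(rows, cols):
    """Hypothetical sketch: map each unique combination to an integer code."""
    label_to_code = {}
    codes = []
    for row in rows:
        # Build a label such as "animals='cat' numbers=1.0 ".
        label = "".join(f"{col}={row[col]} " for col in cols)
        # Reuse the existing code for a repeated label, else assign the next one.
        codes.append(label_to_code.setdefault(label, len(label_to_code)))
    encoder = {code: label for label, code in label_to_code.items()}
    return codes, encoder


codes, encoder = unique_overlap(rows, ["animals", "numbers"])

# A repeated combination reuses the code of its first occurrence.
repeat_codes, _ = unique_overlap(rows + [rows[0]], ["animals", "numbers"])
```

In this sketch, `codes` plays the part of the new `combo` column and `encoder` the part of `data.encoders['combo']`.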