BPt.Dataset.drop_cols_by_nan
- Dataset.drop_cols_by_nan(scope='all', threshold=0.5, inplace=False)
This method drops columns based on the amount of missing values per column, dropping any column whose missingness meets or exceeds a user-defined threshold.
- Parameters
- scope : Scope, optional
The scope of columns on which this operation should be applied.
default = 'all'
- threshold : float or int, optional
- Passed as either a float between 0 and 1, or as an int. If passed a float between 0 and 1, this parameter represents a fractional threshold: a column is dropped if the proportion of its rows that are NaN is greater than or equal to this value. If passed a value greater than 1, the threshold represents an absolute count: a column is dropped if it has that number of subjects or more with NaN. For example, if a column within a dataset containing 10 total rows has 3 non-missing values and 7 missing values, a threshold of .7 or lower will drop the column, while anything above .7 will not.
default = .5
- inplacebool, optional
If True, perform the current function inplace and return None.
default = False
Examples
Consider a brief example below where we first load in a simple Dataset and then apply the drop_cols_by_nan method.
In [1]: data = bp.read_csv('data/example1.csv')

In [2]: data
Out[2]:
  animals  numbers
0   'cat'      1.0
1   'cat'      2.0
2   'dog'      1.0
3   'dog'      2.0
4   'elk'      NaN

In [3]: data.drop_cols_by_nan(threshold=.1)
Setting NaN threshold to: 0.5
Dropped 1 Columns
Out[3]:
  animals
0   'cat'
1   'cat'
2   'dog'
3   'dog'
4   'elk'
Alternatively, note that since only 1 of the 5 values in the numbers column is missing (20%), passing any threshold above .2 will drop no columns.
In [4]: data.drop_cols_by_nan(threshold=.5)
Setting NaN threshold to: 2.5
Out[4]:
  animals  numbers
0   'cat'      1.0
1   'cat'      2.0
2   'dog'      1.0
3   'dog'      2.0
4   'elk'      NaN
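To make the two threshold modes concrete, below is a minimal, hypothetical re-implementation of the dropping logic in plain Python (it is not BPt's actual code, and the function and data names are illustrative only). It shows how a fractional threshold is first converted to an absolute NaN count, which is why the output above reports "Setting NaN threshold to: 2.5" for threshold=.5 on 5 rows.

```python
import math

def drop_cols_by_nan(columns, threshold=0.5):
    """Drop columns whose NaN count meets or exceeds the threshold.

    `columns` is a dict mapping column name -> list of values.
    A float threshold between 0 and 1 is interpreted as a fraction
    of rows; a value greater than 1 as an absolute NaN count.
    (Hypothetical sketch for illustration, not BPt's implementation.)
    """
    n_rows = len(next(iter(columns.values())))

    # A fractional threshold becomes an absolute count, e.g. .5 * 5 rows = 2.5
    cutoff = threshold * n_rows if threshold <= 1 else threshold

    # Keep only columns whose NaN count is strictly below the cutoff
    return {
        name: values for name, values in columns.items()
        if sum(isinstance(v, float) and math.isnan(v) for v in values) < cutoff
    }
```

With the example data above (one NaN in the numbers column), threshold=.1 yields a cutoff of 0.5 and drops numbers, while threshold=.5 yields a cutoff of 2.5 and keeps both columns.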