Role
There are three possible roles in the Dataset class / BPt: ‘input data’ (or ‘data’ for short), ‘target’ and ‘non input’. By default, all loaded columns are treated as having role ‘input data’ until set otherwise. Roles are set through the methods set_role and set_roles, or through the dedicated helper methods set_target and set_non_input.
The different roles are described below.
- input data
The default role, input data, indicates all columns that might at some point serve as input features for a predictive task. Data can have any of the Data Types, including Data Files. NaNs are allowed in data columns. The basic idea is that when passing a Dataset to evaluate(), every column with role=‘input data’ is used as input for prediction by default, unless this set of variables is limited by passing a Scope. Note that you may also refer to this role with the shortened keyword ‘data’.
- target
The target role indicates columns that are to be predicted, and that therefore will not serve as input features for any predictive task. Targets can take on any of the Data Types except Data Files. Target columns may include NaN values, but be warned that passing a target with NaN values to some functions may not work correctly. Likewise, during prediction, any subjects with a NaN target value are never used in training sets and do not contribute to metrics generated on validation sets. When multiple columns have role=‘target’, a specific target variable can be set when using evaluate().
- non input
As the name suggests, any features set with role non input will not be provided directly as input features to a predictive task. Instead, these features are usually categorical and can be used to inform cross-validation behavior or to examine predictive results under different groupings. For example, see copy_as_non_input to make an ordinalized copy of an existing column. While there is no strict requirement that columns with role non input be categorical, they must not contain any NaNs.
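
The NaN rules for the three roles can be summarized with a small toy model. This is a plain-Python illustration of the rules as described above, not BPt's implementation; the helper names (input_columns, usable_rows, non_input_nan_columns) and the example columns are hypothetical.

```python
import math

# Both spellings of the default role are treated as the same role.
ROLE_ALIASES = {
    "data": "data",
    "input data": "data",
    "target": "target",
    "non input": "non input",
}


def is_nan(value):
    """Return True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


def input_columns(roles):
    """Columns that would serve as input features by default."""
    return [col for col, role in roles.items()
            if ROLE_ALIASES[role] == "data"]


def usable_rows(columns, target):
    """Row indices whose target is not NaN, i.e. rows that can be used
    for training or counted toward validation metrics."""
    return [i for i, value in enumerate(columns[target])
            if not is_nan(value)]


def non_input_nan_columns(columns, roles):
    """Non input columns that break the no-NaN requirement."""
    return [col for col, role in roles.items()
            if ROLE_ALIASES[role] == "non input"
            and any(is_nan(value) for value in columns[col])]


nan = float("nan")
columns = {
    "thickness": [2.1, nan, 2.4, 2.0],  # NaN is allowed in input data
    "age": [31.0, 45.0, nan, 52.0],     # NaN target: that row is skipped
    "site": [0, 1, 0, 1],               # non input: must be NaN-free
}
roles = {"thickness": "input data", "age": "target", "site": "non input"}

features = input_columns(roles)
rows = usable_rows(columns, "age")
violations = non_input_nan_columns(columns, roles)
```

Here only thickness is an input feature, the third row is excluded because its target is NaN, and no non input column violates the NaN requirement.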