Role

There are three possible roles in the Dataset class / BPt. These are ‘input data’, ‘target’ and ‘non input’. By default, all loaded columns are treated with role ‘input data’ until set differently. Roles are set through the methods set_role and set_roles, or through the dedicated helper methods set_target and set_non_input.
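
The bookkeeping described above can be pictured with a small, self-contained sketch. It is a toy model in plain Python, not BPt's implementation: the helper names normalize_role and assign_roles and the example column names are hypothetical, and the real Dataset methods (set_role, set_roles, set_target, set_non_input) may behave differently in detail.

```python
# Toy model of role bookkeeping. This is NOT BPt code; all names here
# are hypothetical and exist only to illustrate the rules in the text.
DEFAULT_ROLE = "input data"
VALID_ROLES = {"input data", "target", "non input"}
ALIASES = {"data": "input data"}  # shortened keyword for the default role


def normalize_role(role):
    """Map the 'data' alias to 'input data' and reject unknown roles."""
    role = ALIASES.get(role, role)
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return role


def assign_roles(columns, overrides=None):
    """Return {column: role}; every column starts with the default role."""
    roles = {col: DEFAULT_ROLE for col in columns}
    for col, role in (overrides or {}).items():
        if col not in roles:
            raise KeyError(f"unknown column: {col!r}")
        roles[col] = normalize_role(role)
    return roles


# Example: two columns keep the default role, two are reassigned.
roles = assign_roles(
    ["age", "thickness", "diagnosis", "site"],
    overrides={"diagnosis": "target", "site": "non input"},
)
print(roles)
```

Columns that are never mentioned in the overrides keep the default role, which mirrors the statement that loaded columns stay ‘input data’ until set differently.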

The different roles are described below.

  • input data

    The default role, input data, indicates all columns which might at some point serve as input features for an eventual predictive task. Input data columns can have any of the Data Types, including Data Files, and NaNs are allowed. The basic idea is that when a Dataset is passed to evaluate(), every column with role=’input data’ is used as input for prediction by default, unless this set of variables is limited in some way by passing a Scope. Note that you may also access this role with the shortened keyword ‘data’.

  • target

    The role of target indicates columns which are to be predicted, and which therefore will not serve as input features for any predictive task. Targets can take on any of the Data Types except Data Files. Target columns can include NaN values, but be warned that passing a target with NaN values to some functions may not work correctly. During prediction, any target values that are NaN will never be used in any training set, nor contribute to any metrics generated on validation sets. When multiple columns have role=’target’, a specific target variable may be set when using evaluate().

  • non input

    As the name suggests, any features set with role non input will not be provided directly as input features to a predictive task. Instead, these features are usually categorical and can be used to inform cross-validation behavior or to examine predictive results under different groupings. For example, see copy_as_non_input, which makes an ordinalized copy of an existing column. While there is no strict requirement that columns with role non input be categorical, there is a fixed requirement that they cannot contain any NaNs.
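
The NaN rules for the three roles can be summarized in a short sketch. It uses only the Python standard library and a plain dict of lists in place of a Dataset, so every function name and column name below is hypothetical; it illustrates the rules stated above and is not how BPt enforces them internally.

```python
import math


def is_nan(value):
    """True only for float NaN; strings and other values are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def input_columns(roles):
    """Columns used as input features by default (role 'input data')."""
    return [col for col, role in roles.items() if role == "input data"]


def check_non_input(table, roles):
    """Raise if any column with role 'non input' contains a NaN."""
    for col, role in roles.items():
        if role == "non input" and any(is_nan(v) for v in table[col]):
            raise ValueError(f"non input column {col!r} contains NaN")


def usable_rows(table, target):
    """Row indices whose target value is not NaN."""
    return [i for i, v in enumerate(table[target]) if not is_nan(v)]


# Hypothetical data: NaN is allowed in input data and in the target,
# but not in the non input column.
table = {
    "age": [31.0, math.nan, 45.0],
    "diagnosis": [1.0, 0.0, math.nan],
    "site": ["a", "b", "a"],
}
roles = {"age": "input data", "diagnosis": "target", "site": "non input"}

check_non_input(table, roles)            # passes: 'site' has no NaN
print(input_columns(roles))              # ['age']
print(usable_rows(table, "diagnosis"))   # [0, 1]
```

The third row is dropped only because its target is missing; the NaN in the age column of the second row does not exclude that row, since NaNs are allowed in input data.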