interval_index

Assigns values to intervals.

Usage

Assigns to each value the name of the interval it falls in, using the intervals defined in the config. Assigns NaN when the value is not in any interval. Only works with integers for now.

This calculator can be used with the following method:

interval_index

Examples:

  • Create a size category based on the product dimensions

  • Create a price category based on the product price


Main Parameters

The bold options represent the default values when the parameters are optional.

  • input_columns list of columns used as input of the calculator: The list of columns that will be used to fill the output column.

  • output_columns list of columns added by the calculator: Name of the filled column added to the dataset.

  • global (true, false) Whether this calculator should be run before the data is split for cross-validation during training.

  • steps [optional] (training, prediction, postprocessing) List of steps in a pipeline where columns from this calculator are added to the data. Note that when the training option is listed, the calculator is actually added during preprocessing.

  • store_in_model [optional] (true, false) Whether the columns calculated by this calculator should be stored in the model to avoid recalculating them during prediction. This is only relevant if the calculated columns are added to both training and prediction. Without this parameter, the values are not stored in the model. The following parameters only make sense if this parameter is set to true.

  • stored_columns [required if store_in_model is true] List indicating the columns to be stored among the output_columns.

  • stored_keys [required if store_in_model is true] List indicating the columns to use for identifying the correct values to join on the data for prediction among the stored values (logically, they are to be chosen from the input_columns).
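
As a rough sketch, the main parameters might be combined as shown here. Placing global, steps, store_in_model, stored_columns and stored_keys at the same level as method is an assumption modeled on how input_columns and output_columns appear in this page's example; the values chosen are illustrative only.

    calculated_cols:
      intervals:
        method: interval_index
        input_columns:
        - item_price
        output_columns:
        - interval_column
        # Assumption: the main parameters sit next to method, like input_columns.
        global: false            # illustrative value only
        steps:
        - training
        - prediction
        store_in_model: true
        stored_columns:          # chosen among output_columns
        - interval_column
        stored_keys:             # chosen among input_columns
        - item_price
        params:
          value_col: item_price
          intervals:
            cheap:
            - 0
            - 5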


Specific Parameters

  • value_col The column used to identify the interval.

  • intervals Intervals, each with its lower and upper bounds. The interval names given here are the values written to the resulting column.
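
The shape of these two parameters can be annotated as follows, using one interval from this page's example. Reading the first number as the lower bound and the second as the upper bound is an assumption based on the values used there; this page does not state whether the bounds are inclusive.

    params:
      value_col: item_price    # column whose values are matched against the intervals
      intervals:
        cheap:                 # interval name written to the output column
        - 0                    # assumed: lower bound
        - 5                    # assumed: upper bound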


Examples

  1. Given a sales dataset that includes the item price (item_price), the user wants to classify prices into multiple categories. The intervals need to cover all possible values to avoid null values.

    calculated_cols:
      intervals:
        method: interval_index
        input_columns:
        - item_price
        output_columns:
        - interval_column
        params:
          value_col: item_price
          intervals:
            cheap:
            - 0
            - 5
            medium:
            - 6
            - 50
            upper:
            - 50
            - 200
            expensive:
            - 200
            - 1000

    Input:

    item_price
    1
    2
    7
    60
    600
    1200
    60.1
    1.2

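    Output:

    item_price  interval_column
    1           cheap
    2           cheap
    7           medium
    60          upper
    600         expensive
    1200
    60.1
    1.2

    The last three rows get no interval name: 1200 falls outside every interval, and 60.1 and 1.2 are presumably left empty because the calculator only works with integers.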
