date_weight

Computes a weight for each sample based on its date.

Usage

This calculator creates a weight that varies linearly between the first and the last date. It should only be computed at the training step.

This calculator can be used with the following method:

date_weight

Examples:

  • Give more weight to recent data.


Main Parameters

The bold options represent the default values when the parameters are optional.

  • input_columns (list of columns used as input of the calculator): The list of columns that will be used to fill the output column.

  • output_columns (list of columns added by the calculator): Name of the filled column added to the dataset.

  • global (true, false) Whether this calculator should be run before the data is split for cross-validation during training.

  • steps [optional] (training, prediction, postprocessing) List of steps in a pipeline where columns from this calculator are added to the data. Note that when the training option is listed, the calculator is actually added during preprocessing.

  • store_in_model [optional] (true, false) Whether the columns computed by the calculator should be stored in the model, to avoid recalculating them during prediction. This is only relevant if the calculated columns are added to both training and prediction. If this parameter is omitted, the values are not stored in the model. The following parameters only make sense if this parameter is set to true.

  • stored_columns [required if store_in_model is true] List indicating the columns to be stored among the output_columns.

  • stored_keys [required if store_in_model is true] List of the columns used as keys to join the stored values onto the prediction data (logically, they should be chosen from the input_columns).


Specific Parameters

  • max_sample_proportion The ratio between the weight of the latest sample and the weight of the oldest sample.

    With a max_sample_proportion of 3, the newest sample will have a weight 3 times that of the oldest sample.

    With a max_sample_proportion of 0.5, the oldest sample will have a weight 2 times that of the newest sample.

  • date_format [optional] The format used to convert the date column (in input_columns) to a date object, if it is not one already.

    Default value: %Y-%m-%d
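The two parameters above can be illustrated with a minimal Python sketch. It is not the calculator's actual implementation: the helper names `to_date` and `linear_date_weights` are hypothetical, and the sketch assumes that the oldest sample is pinned to a weight of 1.0 and the newest to `max_sample_proportion`, with a straight line in between. The real calculator may scale the weights differently while keeping the same ratio.

```python
from datetime import date, datetime


def to_date(value, date_format):
    """Convert a value to a date, parsing strings with date_format."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value  # already a date object, nothing to parse
    return datetime.strptime(value, date_format).date()


def linear_date_weights(dates, max_sample_proportion, date_format="%Y-%m-%d"):
    """Hypothetical helper: weights growing linearly from oldest to newest date."""
    parsed = [to_date(d, date_format) for d in dates]
    first, last = min(parsed), max(parsed)
    span_days = (last - first).days
    if span_days == 0:
        # All samples share the same date, so they all get the same weight.
        return [1.0] * len(parsed)
    # Assumption: oldest sample -> 1.0, newest sample -> max_sample_proportion.
    return [
        1.0 + (max_sample_proportion - 1.0) * (d - first).days / span_days
        for d in parsed
    ]


dates = ["2023-01-01", "2023-07-02", "2023-12-31"]

# Ratio of 3: the newest sample weighs 3 times the oldest one.
weights_up = linear_date_weights(dates, max_sample_proportion=3)
print(weights_up)  # [1.0, 2.0, 3.0]

# Ratio of 0.5: the oldest sample weighs 2 times the newest one.
weights_down = linear_date_weights(dates, max_sample_proportion=0.5)
print(weights_down)  # [1.0, 0.75, 0.5]
```

Only the ratio between the newest and the oldest weight is stated in this documentation, so the absolute values printed here depend on the pinning assumption made in the sketch.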


Examples

  1. The user wants to give more importance to recent data: while analyzing the sales history, they noticed that the values to predict were closer to recent sales than to older ones. They need to use the date_weight calculator, then pass its output column as an algorithm parameter.

    calculated_cols:
      weight:
        steps:
        - training
        method: date_weight
        params:
            max_sample_proportion: 2
        input_columns:
        - date
        output_columns:
        - weight
    
    algo_name: xgboost
    algorithm_parameters:
      sample_weight_col: weight

    In that case, the newest sample will have a weight 2 times that of the oldest sample.
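As a rough worked check of this configuration, the arithmetic can be reproduced in a few lines of Python. The three dates are purely illustrative, and the sketch assumes that the oldest sample receives a weight of 1.0; the documentation only guarantees the ratio between the newest and the oldest weight.

```python
from datetime import date

# Illustrative dates, not taken from a real dataset.
sample_dates = [date(2024, 1, 1), date(2024, 1, 16), date(2024, 1, 31)]
max_sample_proportion = 2  # same value as in the configuration above

first, last = min(sample_dates), max(sample_dates)
span_days = (last - first).days  # 30 days between the oldest and newest sample

# Assumption: the oldest sample weighs 1.0 and the newest max_sample_proportion.
weights = {
    d.isoformat(): 1.0 + (max_sample_proportion - 1) * (d - first).days / span_days
    for d in sample_dates
}

for day, weight in weights.items():
    print(day, weight)
# 2024-01-01 1.0
# 2024-01-16 1.5
# 2024-01-31 2.0
```

A sample dated halfway through the history therefore lands halfway between the two extreme weights.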
