fill_series

Verteego can fill time series and resample datasets. The fill_series calculator resamples every series to a given time interval and fills the missing values in the target columns with 0.
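As a rough illustration of the idea (not Verteego's implementation), the following pandas sketch puts a single series on a regular daily grid and sets the missing target values to 0. The column names `date` and `quantity` and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales with no row for 2024-01-03.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
        "quantity": [5, 3, 7],
    }
)

# Reindex the target column on a daily grid; dates without a row get 0.
filled = sales.set_index("date")["quantity"].asfreq("D", fill_value=0)

print(filled)
```

The result has one row per day between the first and last observed dates, including the date that was missing from the input.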

Usage

Filling of the time series

This calculator can be used with the following method:

fill_series

Examples:

  • filling the missing values in the target columns with 0


Main Parameters

  • input_columns list of columns used as input of the calculators

  • output_columns list of columns added by the calculators

  • global (true, false) Whether this calculator should be run before the data is split for cross-validation during training.

  • steps [optional] (training, prediction, postprocessing) List of pipeline steps in which the columns from this calculator are added to the data. Note that when the training option is listed, the calculator is actually applied during preprocessing.

  • store_in_model [optional] (true, false) Whether the columns calculated by the calculator should be stored in the model to avoid recalculating them during prediction. This is only relevant if the calculated columns are added to both training and prediction. If this parameter is omitted, the values are not stored in the model. The following parameters only apply when this parameter is set to true.

  • stored_columns [required if store_in_model is true] List indicating the columns to be stored among the output_columns.

  • stored_keys [required if store_in_model is true] List indicating the columns to use for identifying the correct values to join on the data for prediction among the stored values (logically, they are to be chosen from the input_columns).
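To make the role of stored_columns and stored_keys more concrete, here is a minimal pandas sketch of the general idea: keep the calculated values per key at training time, then join them onto the prediction data instead of recalculating them. The data, the `avg_price` column, and the use of a left join are assumptions for illustration; the source does not describe how Verteego performs the join internally.

```python
import pandas as pd

# Hypothetical training data after a calculator added "avg_price".
training = pd.DataFrame(
    {
        "item_id": ["A", "A", "B"],
        "avg_price": [12.5, 12.5, 8.0],
    }
)

stored_keys = ["item_id"]        # columns used to identify the stored values
stored_columns = ["avg_price"]   # calculated columns to keep

# One stored row per key combination.
stored = training[stored_keys + stored_columns].drop_duplicates(subset=stored_keys)

# At prediction time, join the stored values on the keys (assumed left join).
prediction = pd.DataFrame({"item_id": ["A", "B", "C"]})
enriched = prediction.merge(stored, on=stored_keys, how="left")

print(enriched)
```

In this sketch, a key that was never seen during training (item "C") ends up with a missing value in the joined column.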


Specific Parameters

  • rule: The time interval to use for filling, given as a string: 'H' for hours, 'D' for days, 'W' for weeks, 'M' for months.

  • resolution: The resolution at which the series should be filled, given as a list of columns (for instance item_id and pos_id, as in the example below).

  • max_gap:

    Optional. The maximum number of consecutive missing values to fill. Defaults to no limit.

  • aggregations:

    Optional. The columns that should be aggregated with a specific function, together with their aggregation functions. If a column isn't specified in aggregations, it uses the aggregator 'first' when multiple rows are aggregated, and its missing values are forward filled. Allowed aggregators: 'mean', 'sum', 'min', 'max'.
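The interplay of rule, resolution, max_gap and aggregations can be approximated with pandas as follows. This is a sketch under stated assumptions, not Verteego's code: the function name `fill_series_sketch`, the sample data, and the choice to map max_gap to the `limit` argument of `ffill` are all hypothetical. Aggregated columns are left as pandas produces them for empty intervals.

```python
import pandas as pd


def fill_series_sketch(df, date_col, rule, resolution, aggregations, max_gap=None):
    """Illustrative approximation only; not Verteego's implementation."""
    value_cols = [c for c in df.columns if c != date_col and c not in resolution]
    # Columns without an explicit aggregator fall back to 'first'.
    agg_map = {c: aggregations.get(c, "first") for c in value_cols}
    default_cols = [c for c in value_cols if c not in aggregations]

    pieces = []
    for keys, group in df.groupby(resolution):
        keys = keys if isinstance(keys, tuple) else (keys,)
        # Build the regular time grid for this series and aggregate each interval.
        out = group.set_index(date_col)[value_cols].resample(rule).agg(agg_map)
        if default_cols:
            # Forward fill default columns, at most max_gap consecutive rows.
            out[default_cols] = out[default_cols].ffill(limit=max_gap)
        for col, key in zip(resolution, keys):
            out[col] = key
        pieces.append(out.reset_index())
    return pd.concat(pieces, ignore_index=True)


# Hypothetical data: item A has two rows on the same day and a two-day gap.
data = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-04", "2024-01-02"]),
        "item_id": ["A", "A", "A", "B"],
        "price": [10.0, 20.0, 30.0, 5.0],
        "nb_clients": [1, 2, 3, 4],
        "stock": [100, 90, 80, 50],
    }
)

result = fill_series_sketch(
    data,
    date_col="date",
    rule="D",
    resolution=["item_id"],
    aggregations={"price": "mean", "nb_clients": "sum"},
    max_gap=1,
)
print(result)
```

Here `stock` has no entry in aggregations, so it takes the first value of each interval and is forward filled for at most one consecutive missing row; the second missing day of item A stays empty.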


Examples

  1. The config must specify a date_col for fill_series to work. Missing dates are filled in to match the specified time interval (rule).

preprocessing:
  fill_series:
    rule: W
    resolution:
      - item_id
      - pos_id
    max_gap: 4
    aggregations:
      price: mean
      nb_clients: sum
