day_count

Counts occurrences of specific days of the week between two dates.

Usage

This calculator allows the user to count specific days of the week (Monday = 0, …, Sunday = 6) between two dates.

This calculator can be used with the following method:

day_count

Examples:

  • Find the number of Sundays between two dates.

  • Find the number of weekend days between two dates.


Main Parameters

The bold options represent the default values when the parameters are optional.

  • input_columns (list of columns used as input by the calculator) The columns used to fill the output column.

  • output_columns (list of columns added by the calculator) The name of the filled column added to the dataset.

  • global (true, false) Whether this calculator should be run before the data is split for cross-validation during training.

  • steps [optional] (training, prediction, postprocessing) List of pipeline steps in which the columns from this calculator are added to the data. Note that when the training option is listed, the calculator is actually added during preprocessing.

  • store_in_model [optional] (true, false) Whether the columns computed by the calculator should be stored in the model, to avoid recalculating them during prediction. This is only relevant if the calculated columns are added to both training and prediction. If this parameter is omitted, the values are not stored in the model. The following parameters only make sense if this parameter is set to true.

  • stored_columns [required if store_in_model is true] List indicating the columns to be stored among the output_columns.

  • stored_keys [required if store_in_model is true] List indicating the columns to use for identifying the correct values to join on the data for prediction among the stored values (logically, they are to be chosen from the input_columns).
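As a conceptual illustration of how stored_columns and stored_keys relate, the sketch below stores the calculated values under a key built from the stored_keys columns, then joins them back onto prediction rows. It is not the product's implementation: the helper names build_store and join_stored are hypothetical, and rows are modelled as plain dictionaries for simplicity.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
Store = Dict[Tuple[object, ...], Row]


def build_store(rows: List[Row], stored_keys: Sequence[str],
                stored_columns: Sequence[str]) -> Store:
    """Hypothetical helper: keep stored_columns values, indexed by stored_keys."""
    store: Store = {}
    for row in rows:
        key = tuple(row[k] for k in stored_keys)
        store[key] = {c: row[c] for c in stored_columns}
    return store


def join_stored(rows: List[Row], store: Store,
                stored_keys: Sequence[str]) -> List[Row]:
    """Hypothetical helper: add the stored values to rows with a matching key."""
    joined = []
    for row in rows:
        key = tuple(row[k] for k in stored_keys)
        # Rows without a matching key are returned unchanged.
        joined.append({**row, **store.get(key, {})})
    return joined


keys = ["begin_date_promo", "end_date_promo"]
training_rows = [{"begin_date_promo": "2024-01-01",
                  "end_date_promo": "2024-01-14",
                  "count_sunday": 2}]
store = build_store(training_rows, keys, ["count_sunday"])

prediction_rows = [{"begin_date_promo": "2024-01-01",
                    "end_date_promo": "2024-01-14"}]
result = join_stored(prediction_rows, store, keys)
```

This is why stored_keys are normally chosen from the input_columns: they are the columns available at prediction time that identify which stored value applies to each row.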


Specific Parameters

  • begin_date The name of the column containing the lower bound of the date range in which days are counted.

  • end_date The name of the column containing the upper bound of the date range in which days are counted.

  • day_idx The list of days to count, given by their position in the week. 0: Monday, …, 6: Sunday.
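The counting itself can be sketched in a few lines of Python, whose date.weekday() method uses the same convention (0 for Monday through 6 for Sunday). This is a minimal illustration, not the calculator's actual code: the function name count_days is hypothetical, and it assumes both bounds are inclusive, which is consistent with the results shown in the examples on this page.

```python
from datetime import date, timedelta
from typing import Iterable


def count_days(begin: date, end: date, day_idx: Iterable[int]) -> int:
    """Count the dates in [begin, end] whose weekday is listed in day_idx.

    Assumption: both bounds are inclusive. Weekdays follow
    date.weekday(): 0 is Monday, 6 is Sunday.
    """
    wanted = set(day_idx)
    n_days = (end - begin).days + 1  # +1 because the end date is included
    return sum(
        1
        for offset in range(max(n_days, 0))  # empty range if end < begin
        if (begin + timedelta(days=offset)).weekday() in wanted
    )


# Sundays between Monday 2024-01-01 and Sunday 2024-01-14
sundays = count_days(date(2024, 1, 1), date(2024, 1, 14), [6])
# Weekend days (Saturday and Sunday) over the same range
weekend_days = count_days(date(2024, 1, 1), date(2024, 1, 14), [5, 6])
```

Passing several indices in day_idx counts every date that matches any of them, so a weekend count is simply day_idx set to 5 and 6.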


Examples

  1. Consider a promo sales dataset containing the start date (begin_date_promo) and the end date (end_date_promo) of each promo, for a store that is closed on Sundays. The user wants to find the number of Sundays within each promo time frame.

    calculated_cols:
      count_sunday:
        method: day_count
        input_columns:
        - begin_date_promo
        - end_date_promo
        output_columns:
        - count_sunday
        params:
          begin_col: begin_date_promo
          end_col: end_date_promo
          day_idx:
          - 6

    Input:

    begin_date_promo    end_date_promo
    2024-01-01          2024-01-14

    Output:

    begin_date_promo    end_date_promo    count_sunday
    2024-01-01          2024-01-14        2

  2. In the same use case, the store is now closed on both Monday and Sunday, so the user wants to find the number of Mondays and Sundays within each promo time frame.

    calculated_cols:
      count_offday:
        method: day_count
        input_columns:
        - begin_date_promo
        - end_date_promo
        output_columns:
        - count_offday
        params:
          begin_col: begin_date_promo
          end_col: end_date_promo
          day_idx:
          - 0
          - 6

    Input:

    begin_date_promo    end_date_promo
    2024-01-01          2024-01-14

    Output:

    begin_date_promo    end_date_promo    count_offday
    2024-01-01          2024-01-14        4
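Both results can be checked by arithmetic. The range from 2024-01-01 (a Monday) to 2024-01-14 (a Sunday) spans 14 days with both bounds included, that is exactly two full weeks, so each weekday occurs twice: 2 Sundays, and 2 + 2 = 4 Mondays and Sundays. The sketch below generalises this reasoning to any range without iterating over every day. The function name count_days_closed_form is hypothetical and inclusive bounds are an assumption inferred from these examples.

```python
from datetime import date
from typing import Iterable


def count_days_closed_form(begin: date, end: date,
                           day_idx: Iterable[int]) -> int:
    """Count weekdays in [begin, end] using full weeks plus a remainder.

    Assumption: both bounds are inclusive; 0 is Monday, 6 is Sunday.
    """
    if end < begin:
        return 0
    wanted = set(day_idx)
    total_days = (end - begin).days + 1
    full_weeks, remainder = divmod(total_days, 7)
    # Every full week contains each requested weekday exactly once.
    count = full_weeks * len(wanted)
    # The leftover days start on the same weekday as the begin date.
    first_weekday = begin.weekday()
    for offset in range(remainder):
        if (first_weekday + offset) % 7 in wanted:
            count += 1
    return count


example_1 = count_days_closed_form(date(2024, 1, 1), date(2024, 1, 14), [6])
example_2 = count_days_closed_form(date(2024, 1, 1), date(2024, 1, 14), [0, 6])
```

Here example_1 and example_2 correspond to the count_sunday and count_offday values in the two output tables.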
