bank_holidays_countdown
Compute previous and next occurrence of bank holidays in different countries.
Usage
Compute the previous and next occurrence of each bank holiday.
Given a date_col, its format date_format, and a country, the calculator generates two columns per bank holiday:
- the number of days until the next occurrence of the bank holiday
- the number of days since the last occurrence of the bank holiday
The bank holidays are the official bank holidays of the given country. Their names are anglicised so that a given holiday, Christmas for instance, maps to the same feature in every country.
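The two countdowns described above can be sketched as follows. This is a minimal illustration rather than the calculator's actual implementation: the function name `countdown` and the hard-coded Thanksgiving dates are hypothetical, and it assumes that a date falling on the holiday itself counts as both the next and the last occurrence (0 days each) and that a holiday with no known occurrence yields a null value.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def countdown(day, occurrences):
    """Return (days_until_next, days_since_last) for one holiday.

    `occurrences` is a sorted list of dates on which the holiday falls.
    Assumption: a day that is itself an occurrence yields (0, 0).
    A side with no occurrence is returned as None (the NULL case).
    """
    i = bisect_left(occurrences, day)        # index of first occurrence >= day
    j = bisect_right(occurrences, day) - 1   # index of last occurrence <= day
    days_in = (occurrences[i] - day).days if i < len(occurrences) else None
    days_ago = (day - occurrences[j]).days if j >= 0 else None
    return days_in, days_ago


# Hard-coded US Thanksgiving dates, for illustration only.
thanksgiving = [date(2020, 11, 26), date(2021, 11, 25), date(2022, 11, 24)]

print(countdown(date(2021, 1, 1), thanksgiving))    # (328, 36)
print(countdown(date(2021, 11, 25), thanksgiving))  # (0, 0)
print(countdown(date(2021, 1, 1), []))              # (None, None)
```

Keeping the occurrences sorted lets each lookup run as a binary search, which matters when the same holiday list is reused for every row of the date column.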
Actual dates vs observed dates:
In some countries, we get observed bank holidays: the date on which the holiday is observed from a civil standpoint (shops closed, administrations closed), which can be the next day if the holiday falls on a Sunday, for instance.
In some countries, we might get two events in a year for a given holiday: the event itself and its observed date. In other countries or situations, we might only get one date per year for a holiday, sometimes observed and sometimes not.
This calculator will not have any events qualified as observed.
In cases where:
- we get two events for a given holiday within a 10-day distance ⇒ we will only keep the original event and discard the observed one.
- we only have an observed event ⇒ we will remove the notion of observed from its label and keep the date as if it were the actual event.
Example:
For instance, in the UK, if we get “Boxing Day” on Dec 26 and “Boxing Day (Observed)” on Dec 28, we will keep the Dec 26 event and discard the Dec 28 one.
But in Colombia, where one year we might get an “Epiphany” event and the next year an “Epiphany (Observed)” event, we will produce only “Epiphany” events.
This means that in countries where bank holidays are observed on other days when they fall on a Sunday, this calculator will not quite capture the closure of services.
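The handling of observed events can be sketched as below. This is only an illustration of the two rules stated above, not the calculator's code: the function name `drop_observed`, the `(Observed)` suffix pattern, and the sample dates are assumptions, and "within a 10-day distance" is read here as an absolute difference of at most 10 days.

```python
import re
from datetime import date

# Assumed label convention: observed events end with "(Observed)".
OBSERVED = re.compile(r"\s*\(observed\)\s*$", re.IGNORECASE)


def drop_observed(events, window_days=10):
    """Return sorted (date, label) pairs with no observed labels left.

    - An observed event within `window_days` of the original event
      of the same holiday is discarded.
    - An observed event with no nearby original keeps its date
      but loses the "(Observed)" suffix.
    """
    actual = {}    # holiday name -> dates of the original events
    observed = []  # (date, holiday name) for observed events
    for day, label in events:
        name = OBSERVED.sub("", label)
        if name == label:
            actual.setdefault(name, []).append(day)
        else:
            observed.append((day, name))

    kept = [(day, name) for name, days in actual.items() for day in days]
    for day, name in observed:
        originals = actual.get(name, [])
        if not any(abs((day - d).days) <= window_days for d in originals):
            kept.append((day, name))
    return sorted(kept)


# Illustrative input only.
events = [
    (date(2020, 12, 26), "Boxing Day"),
    (date(2020, 12, 28), "Boxing Day (Observed)"),
    (date(2020, 1, 6), "Epiphany"),
    (date(2021, 1, 11), "Epiphany (Observed)"),
]
for day, name in drop_observed(events):
    print(day, name)
```

With this input, the observed Boxing Day is dropped because the original falls two days earlier, while the 2021 observed Epiphany is kept under the plain "Epiphany" label because no original occurs within the window.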
This calculator can be used with the following method: bank_holidays_countdown
Examples: get specific holidays for every country according to country code.
Main Parameters
The bold options represent the default values when the parameters are optional.
input_columns list of columns used as input of the calculators: only one column containing the date.
output_columns_prefix prefix of the columns added when the output columns cannot be listed: prefix to use for the output columns, as this calculator adds several.
global (true, false) Whether this calculator should be run before the data is split for cross-validation during training.
steps [optional] (training, prediction, postprocessing) List of steps in a pipeline where columns from this calculator are added to the data. Note that when the training option is listed, the calculator is actually added during preprocessing.
store_in_model [optional] (true, false) Whether the columns computed by this calculator should be stored in the model, to avoid recalculating them during prediction. This is only relevant if the calculated columns are added to both training and prediction. Without this parameter, the values will not be stored in the model. The following parameters only make sense if this parameter is set to true.
stored_columns [required if store_in_model is true] List indicating the columns to be stored among the output_columns.
stored_keys [required if store_in_model is true] List indicating the columns to use for identifying the correct values to join on the data for prediction among the stored values (logically, they are to be chosen from the input_columns).
Specific Parameters
country_code: A column name for the column containing the country code, or a country code. The country code must be a two-letter code, as described here: https://github.com/dr-prodigy/python-holidays (ISO 3166-1 alpha-2).
countdown_type [optional] List of columns to create, among in and ago countdowns. By default, both will be added.
date_format [optional] Format of the date provided. By default, %Y-%m-%d is used.
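As a rough illustration of the date_format parameter, the snippet below parses date strings with Python's `datetime.strptime`. It assumes the calculator accepts strptime-style directives, which the `%Y-%m-%d` default suggests but this page does not state explicitly; `parse_date` and `DEFAULT_DATE_FORMAT` are hypothetical names.

```python
from datetime import datetime

# Default taken from the parameter description above.
DEFAULT_DATE_FORMAT = "%Y-%m-%d"


def parse_date(value, date_format=DEFAULT_DATE_FORMAT):
    """Parse a date string; raises ValueError if it does not match the format."""
    return datetime.strptime(value, date_format).date()


print(parse_date("2021-11-25"))              # 2021-11-25
print(parse_date("25/11/2021", "%d/%m/%Y"))  # 2021-11-25
```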
Examples
Here, we want to obtain holidays for two different countries: the US and the United Kingdom. Christmas and New Year are holidays common to both countries; Thanksgiving exists only in the US. It is better to use either in or ago for the countdown_type, but not both at the same time, as the features would otherwise be collinear. If the holiday does not exist for the country, you will obtain NULL.
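| date       | country | days until Thanksgiving | days until Christmas | days until New Year |
| 2021-01-01 | US      | 328                     | 358                  | 0                   |
| 2021-11-25 | US      | 0                       | 30                   | 37                  |
| 2021-12-25 | US      | 334                     | 0                    | 7                   |
| 2022-01-01 | US      | 327                     | 358                  | 0                   |
| 2022-11-24 | US      | 0                       | 31                   | 38                  |
| 2022-12-25 | UK      | null                    | 0                    | 7                   |
| 2022-12-25 | US      | 333                     | 0                    | 7                   |
| 2023-01-01 | UK      | null                    | 358                  | 0                   |
| 2023-01-01 | US      | 326                     | 358                  | 0                   |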
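The collinearity warning above can be checked with a short sketch. Between two consecutive occurrences of a holiday, the days until the next occurrence and the days since the last one always add up to the gap between the two occurrences, so each column is a linear function of the other. The snippet uses Christmas 2021 and 2022 as an illustration; the variable names are not part of the calculator.

```python
from datetime import date, timedelta

christmas_2021 = date(2021, 12, 25)
christmas_2022 = date(2022, 12, 25)
gap = (christmas_2022 - christmas_2021).days

# Collect in + ago for every day strictly between the two occurrences.
sums = set()
day = christmas_2021 + timedelta(days=1)
while day < christmas_2022:
    days_in = (christmas_2022 - day).days
    days_ago = (day - christmas_2021).days
    sums.add(days_in + days_ago)
    day += timedelta(days=1)

print(gap, sums)  # 365 {365}
```

The constant only shifts with leap years or with holidays whose date moves from year to year, which is why keeping a single countdown type per holiday is usually enough.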