preprocessing

Description

Specifies the operations applied during the preprocessing stage. These may include data cleaning, outlier removal, feature scaling, or other transformations that prepare the data for model training and prediction.

The Preprocessing step takes a Dataset as input and performs the following operations, in this order:

  • load the Dataset, selecting the columns to load and casting each one as specified by the cols_type configuration entry

  • if date_col is set and date filtering is configured, filter the rows using the configured begin and end dates

  • if date_col is set, sort all rows by date

  • add global calculated features, computed on the whole dataset before it is split

  • determine how many train and test datasets to build, depending on whether hyperparameter tuning is activated and cross-validation (CV) is configured

  • build a train and test dataset for each CV set and each model resolution

  • apply the following operations to each dataset, when configured: extrapolate, aggregate, filter, and drop

  • add local calculated features inside each dataset
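
The ordered steps above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the tool's implementation. Rows are plain dicts, and np.float32 is approximated by float. Date bounds are inclusive and compared as ISO-formatted strings. CV uses an expanding-window scheme. The configuration keys begin_date, end_date, tuning, cv_folds and test_size, like every function name here, are hypothetical. The extrapolate, aggregate, filter and drop operations are left out.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Feature = Callable[[List[Row]], List[Row]]

# Hypothetical casters for cols_type values; np.float32 is approximated by float.
CASTERS: Dict[str, Callable[[Any], Any]] = {"np.float32": float, "str": str}


def expanding_splits(rows: List[Row], n_splits: int,
                     test_size: int) -> List[Tuple[List[Row], List[Row]]]:
    """Assumed CV scheme: each fold trains on all rows before its test window."""
    splits = []
    for i in range(n_splits, 0, -1):
        cut = len(rows) - i * test_size
        if cut < 1:
            raise ValueError("not enough rows for the requested splits")
        splits.append((rows[:cut], rows[cut:cut + test_size]))
    return splits


def preprocess(raw: List[Row], config: Dict[str, Any],
               global_features: Sequence[Feature] = (),
               local_features: Sequence[Feature] = ()
               ) -> Dict[Tuple[Tuple[Any, ...], int], Tuple[List[Row], List[Row]]]:
    prep = config.get("preprocessing", {})
    # Load: keep only the columns declared in cols_type and cast each value.
    cols_type = config["cols_type"]
    rows = [{col: CASTERS[t](r[col]) for col, t in cols_type.items()} for r in raw]

    date_col = config.get("date_col")
    if date_col:
        begin, end = prep.get("begin_date"), prep.get("end_date")
        # Date filter (inclusive bounds; ISO date strings compare chronologically).
        rows = [r for r in rows
                if (begin is None or r[date_col] >= begin)
                and (end is None or r[date_col] <= end)]
        rows.sort(key=lambda r: r[date_col])  # stable sort by date

    # Global features see the whole filtered, sorted dataset.
    for feature in global_features:
        rows = feature(rows)

    # Train/test pairs per resolution value: 1 unless tuning with CV.
    n_splits = prep.get("cv_folds", 1) if prep.get("tuning") else 1
    test_size = prep.get("test_size", 1)

    # Group rows by the model_resolution columns (a single group if unset).
    resolution = prep.get("model_resolution", [])
    groups: Dict[Tuple[Any, ...], List[Row]] = {}
    for r in rows:
        groups.setdefault(tuple(r[c] for c in resolution), []).append(r)

    datasets = {}
    for key, group in groups.items():
        for fold, (train, test) in enumerate(expanding_splits(group, n_splits, test_size)):
            # extrapolate / aggregate / filter / drop would be applied here (omitted).
            for feature in local_features:  # local features: computed per dataset
                train, test = feature(train), feature(test)
            datasets[(key, fold)] = (train, test)
    return datasets
```

Each key of the result pairs a resolution value (for example ("A",) when model_resolution is [pos_id]) with a fold index. The sketch therefore builds one train/test pair per resolution value and per split.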

Steps impacted

preprocessing

Example

# Define the type of each column to load. Types can be forced.
cols_type:
  item_id: np.float32
  pos_id: str
  qty: np.float32
  receipt_date: str

date_col: receipt_date

# One model per pos_id (optional)
preprocessing:
  model_resolution:
  - pos_id
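
The example can be traced with a short sketch. It assumes the YAML above is parsed into an equivalent Python dict (CONFIG below) and approximates np.float32 by Python's float to avoid a NumPy dependency. It uses a hypothetical helper, models_per_resolution, that casts the declared columns and groups rows by the model_resolution columns. It illustrates the intent of the configuration, not the tool's actual code.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# The example configuration above, mirrored as a Python dict (assumed parsing).
CONFIG: Dict[str, Any] = {
    "cols_type": {"item_id": "np.float32", "pos_id": "str",
                  "qty": "np.float32", "receipt_date": "str"},
    "date_col": "receipt_date",
    "preprocessing": {"model_resolution": ["pos_id"]},
}

# Hypothetical type resolver: np.float32 approximated by float (no NumPy needed).
CASTERS: Dict[str, Callable[[Any], Any]] = {"np.float32": float, "str": str}


def models_per_resolution(raw: List[Dict[str, Any]], config: Dict[str, Any]
                          ) -> Dict[Tuple[Any, ...], List[Dict[str, Any]]]:
    """Cast the declared columns, then group rows by the model_resolution columns."""
    cols_type = config["cols_type"]
    resolution = config.get("preprocessing", {}).get("model_resolution", [])
    groups: Dict[Tuple[Any, ...], List[Dict[str, Any]]] = defaultdict(list)
    for r in raw:
        row = {col: CASTERS[t](r[col]) for col, t in cols_type.items()}
        groups[tuple(row[c] for c in resolution)].append(row)
    return dict(groups)


raw = [
    {"item_id": "10", "pos_id": "P1", "qty": "3", "receipt_date": "2024-01-01"},
    {"item_id": "11", "pos_id": "P2", "qty": "1", "receipt_date": "2024-01-01"},
    {"item_id": "10", "pos_id": "P1", "qty": "2", "receipt_date": "2024-01-02"},
]
groups = models_per_resolution(raw, CONFIG)
print(sorted(groups))  # [('P1',), ('P2',)]
```

In this sketch, dropping the optional model_resolution entry puts every row under the single key (), that is, one model for all pos_id values.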
