Building the Training and Prediction Set
Specifies the name of the column that the model aims to predict. This is a critical setting because it defines the target variable for the forecasting algorithm. If this column already exists in the prediction dataset, its values serve as ground truth against which the model's predictions are compared, enabling the computation of performance metrics such as accuracy or mean squared error.
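The benchmarking behaviour described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes prediction rows are plain dictionaries, and the helper names `benchmark` and `mean_squared_error` are hypothetical. Predictions are scored only when the target column is present in every row; for a classification target, accuracy would be computed analogously.

```python
from typing import Dict, List, Optional


def mean_squared_error(actual: List[float], predicted: List[float]) -> float:
    """Average of squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def benchmark(
    rows: List[Dict[str, float]],
    predictions: List[float],
    column_to_predict: str,
) -> Optional[float]:
    """Hypothetical helper: score predictions only if ground truth is available."""
    if not rows or any(column_to_predict not in row for row in rows):
        return None  # target column absent: nothing to compare against
    actual = [row[column_to_predict] for row in rows]
    return mean_squared_error(actual, predictions)


# Usage: the "sales" column exists in the prediction data, so a score is returned.
print(benchmark([{"sales": 10.0}, {"sales": 12.0}], [11.0, 12.0], "sales"))
```

Returning `None` rather than raising keeps prediction runs without ground truth valid; only the scoring step is skipped.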
This setting identifies the columns the model uses as features for the forecasting task. They are selected from the columns defined in cols_type and those generated in the calculated_cols section. The column named in column_to_predict and any columns derived directly from it are excluded, which prevents data leakage and preserves the integrity of the model. This selection focuses the model's learning on relevant predictors rather than on the target variable and its derivatives. At least one subcategory of features must be included so that the feature set is never empty.
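The selection rule above can be sketched as follows. This is a hypothetical illustration rather than the tool's actual logic: it assumes cols_type maps column names to types and that each entry in calculated_cols lists the source columns it is computed from. As a precaution, the sketch also excludes columns derived from the target indirectly (through another calculated column).

```python
from typing import Dict, List, Set


def derived_from_target(calculated_cols: Dict[str, List[str]], target: str) -> Set[str]:
    """Return the target plus every calculated column that depends on it,
    directly or through a chain of other calculated columns."""
    tainted = {target}
    changed = True
    while changed:  # repeat until no new dependent column is found
        changed = False
        for name, sources in calculated_cols.items():
            if name not in tainted and tainted.intersection(sources):
                tainted.add(name)
                changed = True
    return tainted


def select_features(
    cols_type: Dict[str, str],
    calculated_cols: Dict[str, List[str]],
    column_to_predict: str,
) -> List[str]:
    """Hypothetical helper: candidate columns minus the target and its derivatives."""
    excluded = derived_from_target(calculated_cols, column_to_predict)
    candidates = list(cols_type) + [c for c in calculated_cols if c not in cols_type]
    features = [c for c in candidates if c not in excluded]
    if not features:
        raise ValueError("feature set is empty: select at least one feature column")
    return features


# Usage: "sales_log" and "sales_log_diff" depend on the target, so both are dropped.
cols_type = {"date": "datetime", "price": "float", "sales": "float"}
calculated_cols = {
    "sales_log": ["sales"],
    "price_x2": ["price"],
    "sales_log_diff": ["sales_log"],
}
print(select_features(cols_type, calculated_cols, "sales"))
```

Raising an error on an empty result mirrors the requirement that the feature set must not be empty.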
Lists the specific columns from the prediction input file that should be used during the prediction process. If this list is left empty, the model considers all available columns in the dataset. Restricting the input to specific columns keeps the model from processing irrelevant data, which reduces complexity and can improve performance on datasets with many features.
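The empty-list fallback described above can be sketched like this. The function name `restrict_columns` and its `selected_columns` parameter are hypothetical stand-ins for the setting, and rows are assumed to be plain dictionaries.

```python
from typing import Dict, List

Row = Dict[str, object]


def restrict_columns(rows: List[Row], selected_columns: List[str]) -> List[Row]:
    """Hypothetical helper: keep only the listed columns, or all columns if the list is empty."""
    if not selected_columns:
        return [dict(row) for row in rows]  # empty list: every column is kept
    # A KeyError is raised if a listed column is missing from a row.
    return [{col: row[col] for col in selected_columns} for row in rows]


# Usage: the "notes" column is dropped because it is not listed.
rows = [{"date": "2024-01-01", "price": 9.5, "notes": "promo"}]
print(restrict_columns(rows, ["date", "price"]))
```

Failing loudly on a missing column, rather than silently skipping it, makes configuration typos easy to spot.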