Forecasting Pipelines

If this is your first time writing or reading a forecasting configuration, start with the "Getting started" guide available in this section.

Overview of Forecasting Pipelines

A Forecasting Pipeline in Verteego is a structured sequence of steps that takes data from initial input to final forecast output. A pipeline typically consists of 4 to 6 steps, depending on the complexity and requirements of the forecasting task. These steps include:

  • Preprocessing: Data is cleaned and prepared to ensure it is suitable for analysis and modeling.

  • Training: The model is trained using historical data to learn patterns and dynamics.

  • Prediction: Forecasts are generated based on the data processed through the model.

  • Postprocessing: The output from the prediction step is refined and adjusted to improve usability or accuracy.

  • Score on Prediction (Optional): Provides a quantitative evaluation of the prediction’s accuracy if the dataset includes a pre-identified target column.

  • Score on Postprocessing (Optional): Assesses the quality of the postprocessed forecasts, under conditions similar to those of prediction scoring.
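
The core steps above can be sketched as a chain of plain Python functions. This is only an illustration of the order of operations, not Verteego's implementation: the function names, the mean-based "model", and the sample data are all assumptions made for this example.

```python
from statistics import mean

def preprocess(rows, target):
    # Preprocessing: drop historical rows whose target value is missing.
    return [r for r in rows if r.get(target) is not None]

def train(rows, target):
    # Training: a deliberately trivial "model" that learns the historical mean.
    return {"mean": mean(r[target] for r in rows)}

def predict(model, rows, target):
    # Prediction: write the forecast into the target column of each future row.
    return [{**r, target: model["mean"]} for r in rows]

def postprocess(rows, target):
    # Postprocessing: clip negative forecasts to zero and round to one decimal.
    return [{**r, target: max(0.0, round(r[target], 1))} for r in rows]

def run_pipeline(history, future, target):
    clean = preprocess(history, target)
    model = train(clean, target)
    raw = predict(model, future, target)
    return postprocess(raw, target)

# Hypothetical sample data.
history = [
    {"day": 1, "sales": 10.0},
    {"day": 2, "sales": None},
    {"day": 3, "sales": 14.0},
]
future = [{"day": 4}, {"day": 5}]
result = run_pipeline(history, future, "sales")
```

The two optional scoring steps are left out here because they only apply when actual values are available for the forecast period.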

Pipeline Configuration and Versioning

Each step in a forecasting pipeline is defined by its configuration, which specifies how the data should be handled and processed. When a pipeline is executed, its configuration is versioned to ensure that each run can be audited and replicated. This configuration is accessible from the pipeline run details under the tab labeled "Configuration."
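
To illustrate why a versioned configuration makes a run auditable and replicable, the sketch below derives a stable fingerprint from a configuration dictionary. This is an assumption-laden example: Verteego's actual versioning mechanism is not described here, and the configuration keys shown (other than column_to_predict) are hypothetical.

```python
import hashlib
import json

def config_fingerprint(config):
    # Serialize with sorted keys so that key order does not change the result.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical configuration; only column_to_predict is a name from this page.
config_a = {"column_to_predict": "sales", "preprocessing": {"drop_missing": True}}
config_b = {"preprocessing": {"drop_missing": True}, "column_to_predict": "sales"}
config_c = {"column_to_predict": "sales", "preprocessing": {"drop_missing": False}}

same_version = config_fingerprint(config_a) == config_fingerprint(config_b)
changed_version = config_fingerprint(config_a) != config_fingerprint(config_c)
```

Two runs with the same fingerprint used the same configuration, while any change to a step's settings yields a different one.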

Context-Specific Functionality

  • Testing Context: In a testing environment, where one simulates future forecasting accuracy to benchmark a pipeline, the prediction dataset may contain the actual outcomes (column_to_predict). Verteego handles this in the pipeline output by automatically appending a “__true” suffix to this column to distinguish it from predicted values.

  • Production Context: In real-world applications, the actual future values are not known at the time of prediction, so scoring based on actual outcomes does not occur in this setting.
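
The difference between the two contexts can be sketched as follows. The helper names and sample rows are hypothetical. Only the “__true” suffix and the column_to_predict setting come from the description above.

```python
def mark_actuals(rows, column_to_predict, suffix="__true"):
    # Move actual outcomes to a suffixed column so that the original column
    # name stays free for the predicted values.
    marked = []
    for row in rows:
        row = dict(row)  # copy, so the caller's data is not modified
        if column_to_predict in row:
            row[column_to_predict + suffix] = row.pop(column_to_predict)
        marked.append(row)
    return marked

def can_score(rows, column_to_predict, suffix="__true"):
    # Scoring is only possible when every row carries an actual outcome.
    return bool(rows) and all(column_to_predict + suffix in r for r in rows)

# Testing context: the prediction dataset already contains the actuals.
testing_rows = mark_actuals([{"day": 4, "sales": 11.0}], "sales")

# Production context: future values are unknown, so nothing is renamed.
production_rows = mark_actuals([{"day": 4}], "sales")
```

In this sketch, can_score(testing_rows, "sales") is true and can_score(production_rows, "sales") is false, which mirrors the absence of scoring in production.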

Automatic Column Handling

If the column identified in the configuration as the column_to_predict is modified during postprocessing, Verteego automatically preserves a copy of the original predictions in a column with a “__predicted” suffix. This mechanism ensures that any scoring steps are consistently applied to the most refined version of the prediction, without the need to specify manually which column should be evaluated.
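
A minimal sketch of this behavior, assuming rows are plain dictionaries and that actual outcomes are stored under the “__true” suffix described earlier. The function names, the clipping adjustment, and the choice of mean absolute error as the score are assumptions made for illustration.

```python
def postprocess_keeping_original(rows, column_to_predict, adjust):
    # Keep the raw prediction under "<column>__predicted", then overwrite the
    # main column with the postprocessed (most refined) value.
    refined = []
    for row in rows:
        row = dict(row)
        row[column_to_predict + "__predicted"] = row[column_to_predict]
        row[column_to_predict] = adjust(row[column_to_predict])
        refined.append(row)
    return refined

def mean_absolute_error(rows, predicted_column, actual_column):
    errors = [abs(r[predicted_column] - r[actual_column]) for r in rows]
    return sum(errors) / len(errors)

# Hypothetical predictions alongside the known actuals (testing context).
predictions = [
    {"sales": -2.0, "sales__true": 1.0},
    {"sales": 5.0, "sales__true": 4.0},
]
refined = postprocess_keeping_original(predictions, "sales", lambda v: max(0.0, v))

# Score on prediction uses the preserved copy; score on postprocessing uses
# the refined main column.
score_prediction = mean_absolute_error(refined, "sales__predicted", "sales__true")
score_postprocessing = mean_absolute_error(refined, "sales", "sales__true")
```

Because the refined value always lives in the main column, the scoring code never needs to be told which column holds the final forecast.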

Together, versioned configurations and automatic column suffixes keep the forecasting process traceable and make the forecasted data easier to use in real-world scenarios.

