Recipes

1. Overview

A Recipe in Verteego defines the steps required to automate complex decision-making processes. It organizes a series of operations that guide the data flow logically from one step to the next, ultimately resulting in well-informed decisions.

The available steps in a Recipe are:

  • Importing/Extracting data

  • Forecasting based on the gathered data

  • Conducting optimization based on the gathered data

  • Evaluating the outcomes

  • Exporting the results

These different operations are defined by a YAML file.

2. Data Manipulation

2.1. Import Data

Objective: Fetches data from specified datasets within the Verteego system or from external sources.

YAML Parameters:

  • type: Set to import_from_dataset to indicate the data import action.

Initial Setup:

  • Manually import data from the desired source.

Example:

When importing data with a recipe, the following conditions must be met:

  • the schema of the already imported data and of the new data must be identical

  • the data formats (column types) must be the same
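A minimal import step might look like the following sketch. Only the type value is documented above; the surrounding steps list and the dataset name are illustrative assumptions:

```yaml
steps:
  - type: import_from_dataset
    dataset_name: sales_history   # assumed: a dataset previously imported manually
```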

2.2. Export Data

Objective: Enables exporting processed data to various types of data sources.

YAML Parameters:

  • type: Set to export to specify the action of data export.

Two export methods are available:

  • Overwriting the existing data, the default method (method: overwrite)

  • Appending new data to existing data with the same format/data schema (method: append)

In both cases, the following information must be provided:

  • dataset_name: The destination dataset for the export (its schema must match the exported data).

  • data_source: The target data source.

Continuing the previous example:
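A sketch of an export step appending new rows to the dataset imported earlier; the step structure and the source/dataset names are illustrative assumptions:

```yaml
steps:
  - type: export
    method: append                   # default is overwrite
    dataset_name: sales_history      # destination dataset
    data_source: my_bigquery_source  # assumed name of the target data source
```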

2.3. Refresh Data

Objective: Updates existing data with more recent data.

YAML: A combination of the two steps import_from_dataset and export.

Example:
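A refresh can be sketched as an import followed by an overwrite export; all names below are illustrative assumptions:

```yaml
steps:
  - type: import_from_dataset        # fetch the recent data
    dataset_name: sales_history_new  # assumed dataset holding the fresh records
  - type: export                     # write it over the existing dataset
    method: overwrite
    dataset_name: sales_history
    data_source: my_data_source      # assumed target data source
```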

3. Pipeline Manipulation

3.1. Forecast Pipeline

Objective: Executes models to forecast future trends or behaviors using the collected data.

YAML Parameters:

  • type: Set to forecast_pipeline to start the forecasting process.

  • pipeline_name: Specifies the pipeline to be executed.

  • train_set: Defines the training dataset.

  • prediction_set: Defines the dataset for predictions.

Example:
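The parameters above can be combined as follows; the pipeline and dataset names are illustrative assumptions:

```yaml
steps:
  - type: forecast_pipeline
    pipeline_name: sales_forecast     # pipeline to execute
    train_set: sales_history          # training dataset
    prediction_set: sales_to_predict  # dataset to generate predictions for
```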

3.2. Optimization

Objective: Identifies the most optimal solutions or configurations using the prepared data.

YAML Parameters:

  • type: Set to optimization_pipeline to initiate the optimization process.

  • pipeline_name: Specifies the pipeline to be executed.

  • optimization_set: The dataset containing the various options or scenarios for optimization.

Example:
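A minimal optimization step, with illustrative pipeline and dataset names:

```yaml
steps:
  - type: optimization_pipeline
    pipeline_name: promo_optimizer     # pipeline to execute
    optimization_set: promo_scenarios  # dataset of options/scenarios to optimize over
```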

3.3. Extracting Pipeline Results as a Dataset

Purpose: Verteego enables the extraction of data from different stages within a pipeline, allowing it to be stored as datasets for detailed analysis and further processing. This is done using the import_from_pipeline method.

Pipeline Stages for Data Extraction:

  • Preprocessing (forecast pipeline): Data from the initial preparation phase can be exported. This typically includes cleaned and transformed data ready for modeling.

  • Prediction (forecast pipeline): Model predictions can be exported from this stage. Users may choose to include additional analytical insights such as Shapley values, which explain the contribution of each feature to the prediction, or export all features used in the prediction model.

  • Postprocessing (forecast pipeline): Data from the final adjustments made to predictions, such as recalibrated or refined outputs, can be exported after the initial model processing.

  • Optimization (optimization pipeline): The final result of the optimization process can be exported at this stage.

YAML Parameters:

  • type: Specifies the action to be performed. For extracting pipeline data, this should be set to import_from_pipeline.

  • pipeline_name: The name of the pipeline from which the data is to be extracted.

  • pipeline_step: Specifies the stage of the pipeline from which the data will be extracted. Possible values include:

    • preprocessing

    • prediction

    • postprocessing

    • optimization

  • shapley_values (optional): A boolean parameter that determines whether to include Shapley values in the exported dataset. This applies only to the prediction stage and helps explain the influence of each feature on the predictions.

    • true – Include Shapley values.

    • false (default) – Exclude Shapley values.

  • preprocessed_columns (optional): A boolean parameter that specifies whether to include all calculated columns/features in the exported dataset. This is also relevant only for the prediction stage.

    • true – Include calculated columns/features.

    • false (default) – Exclude calculated columns/features.

Example: Extract a forecast pipeline result with Shapley values and export it to another data source (BigQuery)
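This can be sketched as an import_from_pipeline step followed by an export step; the pipeline, dataset, and data source names are illustrative assumptions:

```yaml
steps:
  - type: import_from_pipeline
    pipeline_name: sales_forecast    # pipeline whose results are extracted
    pipeline_step: prediction        # extraction is only meaningful per documented stages
    shapley_values: true             # include per-feature contributions
  - type: export
    method: overwrite
    dataset_name: forecast_with_shapley
    data_source: my_bigquery_source  # assumed BigQuery data source
```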

3.4. Model Performance and Accuracy Assessment

Purpose: Evaluates the performance and accuracy of predictive or optimization models within the system.

YAML Parameters:

  • type: Set to score to trigger the performance evaluation process.

  • pipeline_name: The name of the pipeline from which data is to be evaluated.

  • reference_dataset: Specifies the dataset used to compute the accuracy or performance metrics.

  • resource_to_evaluate: Indicates the different components or resources within the reference dataset that will be evaluated.

    • name: A label or identifier for the data being evaluated.

    • type: Defines the type of data to be evaluated.

Example:
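A score step could be sketched as below; the pipeline and dataset names, and the type value under resource_to_evaluate, are illustrative assumptions:

```yaml
steps:
  - type: score
    pipeline_name: sales_forecast    # pipeline whose output is evaluated
    reference_dataset: actual_sales  # dataset used to compute the metrics
    resource_to_evaluate:
      - name: sales_prediction       # label for the evaluated data
        type: prediction             # assumed type value
```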

4. Examples

Recipes are configured using YAML (YAML Ain't Markup Language), which allows for a clear and human-readable definition of each step within the recipe.

4.1. Example 1: Basic Sales Forecast
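A basic sales forecast recipe chains the documented steps end to end: refresh the input data, run the forecast pipeline, extract the predictions, and export them. All names below are illustrative assumptions:

```yaml
steps:
  - type: import_from_dataset           # refresh the input data
    dataset_name: sales_history
  - type: forecast_pipeline             # run the forecast
    pipeline_name: sales_forecast
    train_set: sales_history
    prediction_set: sales_to_predict
  - type: import_from_pipeline          # extract the predictions as a dataset
    pipeline_name: sales_forecast
    pipeline_step: prediction
  - type: export                        # publish the results
    method: overwrite
    dataset_name: sales_forecast_results
    data_source: my_data_source         # assumed target data source
```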

4.2. Example 2: Promotion Optimization

This example YAML recipe configures a two-stage process within Verteego. Initially, it forecasts the effectiveness of all possible promotional scenarios for various articles, stores, and promotional periods. Then, it optimizes to select the best promotional mechanism for each unique combination of article, store, and period based on the forecasted results.
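The two-stage process described above could be sketched as follows; the pipeline and dataset names are illustrative assumptions:

```yaml
steps:
  - type: forecast_pipeline             # forecast every promotional scenario
    pipeline_name: promo_forecast
    train_set: promo_history
    prediction_set: promo_scenarios
  - type: optimization_pipeline         # select the best mechanism per article/store/period
    pipeline_name: promo_optimizer
    optimization_set: promo_scenarios
  - type: import_from_pipeline          # extract the optimization result
    pipeline_name: promo_optimizer
    pipeline_step: optimization
  - type: export
    method: overwrite
    dataset_name: selected_promotions
    data_source: my_data_source         # assumed target data source
```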

5. Best Practices for Recipe Configuration

  • Ensure that each step follows a logical sequence, particularly when the output of one step is required as input for the next.

  • Use clear, descriptive names for each step to improve readability and simplify troubleshooting.

  • Carefully define all parameters for each step to prevent errors during execution.
