Recipes

1. Overview

A Recipe in Verteego defines the steps required to automate complex decision-making processes. It organizes a series of operations that guide the data flow logically from one step to the next, ultimately resulting in well-informed decisions.

The steps available in a Recipe are:

  • Importing/Extracting data

  • Forecasting based on the gathered data

  • Conducting optimization based on the gathered data

  • Evaluating the outcomes

  • Exporting the results

These operations are defined in a YAML file.
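To illustrate the overall structure, here is a minimal, hypothetical recipe skeleton (step and dataset names are illustrative): each top-level key names a step, and each step declares a type and its params.

# Minimal recipe skeleton (step and dataset names are illustrative)
import_my_data:
  type: import_from_dataset
  params:
    dataset_name: my_dataset

export_my_data:
  type: export
  params:
    dataset_name: my_dataset
    data_source: my_data_source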

2. Data Manipulation

2.1. Import Data

Objective: Fetches data from specified datasets within the Verteego system or from external sources.

YAML Parameters:

  • type: Set to import_from_dataset to indicate the data import action.

  • dataset_name: The name of the dataset to import (under params).

Initial Setup:

  • Manually import data from the desired source.

Example:

refresh_item_referential_dataset:
  type: import_from_dataset
  params:
    dataset_name: item_referential

When importing data with a recipe, the following conditions must be met:

  • the schema of the already imported data and the new data must be identical

  • the data format (column types) must be the same

2.2. Export Data

Objective: Enables exporting processed data to various types of data sources.

YAML Parameters:

  • type: Set to export to specify the action of data export.

Two export methods are available:

  • Overwriting existing data, the default method (method: overwrite)

  • Appending new data to existing data with the same format/data schema (method: append)

In both cases, the following information must be provided:

  • dataset_name: The destination dataset for the export (must match the name of the dataset being exported).

  • data_source: The target data source.

Continuing the previous example:

export_item_referential:
  type: export
  params:
    dataset_name: item_referential
    data_source: 0_item
    # When using the overwrite method, this line is not required
    method: append
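
For comparison, the same export using the default overwrite method simply omits the method line:

export_item_referential:
  type: export
  params:
    dataset_name: item_referential
    data_source: 0_item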

2.3. Refresh Data

Objective: Updates existing data with more recent data.

YAML: A combination of the two steps import_from_dataset and export.

Example:

refresh_item_referential_dataset:
  type: import_from_dataset
  params:
    dataset_name: item_referential
export_item_referential:
  type: export
  params:
    dataset_name: item_referential
    data_source: 0_item
    # When using the overwrite method, this line is not required
    method: append

3. Pipeline Manipulation

3.1. Forecast Pipeline

Objective: Executes models to forecast future trends or behaviors using the collected data.

YAML Parameters:

  • type: Set to forecast_pipeline to start the forecasting process.

  • pipeline_name: Specifies the pipeline to be executed.

  • train_set: Defines the training dataset.

  • predict_set: Defines the dataset for predictions.

Example:

launch_production_forecast_pipeline:  
  type: forecast_pipeline
  params:
    pipeline_name: production
    train_set: train_dataset
    predict_set: prediction_dataset 

3.2. Optimization

Objective: Identifies optimal solutions or configurations using the prepared data.

YAML Parameters:

  • type: Set to optimization_pipeline to initiate the optimization process.

  • pipeline_name: Specifies the pipeline to be executed.

  • optimization_set: The dataset containing the various options or scenarios for optimization.

Example:

launch_optimization_pipeline:
  type: optimization_pipeline
  params:
    pipeline_name: Fleet_Optimization
    optimization_set: fleet_planning_system

3.3. Extracting Pipeline Results as a Dataset

Purpose: Verteego enables the extraction of data from different stages within a pipeline, allowing it to be stored as datasets for detailed analysis and further processing. This is done using the import_from_pipeline method.

Pipeline Stages for Data Extraction:

  • Preprocessing (forecast pipeline): Data from the initial preparation phase can be exported. This typically includes cleaned and transformed data ready for modeling.

  • Prediction (forecast pipeline): Model predictions can be exported from this stage. Users may choose to include additional analytical insights such as Shapley values, which explain the contribution of each feature to the prediction, or export all features used in the prediction model.

  • Postprocessing (forecast pipeline): Data from the final adjustments made to predictions, such as recalibrated or refined outputs, can be exported after the initial model processing.

  • Optimization (optimization pipeline): The final result of the optimization process can be exported at this stage.

YAML Parameters:

  • type: Specifies the action to be performed. For extracting pipeline data, this should be set to import_from_pipeline.

  • pipeline_name: The name of the pipeline from which the data is to be extracted.

  • pipeline_step: Specifies the stage of the pipeline from which the data will be extracted. Possible values include:

    • preprocessing

    • prediction

    • postprocessing

    • optimization

  • shapley_values (optional): A boolean parameter that determines whether to include Shapley values in the exported dataset. This applies only to the prediction stage and helps explain the influence of each feature on the predictions.

    • true – Include Shapley values.

    • false (default) – Exclude Shapley values.

  • preprocessed_columns (optional): A boolean parameter that specifies whether to include all calculated columns/features in the exported dataset. This is also relevant only for the prediction stage.

    • true – Include calculated columns/features.

    • false (default) – Exclude calculated columns/features.

Note: shapley_values and preprocessed_columns are available only for the forecast pipeline.

Example: Extract forecast pipeline results with Shapley values and export them to another data source (BigQuery)

extract_forecast_results:
  type: import_from_pipeline
  params:
    pipeline_name: sales_forecast_model
    pipeline_step: prediction
    shapley_values: true
  
export_forecast_results:
  type: export
  params:
    data_source: VTG_raw_prediction
    dataset_name: sales_forecast_model_prediction
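
Similarly, a hypothetical variant (the step name extract_forecast_features is illustrative) that exports the predictions together with all calculated columns/features via preprocessed_columns:

extract_forecast_features:
  type: import_from_pipeline
  params:
    pipeline_name: sales_forecast_model
    pipeline_step: prediction
    # Include all calculated columns/features in the exported dataset
    preprocessed_columns: true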

3.4. Model Performance and Accuracy Assessment

Purpose: Evaluates the performance and accuracy of predictive or optimization models within the system.

YAML Parameters:

  • type: Set to score to trigger the performance evaluation process.

  • pipeline_name: The name of the pipeline from which data is to be evaluated.

  • reference_dataset: Specifies the dataset used to compute the accuracy or performance metrics.

  • resource_to_evaluate: Indicates the different components or resources within the reference dataset that will be evaluated.

    • name: A label or identifier for the data being evaluated.

    • type: Defines the type of data to be evaluated (e.g., dataset).

Example:

evaluate_forecast_accuracy:
  type: score
  params:
    pipeline_name: sales_forecast_model
    reference_dataset: historical_sales
    resource_to_evaluate:
        pipeline_name: sales_forecast_model
        name: sales_forecast_model_prediction
        type: dataset

4. Examples

Recipes are configured using YAML (YAML Ain't Markup Language), which allows for a clear, human-readable definition of each step within the recipe.

4.1. Example 1: Basic sales forecast

# ---------------------------------------------------------------- #
# -------------------- SALES HISTORY REFRESH --------------------- #
# ---------------------------------------------------------------- #
# Import sales history in Verteego
sales_history_dataset_import:
  type: import_from_dataset
  params:
    dataset_name: sales_data

# Export refreshed data in order to keep it updated
sales_history_dataset_export:
  type: export
  params:
    data_source: 1_sales
    dataset_name: sales_data

# ---------------------------------------------------------------- #
# ------------ IMPORT DATA USED IN FORECAST PIPELINE ------------- #
# ---------------------------------------------------------------- #

# Import market conditions (used as feature in forecast pipeline)
market_conditions_import:
  type: import_from_dataset
  params:
    dataset_name: market_conditions

# Import item referential (used as feature in forecast pipeline)
item_referential_import:
  type: import_from_dataset
  params:
    dataset_name: item_referential

# Import training dataset (used to train the forecast model)
train_set_import:
  type: import_from_dataset
  params:
    dataset_name: train_dataset

# Import prediction dataset (used to define the combinations to predict in the future)
pred_set_import:
  type: import_from_dataset
  params:
    dataset_name: prediction_dataset

# ---------------------------------------------------------------- #
# --------------------- LAUNCH PIPELINE RUN ---------------------- #
# ---------------------------------------------------------------- #


launch_sales_forecast_pipeline:
  type: forecast_pipeline
  params:
    pipeline_name: sales_forecast_model
    train_set: train_dataset
    predict_set: prediction_dataset 

extract_sales_forecast_results:
  type: import_from_pipeline
  params:
    pipeline_name: sales_forecast_model
    pipeline_step: prediction
    
export_forecast_results:
  type: export
  params:
    data_source: my_datawarehouse
    dataset_name: sales_forecast_model_prediction

evaluate_forecast_accuracy:
  type: score
  params:
    pipeline_name: sales_forecast_model
    reference_dataset: historical_sales
    resource_to_evaluate:
        pipeline_name: sales_forecast_model
        name: sales_forecast_model_prediction
        type: dataset

4.2. Example 2: Promotion Optimization

This example YAML recipe configures a two-stage process within Verteego. Initially, it forecasts the effectiveness of all possible promotional scenarios for various articles, stores, and promotional periods. Then, it optimizes to select the best promotional mechanism for each unique combination of article, store, and period based on the forecasted results.

# Import all possible promotional scenarios
import_promotional_scenarios:
  type: import_from_dataset
  params:
    dataset_name: all_promotional_scenarios

# Forecast the impact of each promotional scenario
forecast_promotional_impact:
  type: forecast_pipeline
  params:
    pipeline_name: promotion_forecast_pipeline
    # train_set name is illustrative; use the dataset your model trains on
    train_set: promotion_history
    predict_set: all_promotional_scenarios
    
# Extract the forecast results
extract_promotion_forecast_results:
  type: import_from_pipeline
  params:
    pipeline_name: promotion_forecast_pipeline
    pipeline_step: postprocessing

# Optimize to select the best promotional scenario based on the forecast
optimize_promotion_selection:
  type: optimization_pipeline
  params:
    pipeline_name: promotion_optimization_pipeline
    optimization_set: promotion_forecast_pipeline_postprocessing

# Extract the optimization results
extract_optimization_results:
  type: import_from_pipeline
  params:
    pipeline_name: promotion_optimization_pipeline
    pipeline_step: optimization
    
# Export the selected optimal promotional scenarios
export_optimal_promotions:
  type: export
  params:
    dataset_name: promotion_optimization_pipeline_optimization
    data_source: my_datawarehouse

5. Best Practices for Recipe Configuration

  • Ensure that each step follows a logical sequence, particularly when the output of one step is required as input for the next.

  • Use clear, descriptive names for each step to improve readability and simplify troubleshooting.

  • Carefully define all parameters for each step to prevent errors during execution.
