Recipes
1. Overview
A Recipe in Verteego defines the steps required to automate complex decision-making processes. It organizes a series of operations that guide the data flow logically from one step to the next, ultimately resulting in well-informed decisions.
The available steps in a Recipe are:
Importing/Extracting data
Forecasting based on the gathered data
Conducting optimization based on the gathered data
Evaluating the outcomes
Exporting the results
These operations are defined in a YAML file.
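For orientation, here is a minimal sketch of what a recipe file looks like, combining an import step and an export step (all step, dataset, and data source names are purely illustrative):
# Each top-level key names a step; its type selects the operation to run
import_my_data:
  type: import_from_dataset
  params:
    dataset_name: my_dataset        # illustrative dataset name
export_my_data:
  type: export
  params:
    dataset_name: my_dataset
    data_source: my_data_source     # illustrative data source name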
2. Data Manipulation
2.1. Import Data
Objective: Fetches data from specified datasets within the Verteego system or from external sources.
YAML Parameters:
type: Set to import_from_dataset to indicate the data import action.
Initial Setup:
Manually import data from the desired source.
Example:
refresh_item_referential_dataset:
  type: import_from_dataset
  params:
    dataset_name: item_referential
2.2. Export Data
Objective: Enables exporting processed data to various types of data sources.
YAML Parameters:
type: Set to export to specify the data export action.
Two methods of export are available:
Overwriting existing data, the default method (method: overwrite)
Appending new data to existing data with the same format/data schema (method: append)
In both cases, the following information must be provided:
dataset_name: The destination dataset for the export (must be the same as the dataset being exported).
data_source: The target data source.
Building on the previous example:
export_item_referential:
  type: export
  params:
    dataset_name: item_referential
    data_source: 0_item
    # When using the default overwrite method, this line is not required
    method: append
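For comparison, a sketch of the same export using the default overwrite method; since overwrite is the default, the method line is simply omitted:
export_item_referential:
  type: export
  params:
    dataset_name: item_referential
    data_source: 0_item
    # method defaults to overwrite when omitted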
2.3. Refresh Data
Objective: Updates existing data with more recent data.
YAML: A combination of two steps, import_from_dataset and export.
Example:
refresh_item_referential_dataset:
  type: import_from_dataset
  params:
    dataset_name: item_referential
export_item_referential:
  type: export
  params:
    dataset_name: item_referential
    data_source: 0_item
    # When using the default overwrite method, this line is not required
    method: append
3. Pipeline Manipulation
3.1. Forecast Pipeline
Objective: Executes models to forecast future trends or behaviors using the collected data.
YAML Parameters:
type: Set to forecast_pipeline to start the forecasting process.
pipeline_name: Specifies the pipeline to be executed.
train_set: Defines the training dataset.
predict_set: Defines the dataset for predictions.
Example:
launch_production_forecast_pipeline:
  type: forecast_pipeline
  params:
    pipeline_name: production
    train_set: train_dataset
    predict_set: prediction_dataset
3.2. Optimization
Objective: Identifies the optimal solutions or configurations using the prepared data.
YAML Parameters:
type: Set to optimization_pipeline to initiate the optimization process.
pipeline_name: Specifies the pipeline to be executed.
optimization_set: The dataset containing the various options or scenarios for optimization.
Example:
launch_optimization_pipeline:
  type: optimization_pipeline
  params:
    pipeline_name: Fleet_Optimization
    optimization_set: fleet_planning_system
3.3. Extracting Pipeline Results as a Dataset
Purpose: Verteego enables the extraction of data from different stages within a pipeline, allowing it to be stored as datasets for detailed analysis and further processing. This is done using the import_from_pipeline method.
Pipeline Stages for Data Extraction:
Preprocessing (forecast pipeline): Data from the initial preparation phase can be exported. This typically includes cleaned and transformed data ready for modeling.
Prediction (forecast pipeline): Model predictions can be exported from this stage. Users may choose to include additional analytical insights such as Shapley values, which explain the contribution of each feature to the prediction, or export all features used in the prediction model.
Postprocessing (forecast pipeline): Data from the final adjustments made to predictions, such as recalibrated or refined outputs, can be exported after the initial model processing.
Optimization (optimization pipeline): The final result of the optimization process can be exported at this stage.
YAML Parameters:
type: Specifies the action to be performed. For extracting pipeline data, this should be set to import_from_pipeline.
pipeline_name: The name of the pipeline from which the data is to be extracted.
pipeline_step: Specifies the stage of the pipeline from which the data will be extracted. Possible values include: preprocessing, prediction, postprocessing, optimization.
shapley_values (optional): A boolean parameter that determines whether to include Shapley values in the exported dataset. This applies only to the prediction stage and helps explain the influence of each feature on the predictions. true – include Shapley values; false (default) – exclude Shapley values.
preprocessed_columns (optional): A boolean parameter that specifies whether to include all calculated columns/features in the exported dataset. This is also relevant only for the prediction stage. true – include calculated columns/features; false (default) – exclude calculated columns/features.
shapley_values and preprocessed_columns are available only for the forecast pipeline.
Example: Extract forecast pipeline results with Shapley values and export them to another data source (BigQuery):
extract_forecast_results:
  type: import_from_pipeline
  params:
    pipeline_name: sales_forecast_model
    pipeline_step: prediction
    shapley_values: true
export_forecast_results:
  type: export
  params:
    data_source: VTG_raw_prediction
    dataset_name: sales_forecast_model_prediction
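Along the same lines, a minimal sketch of an extraction that includes all calculated columns/features instead of Shapley values (the step name is illustrative; the pipeline name is reused from the example above):
extract_forecast_features:
  type: import_from_pipeline
  params:
    pipeline_name: sales_forecast_model
    pipeline_step: prediction
    # include every calculated column/feature used by the model
    preprocessed_columns: true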
3.4. Model Performance and Accuracy Assessment
Purpose: Evaluates the performance and accuracy of predictive or optimization models within the system.
YAML Parameters:
type: Set to score to trigger the performance evaluation process.
pipeline_name: The name of the pipeline from which data is to be evaluated.
reference_dataset: Specifies the dataset used to compute the accuracy or performance metrics.
resource_to_evaluate: Indicates the component or resource within the reference dataset that will be evaluated, described by:
pipeline_name: The pipeline that produced the resource.
name: A label or identifier for the data being evaluated.
type: Defines the type of data to be evaluated.
Example:
evaluate_forecast_accuracy:
  type: score
  params:
    pipeline_name: sales_forecast_model
    reference_dataset: historical_sales
    resource_to_evaluate:
      pipeline_name: sales_forecast_model
      name: sales_forecast_model_prediction
      type: dataset
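Since the score step covers optimization models as well, an equivalent evaluation for an optimization pipeline might look like the sketch below (all dataset and resource names are hypothetical, and applying type: dataset to an optimization result is an assumption):
evaluate_optimization_quality:
  type: score
  params:
    pipeline_name: Fleet_Optimization
    reference_dataset: historical_fleet_plans   # hypothetical reference data
    resource_to_evaluate:
      pipeline_name: Fleet_Optimization
      name: fleet_optimization_result           # hypothetical resource name
      type: dataset                             # assumed resource type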
4. Examples
Recipes are configured using YAML (YAML Ain't Markup Language), which allows for a clear and human-readable definition of each step within the recipe.
4.1. Example 1: Basic sales forecast
# ---------------------------------------------------------------- #
# -------------------- SALES HISTORY REFRESH --------------------- #
# ---------------------------------------------------------------- #

# Import sales history in Verteego
sales_history_dataset_import:
  type: import_from_dataset
  params:
    dataset_name: sales_data

# Export refreshed data in order to keep it updated
sales_history_dataset_export:
  type: export
  params:
    data_source: 1_sales
    dataset_name: sales_data

# ---------------------------------------------------------------- #
# ------------ IMPORT DATA USED IN FORECAST PIPELINE ------------- #
# ---------------------------------------------------------------- #

# Import market conditions (used as a feature in the forecast pipeline)
market_conditions_import:
  type: import_from_dataset
  params:
    dataset_name: market_conditions

# Import item referential (used as a feature in the forecast pipeline)
item_referential_import:
  type: import_from_dataset
  params:
    dataset_name: item_referential

# Import training dataset (used to train the forecast model)
train_set_import:
  type: import_from_dataset
  params:
    dataset_name: train_dataset

# Import prediction dataset (used to define the combinations to predict in the future)
pred_set_import:
  type: import_from_dataset
  params:
    dataset_name: prediction_dataset

# ---------------------------------------------------------------- #
# --------------------- LAUNCH PIPELINE RUN ---------------------- #
# ---------------------------------------------------------------- #

launch_sales_forecast_pipeline:
  type: forecast_pipeline
  params:
    pipeline_name: sales_forecast_model
    train_set: train_dataset
    predict_set: prediction_dataset

extract_sales_forecast_results:
  type: import_from_pipeline
  params:
    pipeline_name: sales_forecast_model
    pipeline_step: prediction

export_forecast_results:
  type: export
  params:
    data_source: my_datawarehouse
    dataset_name: sales_forecast_model_prediction

evaluate_forecast_accuracy:
  type: score
  params:
    pipeline_name: sales_forecast_model
    reference_dataset: historical_sales
    resource_to_evaluate:
      pipeline_name: sales_forecast_model
      name: sales_forecast_model_prediction
      type: dataset
4.2. Example 2: Promotion Optimization
This example YAML recipe configures a two-stage process within Verteego. Initially, it forecasts the effectiveness of all possible promotional scenarios for various articles, stores, and promotional periods. Then, it optimizes to select the best promotional mechanism for each unique combination of article, store, and period based on the forecasted results.
# Import all possible promotional scenarios
import_promotional_scenarios:
  type: import_from_dataset
  params:
    dataset_name: all_promotional_scenarios

# Forecast the impact of each promotional scenario
forecast_promotional_impact:
  type: forecast_pipeline
  params:
    pipeline_name: promotion_forecast_pipeline
    input_dataset: all_promotional_scenarios
    output_dataset: forecasted_promotional_impact

# Extract the forecast results
extract_forecast_results:
  type: import_from_pipeline
  params:
    pipeline_name: promotion_forecast_pipeline
    pipeline_step: postprocessing

# Optimize to select the best promotional scenario based on the forecast
optimize_promotion_selection:
  type: optimization_pipeline
  params:
    pipeline_name: promotion_optimization_pipeline
    input_dataset: promotion_forecast_pipeline_postprocessing
    output_dataset: optimal_promotion_selection

# Extract the optimization results
extract_optimization_results:
  type: import_from_pipeline
  params:
    pipeline_name: promotion_optimization_pipeline
    pipeline_step: optimization

# Export the selected optimal promotional scenarios
export_optimal_promotions:
  type: export
  params:
    dataset_name: promotion_optimization_pipeline_optimization
    data_source: my_datawarehouse
5. Best Practices for Recipe Configuration
Ensure that each step follows a logical sequence, particularly when the output of one step is required as input for the next.
Use clear, descriptive names for each step to improve readability and simplify troubleshooting.
Carefully define all parameters for each step to prevent errors during execution.