Recipes
1. Overview
A Recipe in Verteego defines the steps required to automate complex decision-making processes. It organizes a series of operations that guide the data flow logically from one step to the next, ultimately resulting in well-informed decisions.
The steps available in a Recipe are:
Importing/Extracting data
Forecasting based on the gathered data
Conducting optimization based on the gathered data
Evaluating the outcomes
Exporting the results
These operations are defined in a YAML file.
2. Data Manipulation
2.1. Import Data
Objective: Fetches data from specified datasets within the Verteego system or from external sources.
YAML Parameters:
type: Set to import_from_dataset to indicate the data import action.
Initial Setup:
Manually import data from the desired source.
Example:
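A minimal sketch of an import step. Only the type value comes from the parameters above. The top-level steps key, the step name import_sales, the dataset_name key (borrowed from the export step) and the dataset sales_history are illustrative assumptions, not confirmed Verteego syntax.

```yaml
# Assumed layout: steps keyed by a descriptive name under `steps`.
steps:
  import_sales:                    # hypothetical step name
    type: import_from_dataset      # documented value for the import action
    dataset_name: sales_history    # assumed key, hypothetical dataset
```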
When importing data with a recipe, the following conditions must be met:
The schema of the already imported data and of the new data is the same.
The data format (column types) is the same.
2.2. Export Data
Objective: Enables exporting processed data to various types of data sources.
YAML Parameters:
type: Set to export to specify the action of data export.
Two export methods are available:
Overwriting existing data, the default method (method: overwrite).
Appending new data to existing data with the same format/data schema (method: append).
In both cases, the following information must be provided:
dataset_name: The destination dataset for export (must be the same).
data_source: The target data source.
Continuing the previous example:
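A sketch of an export step using the parameters listed above. The steps key, the step name, and the values given to dataset_name and data_source are placeholders chosen for illustration, not identifiers known to exist in a Verteego workspace.

```yaml
# Assumed layout: steps keyed by a descriptive name under `steps`.
steps:
  export_sales:                    # hypothetical step name
    type: export                   # documented value for the export action
    method: append                 # or overwrite (the default)
    dataset_name: sales_history    # hypothetical destination dataset
    data_source: my_data_source    # hypothetical target data source
```

With method: append, the exported rows are expected to share the format/data schema of the existing dataset.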
2.3. Refresh Data
Objective: Updates existing data with more recent data.
YAML: A combination of two steps, import_from_dataset and export.
Example:
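A possible refresh recipe, sketched as an import step followed by an export step. The layout (a steps mapping, executed in order), the step names, the dataset_name key on the import step, and all dataset and data source names are assumptions made for illustration.

```yaml
# Assumed layout: steps run in the order in which they are declared.
steps:
  import_latest_sales:             # hypothetical step name
    type: import_from_dataset
    dataset_name: sales_history    # assumed key, hypothetical dataset
  export_refreshed_sales:          # hypothetical step name
    type: export
    method: overwrite              # replace the existing data with the recent data
    dataset_name: sales_history    # same dataset as the one imported
    data_source: my_data_source    # hypothetical target data source
```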
3. Pipeline Manipulation
3.1. Forecast Pipeline
Objective: Executes models to forecast future trends or behaviors using the collected data.
YAML Parameters:
type: Set to forecast_pipeline to start the forecasting process.
pipeline_name: Specifies the pipeline to be executed.
train_set: Defines the training dataset.
prediction_set: Defines the dataset for predictions.
Example:
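A sketch of a forecast step built from the four parameters above. The steps key, the step name, and the pipeline and dataset names are hypothetical. The sketch also assumes that train_set and prediction_set take dataset names.

```yaml
# Assumed layout: steps keyed by a descriptive name under `steps`.
steps:
  forecast_sales:                      # hypothetical step name
    type: forecast_pipeline            # documented value
    pipeline_name: sales_forecast      # hypothetical pipeline
    train_set: sales_history           # hypothetical training dataset
    prediction_set: sales_to_predict   # hypothetical dataset to predict on
```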
3.2. Optimization
Objective: Identifies the best solutions or configurations using the prepared data.
YAML Parameters:
type: Set to optimization_pipeline to initiate the optimization process.
pipeline_name: Specifies the pipeline to be executed.
optimization_set: The dataset containing the various options or scenarios for optimization.
Example:
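A sketch of an optimization step using the three parameters above. The steps key, the step name, and the pipeline and dataset names are invented for illustration.

```yaml
# Assumed layout: steps keyed by a descriptive name under `steps`.
steps:
  optimize_assortment:                     # hypothetical step name
    type: optimization_pipeline            # documented value
    pipeline_name: assortment_optimizer    # hypothetical pipeline
    optimization_set: candidate_scenarios  # hypothetical dataset of options to compare
```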
3.3. Extracting Pipeline results as Dataset
Purpose: Verteego enables the extraction of data from different stages within a pipeline, allowing it to be stored as datasets for detailed analysis and further processing. This is done using the import_from_pipeline method.
Pipeline Stages for Data Extraction:
Preprocessing (forecast pipeline): Data from the initial preparation phase can be exported. This typically includes cleaned and transformed data ready for modeling.
Prediction (forecast pipeline): Model predictions can be exported from this stage. Users may choose to include additional analytical insights such as Shapley values, which explain the contribution of each feature to the prediction, or export all features used in the prediction model.
Postprocessing (forecast pipeline): Data from the final adjustments made to predictions, such as recalibrated or refined outputs, can be exported after the initial model processing.
Optimization (optimization pipeline): The final result of the optimization process can be exported at this stage.
YAML Parameters:
type: Specifies the action to be performed. For extracting pipeline data, this should be set to import_from_pipeline.
pipeline_name: The name of the pipeline from which the data is to be extracted.
pipeline_step: Specifies the stage of the pipeline from which the data will be extracted. Possible values: preprocessing, prediction, postprocessing, optimization.
shapley_values (optional): A boolean parameter that determines whether to include Shapley values in the exported dataset. This applies only to the prediction stage and helps explain the influence of each feature on the predictions. true includes Shapley values; false (default) excludes them.
preprocessed_columns (optional): A boolean parameter that specifies whether to include all calculated columns/features in the exported dataset. This is also relevant only for the prediction stage. true includes calculated columns/features; false (default) excludes them.
shapley_values and preprocessed_columns are available only for the Forecast Pipeline.
Example: Extract the forecast pipeline results with Shapley values and export them to another data source (BigQuery)
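One way this could look, as a sketch rather than confirmed syntax. The parameter names come from the lists above. The steps layout, step names, pipeline and dataset names, and the bigquery identifier are assumptions. The sketch also assumes that the export step receives the output of the step declared before it, which this section does not specify.

```yaml
# Assumed layout: steps run in declaration order, each feeding the next.
steps:
  extract_predictions:               # hypothetical step name
    type: import_from_pipeline       # documented value
    pipeline_name: sales_forecast    # hypothetical forecast pipeline
    pipeline_step: prediction        # shapley_values applies to this stage only
    shapley_values: true             # include Shapley values
    preprocessed_columns: false      # default, calculated features excluded
  export_to_bigquery:                # hypothetical step name
    type: export
    method: overwrite
    dataset_name: sales_forecast_explained   # hypothetical destination dataset
    data_source: bigquery            # hypothetical identifier for a BigQuery source
```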
3.4. Model Performance and Accuracy Assessment
Purpose: Evaluates the performance and accuracy of predictive or optimization models within the system.
YAML Parameters:
type: Set to score to trigger the performance evaluation process.
pipeline_name: The name of the pipeline from which data is to be evaluated.
reference_dataset: Specifies the dataset used to compute the accuracy or performance metrics.
resource_to_evaluate: Indicates the different components or resources within the reference dataset that will be evaluated.
name: A label or identifier for the data being evaluated.
type: Defines the type of data to be evaluated.
Example:
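A sketch of a scoring step. It assumes that name and type are nested under resource_to_evaluate, since type already appears at the step level. That nesting, the steps layout, every name used, and the value given to the nested type are assumptions, not documented values.

```yaml
# Assumed layout: steps keyed by a descriptive name under `steps`.
steps:
  score_sales_forecast:              # hypothetical step name
    type: score                      # documented value
    pipeline_name: sales_forecast    # hypothetical pipeline to evaluate
    reference_dataset: sales_actuals # hypothetical dataset holding observed values
    resource_to_evaluate:            # nesting of name/type is assumed
      name: weekly_sales             # hypothetical label
      type: prediction               # hypothetical value, check the allowed types
```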
4. Examples
Recipes are configured using YAML (YAML Ain't Markup Language), which allows for a clear and human-readable definition of each step within the recipe.
4.1. Example 1: Basic sales forecast
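A basic sales forecast recipe might chain an import, a forecast, and an export. The sketch below uses only parameter names described in sections 2 and 3. The steps layout, the step, pipeline and dataset names, the dataset_name key on the import step, and the assumption that the export step receives the forecast output are all illustrative.

```yaml
# Assumed layout: steps run in declaration order.
steps:
  import_sales:                      # hypothetical step name
    type: import_from_dataset
    dataset_name: sales_history      # assumed key, hypothetical dataset
  forecast_sales:                    # hypothetical step name
    type: forecast_pipeline
    pipeline_name: sales_forecast    # hypothetical pipeline
    train_set: sales_history         # hypothetical training dataset
    prediction_set: sales_to_predict # hypothetical dataset to predict on
  export_forecast:                   # hypothetical step name
    type: export
    method: overwrite
    dataset_name: sales_forecast_results   # hypothetical destination dataset
    data_source: my_data_source      # hypothetical target data source
```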
4.2. Example 2: Promotion Optimization
This example YAML recipe configures a two-stage process within Verteego. Initially, it forecasts the effectiveness of all possible promotional scenarios for various articles, stores, and promotional periods. Then, it optimizes to select the best promotional mechanism for each unique combination of article, store, and period based on the forecasted results.
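A sketch of such a two-stage recipe, under stated assumptions. The steps layout and all step, pipeline and dataset names are hypothetical. How the forecast output becomes available to the optimization step is not covered in this section, so the dataset referenced by optimization_set is a placeholder.

```yaml
# Assumed layout: steps run in declaration order.
steps:
  forecast_promo_scenarios:            # hypothetical step name
    type: forecast_pipeline
    pipeline_name: promo_forecast      # hypothetical pipeline
    train_set: promo_history           # hypothetical past promotions
    prediction_set: promo_scenarios    # hypothetical: one row per article, store, period and mechanism
  select_best_promotions:              # hypothetical step name
    type: optimization_pipeline
    pipeline_name: promo_selection     # hypothetical pipeline
    optimization_set: promo_scenarios_forecast   # placeholder for the forecasted scenarios
```

Declaring the forecast step first keeps the sequence logical, since the optimization depends on the forecasted results.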
5. Best Practices for Recipe Configuration
Ensure that each step follows a logical sequence, particularly when the output of one step is required as input for the next.
Use clear, descriptive names for each step to improve readability and simplify troubleshooting.
Carefully define all parameters for each step to prevent errors during execution.