Verteego Doc

Building the Training and Prediction Set

column_to_predict

Specifies the name of the column that the model is trained to predict, that is, the target variable of the forecasting algorithm. If this column already exists in the prediction dataset, its values serve as a benchmark for the model's predictions, which makes it possible to compute performance scores such as accuracy or mean squared error.
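To illustrate the benchmark idea, here is a minimal Python sketch, not the platform's implementation. It assumes the prediction dataset is a list of row dictionaries and that mean squared error is the chosen score. The function names and the "sales" column are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between paired values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def benchmark_score(prediction_rows, predictions, column_to_predict):
    """Return the MSE when every row carries the target column, else None."""
    has_target = bool(prediction_rows) and all(
        column_to_predict in row for row in prediction_rows
    )
    if not has_target:
        return None  # no benchmark values, so no score can be computed
    actual = [row[column_to_predict] for row in prediction_rows]
    return mean_squared_error(actual, predictions)


# Hypothetical prediction dataset that already contains the target "sales".
rows = [{"store": "A", "sales": 10.0}, {"store": "B", "sales": 20.0}]
score = benchmark_score(rows, [12.0, 18.0], "sales")
```

When the target column is absent from the prediction dataset, the sketch returns None, which mirrors the case where predictions are produced but cannot be scored.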

features

Identifies the columns that the model uses for the forecasting task. Features are chosen from the columns defined in cols_type and those generated in the calculated_cols section. The column_to_predict and any columns derived directly from it are excluded, to avoid data leakage: a model that sees the target, or a transformation of it, among its inputs reports scores that it cannot reproduce on new data. At least one subcategory of features must be included, so that the feature set is never empty.
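The exclusion rule above can be sketched as a validation step in Python. This is an illustration under stated assumptions, not the platform's logic: cols_type is modelled as a dictionary of column name to type, calculated_cols as a dictionary of new column name to the list of columns it is computed from, and both function names are hypothetical.

```python
def leaky_columns(calculated_cols, column_to_predict):
    """Return the target plus every calculated column derived from it."""
    excluded = {column_to_predict}
    changed = True
    while changed:  # repeat so that derivatives of derivatives are caught
        changed = False
        for name, sources in calculated_cols.items():
            if name not in excluded and excluded.intersection(sources):
                excluded.add(name)
                changed = True
    return excluded


def validate_features(features, cols_type, calculated_cols, column_to_predict):
    """Check a feature list against the rules described above."""
    if not features:
        raise ValueError("the feature set must not be empty")
    known = set(cols_type) | set(calculated_cols)
    unknown = [f for f in features if f not in known]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    leaky = leaky_columns(calculated_cols, column_to_predict)
    leaking = [f for f in features if f in leaky]
    if leaking:
        raise ValueError(f"columns that leak the target: {leaking}")
    return list(features)


# Hypothetical configuration values.
cols_type = {"date": "datetime", "store": "category",
             "price": "float", "sales": "float"}
calculated_cols = {"price_log": ["price"],
                   "sales_lag": ["sales"],
                   "sales_lag_ratio": ["sales_lag", "price"]}

ok = validate_features(["date", "store", "price_log"],
                       cols_type, calculated_cols, "sales")
```

In this sketch, listing "sales_lag_ratio" as a feature would raise an error, because it is computed from "sales_lag", which is itself computed from the target.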

input_prediction_columns

Lists the columns from the prediction input file that should be included during the prediction process. If this list is left empty, the model considers all available columns in the dataset. Restricting the list to the relevant columns keeps the prediction input smaller, which reduces complexity and can improve performance on datasets with a large number of features.
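The "empty list means all columns" behaviour can be sketched in Python as follows. This is only an illustration, assuming the prediction input is a list of row dictionaries. The function name is hypothetical, and silently skipping a listed column that is missing from a row is a choice made for this sketch, not documented behaviour.

```python
def filter_prediction_columns(rows, input_prediction_columns):
    """Keep only the listed columns; an empty list keeps every column."""
    if not input_prediction_columns:
        return [dict(row) for row in rows]  # copy each row unchanged
    return [
        # Assumption: a listed column absent from a row is skipped.
        {col: row[col] for col in input_prediction_columns if col in row}
        for row in rows
    ]


# Hypothetical prediction input with one column the model does not need.
rows = [
    {"date": "2024-01-01", "store": "A", "price": 9.5, "comment": "promo"},
    {"date": "2024-01-02", "store": "A", "price": 9.9, "comment": ""},
]

all_columns = filter_prediction_columns(rows, [])
selected = filter_prediction_columns(rows, ["date", "store", "price"])
```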

column_to_predict
features
input_prediction_columns