pca

Performs Principal Component Analysis (PCA).

Usage

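Principal Component Analysis (PCA) is a statistical procedure that summarizes the information content of large data tables by means of a smaller set of “summary indices” (the principal components) that are easier to visualize and analyze. Each component is a linear combination of the original columns; the components are uncorrelated with one another and ordered by the share of the variance they explain.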
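
As an illustration of the technique itself, and not of this calculator's actual implementation (which this page does not show), here is a minimal NumPy sketch: center the data, take its singular value decomposition, and project each row onto the first directions. The helper names pca_fit and pca_transform are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Learn the column means and the top principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions,
    # already sorted by decreasing singular value (decreasing variance).
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    variance = s**2 / (X.shape[0] - 1)
    ratio = variance[:n_components] / variance.sum()  # share of variance kept per component
    return mean, vt[:n_components], ratio

def pca_transform(X, mean, components):
    """Project X onto the kept directions: one 'summary index' per component."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# 200 rows of 10 correlated columns, all driven by 2 hidden factors plus a little noise.
factors = rng.normal(size=(200, 2))
X = factors @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

mean, components, ratio = pca_fit(X, n_components=2)
scores = pca_transform(X, mean, components)
print(scores.shape)  # (200, 2)
print(ratio.sum())   # should be close to 1: 2 components summarize the 10 columns
```

With two hidden factors driving ten columns, two components keep almost all of the variance, which is the kind of reduction this calculator is meant for.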

This calculator can be used with the following method:

pca

Examples:

  • Often used together with tsfresh to reduce the large number of features it generates.


Main Parameters

The bold options represent the default values when the parameters are optional.

  • input_columns list of columns used as input of the calculator ⇒ Names of the columns to reduce. Wildcards can be used in column names.

  • output_columns_prefix prefix of the columns added when the output columns cannot be listed ⇒ Prefix for the output columns, since this calculator adds several of them (one per component).

  • global (true, false) Whether this calculator is run before the data is split for cross-validation during training.

  • steps [optional] (training, prediction, postprocessing) List of the pipeline steps in which the columns from this calculator are added to the data. Note that when training is listed, the calculator is actually added during preprocessing.

  • store_in_model [optional] (true, false) Whether the columns calculated by this calculator should be stored in the model, to avoid recalculating them during prediction. This is only relevant if the calculated columns are added during both training and prediction. If this parameter is omitted, the values are not stored in the model. The following two parameters only apply when it is set to true.

  • stored_columns [required if store_in_model is true] List of the output columns to store.

  • stored_keys [required if store_in_model is true] List of the columns used at prediction time to find the matching stored values and join them onto the data (logically, they should be chosen from the input_columns).
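
The difference between the two global settings can be sketched in NumPy. This sketch assumes that global: false means the PCA is refitted on the training rows of each cross-validation fold; the page only states that global: true runs the calculator before the split. The helpers fit_pca, project and kfold are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Column means and top-k principal directions of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(X, mean, components):
    """Scores of the rows of X on the given principal directions."""
    return (X - mean) @ components.T

def kfold(n_rows, n_folds):
    """Yield (train_idx, valid_idx) for a simple contiguous k-fold split."""
    idx = np.arange(n_rows)
    for valid_idx in np.array_split(idx, n_folds):
        yield np.setdiff1d(idx, valid_idx), valid_idx

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
k = 2

# global: true -> one PCA fitted on all rows before the split; every fold reuses
# these scores, so validation rows have influenced the components.
mean_all, comp_all = fit_pca(X, k)
global_scores = project(X, mean_all, comp_all)

# global: false (ASSUMED behavior) -> PCA refitted on each fold's training rows
# only, then applied to that fold's validation rows.
per_fold_scores = np.empty((len(X), k))
for train_idx, valid_idx in kfold(len(X), n_folds=5):
    mean_f, comp_f = fit_pca(X[train_idx], k)
    per_fold_scores[valid_idx] = project(X[valid_idx], mean_f, comp_f)

print(global_scores.shape, per_fold_scores.shape)  # (100, 2) (100, 2)
```

Fitting per fold keeps validation rows from shaping the components they are scored with, at the cost of one PCA fit per fold.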


Specific Parameters

  • n_components number of principal components to keep (and therefore the number of output columns added)
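
How input_columns wildcards, output_columns_prefix and n_components might fit together is sketched below in plain Python. Two points are assumptions: that wildcards are regular expressions matched against the whole column name (the stored_columns pattern pca_tsf.* in the example below only matches pca_tsf_0 under that reading, since in a shell-style glob the "." is literal), and that outputs are named <prefix>_0 to <prefix>_<n_components - 1>, as suggested by the default_value keys of that example. The helper names and column names are made up for illustration.

```python
import re

def select_columns(columns, patterns):
    """Return the columns matching any pattern, in their original order.

    ASSUMPTION: patterns are regular expressions matched against the whole
    column name (re.fullmatch); the page only says wildcards are allowed.
    """
    compiled = [re.compile(p) for p in patterns]
    return [c for c in columns if any(rx.fullmatch(c) for rx in compiled)]

def output_column_names(prefix, n_components):
    """ASSUMED naming scheme: <prefix>_0 ... <prefix>_<n_components - 1>."""
    return [f"{prefix}_{i}" for i in range(n_components)]

columns = ["item_id", "warehouse_id", "country", "ts.mean", "ts.max", "ts.fft_0", "sales"]
inputs = select_columns(columns, ["ts.*"])
outputs = output_column_names("pca_tsf", 5)
print(inputs)   # ['ts.mean', 'ts.max', 'ts.fft_0']
print(outputs)  # ['pca_tsf_0', 'pca_tsf_1', 'pca_tsf_2', 'pca_tsf_3', 'pca_tsf_4']

# Under the regex reading, the stored_columns pattern selects exactly these outputs.
print(select_columns(outputs, ["pca_tsf.*"]))
```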


Examples

  1. Here we want to summarize multiple temporal features (ts.*) into only 5 components.

calculated_cols:
  compute_pca:
    method: pca
    input_columns:
      - ts.*
    output_columns_prefix: pca_tsf
    store_in_model: true
    stored_keys:
      - item_id
      - warehouse_id
      - country
    stored_columns:
      - pca_tsf.*
    params:
      n_components: 5
    default_value:
      pca_tsf_0: 0
      pca_tsf_1: 0
      pca_tsf_2: 0
      pca_tsf_3: 0
      pca_tsf_4: 0
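
The stored-values part of this example (store_in_model, stored_keys, stored_columns, default_value) can be modeled with plain dictionaries. This is a hypothetical sketch, not the calculator's code: it assumes the stored outputs are indexed by the tuple of stored_keys values, and that default_value is what rows with unseen keys receive at prediction time, which this page does not document. The pattern pca_tsf.* is expanded by hand to the five column names.

```python
# Hypothetical model of store_in_model / stored_keys / stored_columns /
# default_value, mirroring the YAML example; NOT the calculator's real code.
STORED_KEYS = ("item_id", "warehouse_id", "country")
STORED_COLUMNS = [f"pca_tsf_{i}" for i in range(5)]  # pca_tsf.* expanded by hand
DEFAULT_VALUE = {col: 0 for col in STORED_COLUMNS}

def store_values(training_rows):
    """Training: keep each row's PCA outputs, indexed by its stored_keys values.

    If several rows share the same keys, the last one wins in this sketch.
    """
    return {
        tuple(row[k] for k in STORED_KEYS): {c: row[c] for c in STORED_COLUMNS}
        for row in training_rows
    }

def join_stored(prediction_rows, store):
    """Prediction: add the stored outputs to each row, matched on stored_keys.

    ASSUMPTION: a row whose keys were never stored receives DEFAULT_VALUE.
    """
    joined = []
    for row in prediction_rows:
        key = tuple(row[k] for k in STORED_KEYS)
        joined.append({**row, **store.get(key, DEFAULT_VALUE)})
    return joined

training = [
    {"item_id": 1, "warehouse_id": "W1", "country": "FR",
     **{c: 0.1 * (i + 1) for i, c in enumerate(STORED_COLUMNS)}},
]
store = store_values(training)

prediction = [
    {"item_id": 1, "warehouse_id": "W1", "country": "FR"},  # keys seen in training
    {"item_id": 2, "warehouse_id": "W1", "country": "FR"},  # keys never stored
]
result = join_stored(prediction, store)
print(result[0]["pca_tsf_0"], result[1]["pca_tsf_0"])  # 0.1 0
```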
