words_similarity

Computes a similarity score between the values in the input column and a set of context words.

Usage

Calculates the similarity between each of the given context words and the words in the input column, using the sentence transformer library. Several languages are supported.

This calculator can be used with the following method:

words_similarity

Use cases:

  • Establish connections between related products. For instance, chicken soup and chicken salad both contain chicken.

  • Generate artificial reference data when the client's reference system is inadequate or incomplete.


Main Parameters

The bold options represent the default values when the parameters are optional.

  • input_columns Column whose values are compared with the context words to compute a similarity for each row.

  • output_columns Context words to use. Each word becomes a new column containing the similarity computed for the given row’s input column value.

  • global (true, false) Whether this calculator is run before the data is split for cross-validation during training.

  • steps [optional] (training, prediction, postprocessing) List of pipeline steps at which the columns from this calculator are added to the data. Note that when the training option is listed, the calculator is actually added during preprocessing.

  • store_in_model [optional] (true, false) Whether the columns computed by this calculator are stored in the model, so that they are not recalculated during prediction. This is only relevant if the calculated columns are added to both training and prediction. If this parameter is omitted, the values are not stored in the model. The following parameters only make sense if this parameter is set to true.

  • stored_columns [required if store_in_model is true] List indicating the columns to be stored among the output_columns.

  • stored_keys [required if store_in_model is true] List of the columns used as keys to join the stored values onto the data at prediction time (logically, these should be chosen from the input_columns).
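
The optional parameters above could be combined as in the following sketch. The parameter names come from the list above; the values, and the assumption that these keys sit at the same level as method, are illustrative only and should be checked against your own configuration.

```yaml
calculated_cols:
  words_similarity_features:
    method: words_similarity
    input_columns:
    - item_label
    output_columns:
    - sushi
    - saumon
    # Assumed placement: optional keys at the same level as "method".
    global: true              # run before the cross-validation split
    steps:                    # add the columns at training and prediction
    - training
    - prediction
    store_in_model: true      # keep computed values to avoid recalculation
    stored_columns:           # subset of output_columns to store
    - sushi
    - saumon
    stored_keys:              # join key, chosen from input_columns
    - item_label
```

Here store_in_model is set because the columns are added to both training and prediction, which is the only case in which storing them is relevant.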


Specific Parameters

  • None


Examples

  1. Extract the product label and compute the similarity of the product labels with the following context words: sushi, thon, saumon, poulet, crevette, california, rice

calculated_cols:
  words_similarity_features:
    method: words_similarity
    input_columns:
    - item_label
    output_columns:
    - sushi
    - thon
    - saumon
    - poulet
    - crevette
    - california
    - rice

item_label                          sushi   thon    saumon  poulet  crevette  california  rice
CALIFORNIA SAUMON & MAKI MIXTE OLD  0.3629  0.4061  0.5522  0.4031  0.3929    0.6712      0.2669
SPICY CALIFORNIA SAUMON 9           0.3282  0.3701  0.5271  0.3991  0.4742    0.7747      0.2398
SUSHI & CALIFORNIA MIXTE 9 PIECES   0.7504  0.1815  0.3481  0.2771  0.3925    0.4621      0.291
RICE SANDWICH 8 PIECES              0.4985  0.1364  0.2212  0.3222  0.3785    0.2191      0.6828
CALIFORNIA SAUMON 8 PIECES          0.4618  0.3349  0.5578  0.4458  0.5171    0.6868      0.2998
