Aggregation Constraint

Overview

Aggregation Constraints in Verteego enforce conditions on aggregated values of variables across specified groups in your data. They keep totals, averages, or other statistical measures computed along one or more dimensions of a dataset within specified business rules or operational limits.

Applications

  • Vehicle Inventory Management: The sum of vehicles in each category must not exceed the total available from suppliers.

  • Rental Operations: Total rental days per month must stay within a predefined range of minimum and maximum days.
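
As a rough sketch, the rental operations rule above might be written as follows. The constraint name and the columns month, rental_days, min_rental_days, and max_rental_days are hypothetical. The method name sum is assumed to be accepted in the same way as the pandas aggregation methods this page refers to. The parameters themselves are described in the next section.

```yaml
# Hypothetical names throughout; only the parameter keys come from this page.
rental_days_per_month:
  constraint_type: aggregation
  keys:
    - month                  # one aggregated total per month
  left_column: rental_days   # hypothetical column
  left_method: sum           # assumed method name
  operator: between
  right_column:
    - min_rental_days        # hypothetical lower-bound column
    - max_rental_days        # hypothetical upper-bound column
  right_method: min
```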

Parameters

  • keys: Optional. A list of column names that define the grouping for aggregation. If omitted, all selected rows are aggregated into a single value.

    • Example: Using keys=["car_model", "fuel_type"] aggregates values separately for each combination of car model and fuel type.

  • where: Optional. A dictionary of key-value pairs to filter rows before applying the constraint.

    • Example: where={"month": [1, 2, 3], "year": 2022} limits aggregation to the first quarter of 2022.

  • left_method: Specifies the aggregation method for the left side (e.g., mean, min, max, np.sum). See the Pandas DataFrame aggregation documentation for more methods.

  • left_column: The name of a column, or a list of column names, to aggregate. When several columns are given, the constraint applies to the sum of their aggregated values.

  • left_column_weight: Optional. A list of weights, one for each column that gets aggregated.

  • left_where: Optional. Filters rows for aggregation on the left side only.

    • Example: left_where={"bicycle_type": ["push_bike", "tandem"]} aggregates only rows concerning push bikes and tandem bikes.

  • operator: Specifies the type of constraint (e.g., equal, lesser, greater, between) between the aggregated results on the left and right sides.

  • right_method, right_column, right_column_weight, right_where: These parameters mirror those on the left side but apply to the right side of the constraint.

  • relax: Indicates whether this constraint can be relaxed with a penalty. Defaults to false.

  • keep_duplicates: Determines whether to keep multiple rows that might refer to the same variables. Defaults to false.
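
The following sketch shows how several of these parameters might combine in one constraint. It is an illustration under stated assumptions, not a configuration taken from Verteego: the constraint name and the columns purchase_cost, maintenance_cost, and fleet_budget are hypothetical, sum is assumed to be a valid method name, and the weights are assumed to pair with the left columns in the order they are listed.

```yaml
# Hypothetical constraint and column names, for illustration only.
q1_weighted_fleet_cost:
  constraint_type: aggregation
  keys:                      # one aggregate per (car_model, fuel_type) pair
    - car_model
    - fuel_type
  where:                     # applied to rows before either side is aggregated
    month: [1, 2, 3]
    year: 2022
  left_column:
    - purchase_cost
    - maintenance_cost
  left_column_weight:        # assumed to match left_column order
    - 1.0
    - 0.5
  left_method: sum           # assumed method name
  operator: lesser
  right_column: fleet_budget
  right_method: max
  relax: true                # allow violation at a penalty
```

Read this way, each group's weighted cost for the first quarter of 2022 is compared with that group's budget, and setting relax to true lets the constraint be violated at a penalty rather than treated as a hard limit.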

Examples

  • Comparing Rental Prices for Different Bicycle Types: the highest push-bike rental price must stay below 0.9 times the lowest e-bike rental price.

    constraint_electric_versus_manual_bicycles:
      constraint_type: aggregation
      keys:
      - bicycle_type
      left_column: rental_price
      left_method: max
      left_where:
        bicycle_type:
          - "push_bike"
      operator: lesser
      right_column: rental_price
      right_column_weight:
        - 0.9
      right_method: min
      right_where:
        bicycle_type:
          - "e_bike"

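  • Checking Hotel Occupancy Limits: the mean hotel_occupancy of each hotel_category must lie between min_avg_occupancy_per_category and max_avg_occupancy_per_category.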

    hotel_occupancy_checker:
      constraint_type: aggregation
      keys:
      - hotel_category
      left_column: hotel_occupancy
      left_method: mean
      operator: between
      right_column:
        - min_avg_occupancy_per_category
        - max_avg_occupancy_per_category
      right_method: min
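
The optional where and relax parameters can presumably be attached to a constraint like the hotel occupancy one in the same way. The variant below is a sketch only: the constraint name and the year column are hypothetical and do not come from the original examples.

```yaml
# Variant of hotel_occupancy_checker; the name and the year column are hypothetical.
hotel_occupancy_checker_2022:
  constraint_type: aggregation
  keys:
    - hotel_category
  where:
    year: 2022               # hypothetical filter column
  left_column: hotel_occupancy
  left_method: mean
  operator: between
  right_column:
    - min_avg_occupancy_per_category
    - max_avg_occupancy_per_category
  right_method: min
  relax: true                # defaults to false when omitted
```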
