Predictive Query Language (PQL) is a declarative query language that lets you define predictive problems on relational data. PQL specifies:
  1. The target — what you want to predict (an aggregation or column value)
  2. The entity — which rows/IDs to predict for
  3. The horizon — the future time window to predict over (for temporal tasks)
For a thorough introduction to predictive queries, please refer to the predictive query tutorial.
KumoRFM is currently in an experimental phase, and some PQL features are not yet fully supported.

Target x Entity x Horizon

The core framework for every KumoRFM prediction is Target x Entity x Horizon:
  • Target: The value to predict — either an aggregation over related rows (e.g., COUNT(orders.*, 0, 30, days)) or a static column value (e.g., users.age).
  • Entity: The specific row(s) to predict for, identified by a table’s primary key (e.g., users.user_id=1).
  • Horizon: For temporal predictions, the future time window (e.g., 0, 30, days means “the next 30 days from now”).
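
To make the horizon concrete, the sketch below turns a (start, end, unit) triple into absolute timestamps relative to an anchor time. The helper name horizon_window is hypothetical and not part of KumoRFM, and this page does not say whether the window endpoints are inclusive, so the code only computes the two boundaries.

```python
from datetime import datetime, timedelta


def horizon_window(anchor: datetime, start: int, end: int, unit: str = "days"):
    """Return (window_start, window_end) for a PQL-style horizon.

    Hypothetical helper for illustration only. It handles units that
    ``timedelta`` accepts as keyword arguments (e.g. "days", "hours").
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    # timedelta(days=30) is built from the unit name and the offset.
    window_start = anchor + timedelta(**{unit: start})
    window_end = anchor + timedelta(**{unit: end})
    return window_start, window_end


# "0, 30, days" anchored at 2024-01-01 spans the following 30 days.
window = horizon_window(datetime(2024, 1, 1), 0, 30, "days")
print(window)
```

Here the anchor time plays the role of "now" in the Horizon definition above.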

PQL Structure

The general PQL structure is:
PREDICT <target_expression> FOR <entity_specification> WHERE <optional_filters>
Component | Purpose
PREDICT <target_expression> | Declares the value or aggregate the model should predict
FOR <entity_specification> | Specifies the single ID or list of IDs to predict for
WHERE <filters> (optional) | Filters which historical rows are used as context
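
As a rough illustration of how the three clauses fit together, the following sketch assembles a query string from its parts. The function build_pql is a hypothetical convenience, not a KumoRFM API; it performs plain string formatting and does not validate the PQL.

```python
from typing import Optional


def build_pql(target: str, entity: str, where: Optional[str] = None) -> str:
    """Assemble a PQL string from its clauses (hypothetical helper)."""
    query = f"PREDICT {target} FOR {entity}"
    if where:
        # The WHERE clause is optional and goes last.
        query += f" WHERE {where}"
    return query


# Without a filter.
query = build_pql("COUNT(orders.*, 0, 30, days) > 0", "users.user_id=1")
print(query)

# With a filter on a column of the entity table.
filtered = build_pql(
    "SUM(orders.price, 0, 30, days)",
    "users.user_id=42",
    where="users.age > 21",
)
print(filtered)
```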

Five Steps to Write a PQL Query

  1. Choose your entity — pick a table and its primary key to predict for.
  2. Define the target — a raw column or an aggregation over a future window.
  3. Pin the entity list — pass a single ID or multiple IDs.
  4. (Optional) Refine the context — add filters to restrict which historical rows are used for feature generation.
  5. Run & fetch — call KumoRFM.predict() or KumoRFM.evaluate().

Entity Specification

Unlike the fine-tuning mode, KumoRFM makes predictions for a handful of selected entities at a time. Entities can be specified in three ways:
  • Single ID: users.user_id=1
  • Tuple of IDs: users.user_id IN (1, 2, 3)
  • Programmatic list via the indices parameter:
result = model.predict(
    "PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1",
    indices=[1, 2, 3, 4, 5],
)
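
The first two forms can also be generated from Python values. The sketch below is one possible way to do that; entity_spec is a hypothetical helper (not part of KumoRFM) and it assumes integer IDs, since the quoting rules for string IDs are not covered here.

```python
from typing import Sequence, Union


def entity_spec(primary_key: str, ids: Union[int, Sequence[int]]) -> str:
    """Format a FOR clause body for one ID or several IDs (hypothetical helper)."""
    if isinstance(ids, int):
        return f"{primary_key}={ids}"
    if len(ids) == 0:
        raise ValueError("at least one ID is required")
    if len(ids) == 1:
        # A single-element sequence collapses to the single-ID form.
        return f"{primary_key}={ids[0]}"
    id_list = ", ".join(str(i) for i in ids)
    return f"{primary_key} IN ({id_list})"


single = entity_spec("users.user_id", 1)
several = entity_spec("users.user_id", [1, 2, 3])
print(single)
print(several)
```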

Example Queries

Temporal regression — predict total spend in the next 30 days:
PREDICT SUM(orders.price, 0, 30, days) FOR users.user_id=42
Binary classification — will a user churn (no orders in 90 days)?
PREDICT COUNT(orders.*, 0, 90, days) = 0 FOR users.user_id=42
Static prediction — predict a user’s age from relational context:
PREDICT users.age FOR users.user_id=42
Multi-horizon forecasting — predict weekly revenue over 8 weeks:
PREDICT SUM(orders.price, 0, 7, days) FORECAST 8 TIMEFRAMES FOR items.item_id=42
See prediction_types for a complete reference of all supported task types.
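
To see what the temporal targets above measure, the sketch below evaluates the 30-day spend and 90-day churn definitions on a toy order history in plain Python. The data is made up for illustration, and the window boundary handling (start exclusive, end inclusive) is an assumption; KumoRFM may treat the endpoints differently.

```python
from datetime import datetime, timedelta

# Toy order history for one user: (timestamp, price). Illustrative values only.
orders = [
    (datetime(2024, 1, 5), 20.0),
    (datetime(2024, 1, 20), 35.5),
    (datetime(2024, 3, 15), 10.0),
]
anchor = datetime(2024, 1, 1)  # the "now" that the horizon is measured from


def in_window(ts: datetime, start_days: int, end_days: int) -> bool:
    # Assumed window: (anchor + start, anchor + end].
    lower = anchor + timedelta(days=start_days)
    upper = anchor + timedelta(days=end_days)
    return lower < ts <= upper


# SUM(orders.price, 0, 30, days): total spend in the 30 days after the anchor.
spend_30d = sum(price for ts, price in orders if in_window(ts, 0, 30))

# COUNT(orders.*, 0, 90, days) = 0: True when there are no orders in 90 days.
churned_90d = sum(1 for ts, _ in orders if in_window(ts, 0, 90)) == 0

print(spend_30d, churned_90d)
```

The first value is a regression target and the second a binary label, matching the task types named in the examples.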

Unsupported Features

Due to the experimental nature of KumoRFM, some PQL features are not yet fully supported:
  • LIST_DISTINCT() without a time interval is not supported.
  • Filtering by column value (e.g., WHERE users.age > 21) is only supported for columns within the same table.
  • Predicting a single non-aggregated value (e.g., PREDICT users.age) only works for columns within the entity table.

Further Reading

  • prediction_types — all supported task types with PQL examples
  • filters_and_operators — WHERE, IN, logical operators, anchor time
  • evaluation — automatic evaluation and metrics
  • configuration — run modes, explainability, batch mode, retry