Predictive Query Language (PQL) is a declarative query language that lets you define predictive problems on relational data. PQL specifies:
- The target — what you want to predict (an aggregation or column value)
- The entity — which rows/IDs to predict for
- The horizon — the future time window to predict over (for temporal tasks)
For a thorough introduction to predictive queries, please refer to the predictive query tutorial.
KumoRFM is currently in an experimental phase, and some PQL features are not yet fully supported.
Target x Entity x Horizon
The core framework for every KumoRFM prediction is Target x Entity x Horizon:
- Target: The value to predict, either an aggregation over related rows (e.g., `COUNT(orders.*, 0, 30, days)`) or a static column value (e.g., `users.age`).
- Entity: The specific row(s) to predict for, identified by a table’s primary key (e.g., `users.user_id=1`).
- Horizon: For temporal predictions, the future time window (e.g., `0, 30, days` means “the next 30 days from now”).
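To make the horizon concrete, the sketch below computes by hand what a target such as COUNT(orders.*, 0, 30, days) would count for one user on historical data. It is plain Python, not KumoRFM code. The helper name `count_in_window`, the sample timestamps, and the choice to treat the window as exclusive at the start and inclusive at the end are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, anchor, start_days, end_days):
    """Count events in (anchor + start_days, anchor + end_days].

    Hypothetical helper: the boundary handling is an assumption,
    not a statement about how KumoRFM evaluates a horizon.
    """
    lower = anchor + timedelta(days=start_days)
    upper = anchor + timedelta(days=end_days)
    return sum(1 for ts in timestamps if lower < ts <= upper)


# Made-up order timestamps for a single user.
orders = [
    datetime(2024, 1, 5),
    datetime(2024, 1, 20),
    datetime(2024, 2, 15),
]
anchor = datetime(2024, 1, 1)  # the "now" the horizon is measured from

# Analogue of COUNT(orders.*, 0, 30, days)
orders_next_30_days = count_in_window(orders, anchor, 0, 30)

# Analogue of COUNT(orders.*, 0, 30, days) > 0
has_order = orders_next_30_days > 0
```

Comparing the aggregate against a threshold turns the numeric target into a yes/no target, which is how the binary examples in this section are phrased.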
PQL Structure
The general PQL structure is:
PREDICT <target_expression> FOR <entity_specification> WHERE <optional_filters>
| Component | Purpose |
|---|---|
| `PREDICT <target_expression>` | Declares the value or aggregate the model should predict |
| `FOR <entity_specification>` | Specifies the single ID or list of IDs to predict for |
| `WHERE <filters>` (optional) | Filters which historical rows are used as context |
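As a minimal sketch of how the three components fit together, the helper below assembles a query string from its parts. `build_pql` is a hypothetical convenience function, not part of KumoRFM. It only concatenates strings and does not validate the PQL it produces.

```python
def build_pql(target, entity, where=None):
    """Assemble a PQL string from its parts (hypothetical helper).

    target: the expression after PREDICT
    entity: the entity specification after FOR
    where:  optional filter expression after WHERE
    """
    query = f"PREDICT {target} FOR {entity}"
    if where:
        query += f" WHERE {where}"
    return query


# Without a filter
query = build_pql("SUM(orders.price, 0, 30, days)", "users.user_id=42")

# With the optional WHERE component
filtered = build_pql(
    "COUNT(orders.*, 0, 30, days) > 0",
    "users.user_id=42",
    where="users.age > 21",
)
```

The resulting strings follow the PREDICT / FOR / WHERE order shown in the table.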
Five Steps to Write a PQL Query
1. Choose your entity: pick a table and its primary key to predict for.
2. Define the target: a raw column or an aggregation over a future window.
3. Pin the entity list: pass a single ID or multiple IDs.
4. (Optional) Refine the context: add filters to restrict which historical rows are used for feature generation.
5. Run and fetch: call `KumoRFM.predict()` or `KumoRFM.evaluate()`.
Entity Specification
Unlike the fine-tuning mode, KumoRFM makes predictions for a handful of selected entities at a time. Entities can be specified in three ways:
- Single ID: `users.user_id=1`
- Tuple of IDs: `users.user_id IN (1, 2, 3)`
- Programmatic list via the `indices` parameter:
result = model.predict(
"PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1",
indices=[1, 2, 3, 4, 5],
)
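When the IDs come from application code, the first two forms can be generated rather than typed by hand. The sketch below assumes integer primary keys; `entity_clause` is a hypothetical helper, and string-valued keys would need quoting that it does not attempt.

```python
def entity_clause(table, key, ids):
    """Format an entity specification for one or more integer IDs.

    Hypothetical helper: returns "table.key=ID" for a single ID and
    "table.key IN (ID, ID, ...)" for several.
    """
    ids = list(ids)
    if not ids:
        raise ValueError("at least one entity ID is required")
    if len(ids) == 1:
        return f"{table}.{key}={ids[0]}"
    joined = ", ".join(str(i) for i in ids)
    return f"{table}.{key} IN ({joined})"


single = entity_clause("users", "user_id", [1])
several = entity_clause("users", "user_id", [1, 2, 3])
```

For longer lists, passing the IDs through the indices parameter keeps the query string itself short.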
Example Queries
Temporal regression — predict total spend in the next 30 days:
PREDICT SUM(orders.price, 0, 30, days) FOR users.user_id=42
Binary classification — will a user churn (no orders in 90 days)?
PREDICT COUNT(orders.*, 0, 90, days) = 0 FOR users.user_id=42
Static prediction — predict a user’s age from relational context:
PREDICT users.age FOR users.user_id=42
Multi-horizon forecasting — predict weekly revenue over 8 weeks:
PREDICT SUM(orders.price, 0, 7, days) FORECAST 8 TIMEFRAMES FOR items.item_id=42
See prediction_types for a complete reference of all supported task types.
Unsupported Features
Due to the experimental nature of KumoRFM, some PQL features are not yet fully supported:
- `LIST_DISTINCT()` without a time interval is not supported.
- Filtering by column value (e.g., `WHERE users.age > 21`) is only supported for columns within the same table.
- Predicting a single non-aggregated value (e.g., `PREDICT users.age`) only works for columns within the entity table.
Further Reading
- prediction_types: all supported task types with PQL examples
- filters_and_operators: WHERE, IN, logical operators, anchor time
- evaluation: automatic evaluation and metrics
- configuration: run modes, explainability, batch mode, retry