Predictive Query Language (PQL) is a declarative query language that lets you define predictive problems on relational data. PQL specifies:
- The target — what you want to predict (an aggregation or column value)
- The entity — which rows/IDs to predict for
- The horizon — the future time window to predict over (for temporal tasks)
For a thorough introduction to predictive queries, refer to the predictive query tutorial.
KumoRFM is currently in an experimental phase, so some PQL features are not fully supported yet.
Target x Entity x Horizon
The core framework for every KumoRFM prediction is Target x Entity x Horizon:
(Placeholder: Diagram showing how Target, Entity, and Horizon combine to define a prediction.)
- Target: The value to predict, either an aggregation over related rows (e.g., COUNT(orders.*, 0, 30, days)) or a static column value (e.g., users.age).
- Entity: The specific row(s) to predict for, identified by a table’s primary key (e.g., users.user_id=1).
- Horizon: For temporal predictions, the future time window (e.g., 0, 30, days means “the next 30 days from now”).
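To illustrate how the three parts combine, the sketch below computes by hand the values that COUNT(orders.*, 0, 30, days) and SUM(orders.price, 0, 30, days) describe for the entity users.user_id=1, using a few made-up orders rows. The helper target_for_entity and the data are hypothetical (this is not KumoRFM code), and the window boundary used here (after anchor + start, up to and including anchor + end) is an assumption.

```python
from datetime import datetime, timedelta

# Toy "orders" rows: (user_id, order_time, price). Made-up data for illustration.
ORDERS = [
    (1, datetime(2024, 1, 5), 20.0),
    (1, datetime(2024, 1, 20), 35.0),
    (1, datetime(2024, 3, 1), 50.0),   # outside a 30-day horizon from Jan 1
    (2, datetime(2024, 1, 10), 15.0),  # belongs to a different entity
]


def target_for_entity(orders, user_id, anchor, start_days, end_days):
    """Hypothetical helper: aggregate one entity's orders inside a horizon.

    Assumes the window is (anchor + start, anchor + end], i.e. start-exclusive
    and end-inclusive; this boundary convention is an assumption.
    """
    lo = anchor + timedelta(days=start_days)
    hi = anchor + timedelta(days=end_days)
    # Entity: keep only rows whose foreign key matches the chosen user.
    # Horizon: keep only rows whose timestamp falls inside the window.
    prices = [p for uid, t, p in orders if uid == user_id and lo < t <= hi]
    # Target: the two aggregations used in the examples.
    return {"count": len(prices), "sum": sum(prices)}


anchor = datetime(2024, 1, 1)
print(target_for_entity(ORDERS, user_id=1, anchor=anchor, start_days=0, end_days=30))
# {'count': 2, 'sum': 55.0}
```

Changing user_id changes the entity, changing the aggregation changes the target, and changing the offsets changes the horizon. With end_days=90, a count of 0 corresponds to the churn condition used in the example queries.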
PQL Structure
The general PQL structure is:
PREDICT <target_expression> FOR <entity_specification> WHERE <optional_filters>
| Component | Purpose |
|---|---|
| PREDICT <target_expression> | Declares the value or aggregate the model should predict |
| FOR <entity_specification> | Specifies the single ID or list of IDs to predict for |
| WHERE <filters> (optional) | Filters which historical rows are used as context |
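Because the clauses always appear in this order, a query string can be assembled from its parts. The helper build_pql below is hypothetical (not part of the KumoRFM API); it simply omits WHERE when no filter is given and performs no validation of the PQL itself.

```python
from typing import Optional


def build_pql(target: str, entity: str, where: Optional[str] = None) -> str:
    """Hypothetical helper: assemble 'PREDICT ... FOR ... [WHERE ...]'."""
    query = f"PREDICT {target} FOR {entity}"
    # WHERE is optional; only append it when a filter is supplied.
    if where:
        query += f" WHERE {where}"
    return query


print(build_pql("COUNT(orders.*, 0, 30, days) > 0", "users.user_id=1"))
# PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1

print(build_pql("SUM(orders.price, 0, 30, days)", "users.user_id=42", "users.age > 21"))
# PREDICT SUM(orders.price, 0, 30, days) FOR users.user_id=42 WHERE users.age > 21
```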
Five Steps to Write a PQL Query
- Choose your entity: pick a table and its primary key to predict for.
- Define the target: a raw column or an aggregation over a future window.
- Pin the entity list: pass a single ID or multiple IDs.
- (Optional) Refine the context: add filters to restrict which historical rows are used for feature generation.
- Run & fetch: call KumoRFM.predict() or KumoRFM.evaluate().
Entity Specification
Unlike the fine-tuning mode, KumoRFM makes predictions for a handful of selected entities at a time. Entities can be specified in three ways:
- Single ID: users.user_id=1
- Tuple of IDs: users.user_id IN (1, 2, 3)
- Programmatic list via the indices parameter:
# model is an initialized KumoRFM instance; indices lists the user IDs to predict for
result = model.predict(
    "PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1",
    indices=[1, 2, 3, 4, 5],
)
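Since the single-ID and tuple forms are plain query text, a list of IDs held in Python can be formatted into either one. The helper entity_clause is hypothetical and assumes integer primary keys; quoting rules for string keys are not covered here.

```python
from typing import Sequence


def entity_clause(table: str, key: str, ids: Sequence[int]) -> str:
    """Hypothetical helper: format IDs as 'table.key=ID' or 'table.key IN (...)'.

    Assumes integer primary keys; quoting rules for string keys are not handled.
    """
    if not ids:
        raise ValueError("at least one ID is required")
    column = f"{table}.{key}"
    if len(ids) == 1:
        # Single ID form, e.g. users.user_id=1
        return f"{column}={ids[0]}"
    # Tuple form, e.g. users.user_id IN (1, 2, 3)
    return f"{column} IN ({', '.join(str(i) for i in ids)})"


print(entity_clause("users", "user_id", [1]))        # users.user_id=1
print(entity_clause("users", "user_id", [1, 2, 3]))  # users.user_id IN (1, 2, 3)
```

For longer lists, the indices parameter avoids embedding the IDs in the query string at all.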
Example Queries
Temporal regression — predict total spend in the next 30 days:
PREDICT SUM(orders.price, 0, 30, days) FOR users.user_id=42
Binary classification — will a user churn (no orders in 90 days)?
PREDICT COUNT(orders.*, 0, 90, days) = 0 FOR users.user_id=42
Static prediction — predict a user’s age from relational context:
PREDICT users.age FOR users.user_id=42
Multi-horizon forecasting — predict weekly revenue over 8 weeks:
PREDICT SUM(orders.price, 0, 7, days) FORECAST 8 TIMEFRAMES FOR items.item_id=42
See prediction_types for a complete reference of all supported task types.
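One reading of the multi-horizon example is that FORECAST 8 TIMEFRAMES repeats the 7-day window over eight consecutive, non-overlapping periods. Under that assumption (not a description of KumoRFM internals), the sketch below lists the window offsets and computes the actual weekly revenue of one item from made-up orders rows, which is the kind of ground truth a forecast would be compared against. The function names and data are hypothetical, and the start-exclusive, end-inclusive window boundary is also an assumption.

```python
from datetime import datetime, timedelta


def forecast_windows(start, end, timeframes):
    """Hypothetical: (start, end) day offsets for consecutive windows of width end - start."""
    width = end - start
    return [(start + i * width, end + i * width) for i in range(timeframes)]


def weekly_revenue(orders, item_id, anchor, windows):
    """Hypothetical: sum one item's order prices in each window (anchor + lo, anchor + hi].

    The start-exclusive, end-inclusive boundary is an assumption.
    """
    totals = []
    for lo, hi in windows:
        a, b = anchor + timedelta(days=lo), anchor + timedelta(days=hi)
        totals.append(sum((p for iid, t, p in orders if iid == item_id and a < t <= b), 0.0))
    return totals


windows = forecast_windows(0, 7, 8)
print(windows[:3])  # [(0, 7), (7, 14), (14, 21)]

# Toy (item_id, order_time, price) rows; made-up for illustration.
orders = [
    (42, datetime(2024, 1, 3), 10.0),
    (42, datetime(2024, 1, 10), 5.0),
    (42, datetime(2024, 1, 12), 7.5),
]
print(weekly_revenue(orders, 42, datetime(2024, 1, 1), windows))
# [10.0, 12.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```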
Unsupported Features
Due to the experimental nature of KumoRFM, some PQL features are not yet fully supported:
- LIST_DISTINCT() without a time interval is not supported.
- Filtering by column value (e.g., WHERE users.age > 21) is only supported for columns within the same table.
- Predicting a single non-aggregated value (e.g., PREDICT users.age) only works for columns within the entity table.
Further Reading
- prediction_types: all supported task types with PQL examples
- filters_and_operators: WHERE, IN, logical operators, anchor time
- evaluation: automatic evaluation and metrics
- configuration: run modes, explainability, batch mode, retry