Filters and operators allow you to refine your predictions by controlling which data is used and which entities are included.
Placeholder: Filter concepts overview diagram
WHERE Clause (Context Filters)
The WHERE clause controls which historical rows are used as context examples for the prediction. These filters are not applied to the entity list itself — they only affect the context data that KumoRFM uses to generate features.
Temporal Filters
Filter context examples based on temporal aggregations. This is useful for focusing predictions on specific subsets of historical behavior.
-- Predict whether a recently active user will place an order in the next 30 days
-- (context limited to users who had an order in the last 30 days)
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id=1
WHERE COUNT(orders.*, -30, 0, days) > 0
query = """
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id=1
WHERE COUNT(orders.*, -30, 0, days) > 0
"""
result = model.predict(query)
The negative time offset (-30, 0) looks at the past 30 days relative to “now”, providing a temporal filter on the context.
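To make the offsets concrete, the sketch below converts a (start, end, days) window into calendar dates relative to an anchor. The helper window_bounds and the chosen anchor date are hypothetical and not part of KumoRFM. Whether KumoRFM treats the window endpoints as inclusive or exclusive is not covered here, so the sketch only computes the two boundary dates.

```python
from datetime import datetime, timedelta


def window_bounds(anchor, start_days, end_days):
    """Return (start, end) datetimes for a window given in days relative to anchor.

    Hypothetical helper for illustration only; not a KumoRFM function.
    """
    return anchor + timedelta(days=start_days), anchor + timedelta(days=end_days)


anchor = datetime(2024, 1, 31)  # assumed "now" for this example

# WHERE COUNT(orders.*, -30, 0, days): the 30 days before the anchor
context_window = window_bounds(anchor, -30, 0)

# PREDICT COUNT(orders.*, 0, 30, days): the 30 days after the anchor
target_window = window_bounds(anchor, 0, 30)

print("context filter window:", context_window)
print("prediction window:   ", target_window)
```

The two windows meet at the anchor: the filter looks backward from "now", while the prediction target looks forward from it.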
Static Filters
Filter context examples based on static column values.
-- Predict spend on electronics category only
PREDICT SUM(transactions.value, 15, 45, days)
FOR EACH customers.customer_id
WHERE transactions.category = "Electronics"
Static column-value filtering (e.g., WHERE users.age > 21) is currently only supported for columns within the same table as the target or entity.
IN Clause (Entity Specification)
The IN clause specifies which entities to make predictions for. You can specify a single entity or a list of entities.
Single entity:
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id=42
Multiple entities:
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id IN (1, 2, 3)
Programmatic indices:
You can also pass entity IDs programmatically via the indices parameter:
result = model.predict(
"PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1",
indices=[1, 2, 3, 4, 5],
)
When indices is provided, it overrides the entity specification in the query string. For large entity lists, use KumoRFM.batch_mode().
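If the entity IDs live in a Python list and you prefer to keep them in the query string, the IN clause can be assembled with ordinary string formatting. This is a minimal sketch: build_in_query is a hypothetical helper, not a KumoRFM function, and it assumes integer IDs.

```python
def build_in_query(target, entity_column, entity_ids):
    """Assemble a PQL string with an IN clause from a list of integer IDs.

    Hypothetical helper for illustration only; not part of KumoRFM.
    """
    if not entity_ids:
        raise ValueError("entity_ids must not be empty")
    # int() rejects non-numeric values before they reach the query string
    id_list = ", ".join(str(int(i)) for i in entity_ids)
    return f"PREDICT {target} FOR {entity_column} IN ({id_list})"


query = build_in_query(
    "COUNT(orders.*, 0, 30, days) > 0",
    "users.user_id",
    [1, 2, 3],
)
print(query)
```

The resulting string matches the "Multiple entities" form shown above and can be passed to model.predict(query) in the same way.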
Logical Operators
Combine multiple conditions using AND, OR, and NOT operators.
AND
Require all conditions to be true.
-- Predict spend in electronics with completed transactions only
PREDICT SUM(transactions.value, 15, 45, days)
FOR EACH customers.customer_id
WHERE transactions.category = "Electronics"
AND transactions.status = "completed"
OR
Require any one condition to be true. When used in the PREDICT clause, both operands must return the same type (e.g., both boolean for binary classification).
-- Predict high-value customer status:
-- (spend > $100 OR more than 10 transactions)
PREDICT SUM(transactions.value, 15, 45, days) > 100
OR COUNT(transactions.*, 15, 45, days) > 10
FOR EACH customers.customer_id
NOT
Negate a condition.
-- Exclude customers in Alaska or California from context
PREDICT SUM(transactions.value, 15, 45, days) > 100
FOR EACH customers.customer_id
WHERE customers.location NOT IN ("ALASKA", "CALIFORNIA")
Combined Example
-- Predict high-value customer status
-- For customers active in the last 40 days and not in Alaska/California
PREDICT SUM(transactions.value, 15, 45, days) > 100
OR COUNT(transactions.*, 15, 45, days) > 10
FOR EACH customers.customer_id
WHERE COUNT(transactions.*, -40, 0, days) > 0
AND customers.location NOT IN ("ALASKA", "CALIFORNIA")
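The boolean logic of the combined WHERE clause can be mirrored on toy data to check which rows it keeps. The rows and field names below are invented, and the sketch illustrates only how AND and NOT IN combine, not how KumoRFM evaluates filters internally.

```python
# Toy customer rows; all values are invented for illustration.
customers = [
    {"customer_id": 1, "location": "TEXAS", "txns_last_40_days": 3},
    {"customer_id": 2, "location": "ALASKA", "txns_last_40_days": 5},
    {"customer_id": 3, "location": "OHIO", "txns_last_40_days": 0},
]

EXCLUDED_LOCATIONS = {"ALASKA", "CALIFORNIA"}


def passes_filter(row):
    """Mirror: COUNT(transactions.*, -40, 0, days) > 0 AND location NOT IN (...)."""
    recently_active = row["txns_last_40_days"] > 0
    allowed_location = row["location"] not in EXCLUDED_LOCATIONS
    return recently_active and allowed_location


kept = [row["customer_id"] for row in customers if passes_filter(row)]
print(kept)  # [1]
```

Customer 2 is dropped by the location condition and customer 3 by the activity condition, so only customer 1 satisfies both.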
Anchor Time
The anchor time defines “now” — the reference point in time from which the prediction horizon is measured. By default, KumoRFM uses the maximum timestamp found in the data.
Placeholder: Anchor time diagram showing how it interacts with the prediction horizon and context window
You can customize the anchor time via the anchor_time parameter in KumoRFM.predict() and KumoRFM.evaluate():
Default (auto-detect):
# Uses the maximum timestamp in the data as "now"
result = model.predict(query, anchor_time=None)
Explicit timestamp:
import pandas as pd

# Make predictions as if "now" were January 1, 2024
result = model.predict(
query,
anchor_time=pd.Timestamp("2024-01-01"),
)
This is useful for backtesting — evaluating what the model would have predicted at a past point in time using only data available then.
Per-entity timestamps:
Use anchor_time="entity" when each entity has its own meaningful reference point in time, rather than a single shared “now”. KumoRFM will use each entity’s own time column value as its anchor.
result = model.predict(query, anchor_time="entity")
When to use this: Suppose you want to predict whether each user will churn in the 90 days following their sign-up date. User A signed up on January 1, user B on March 15 — you want the 90-day window to start from each user’s own sign-up date, not from a single shared date. With anchor_time="entity", KumoRFM uses the timestamp stored in the entity’s time column as “now” for that entity.
# Each user's 90-day churn window starts from their own sign-up date
query = "PREDICT COUNT(orders.*, 0, 90, days) = 0 FOR users.user_id IN (42, 123)"
result = model.predict(query, anchor_time="entity")
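To see what per-entity anchoring means for the sign-up example, the sketch below computes each user's own 90-day window. The sign-up dates are taken from the scenario above and assigned to user IDs 42 and 123 purely for illustration. This is plain date arithmetic and does not call KumoRFM.

```python
from datetime import date, timedelta

# Illustrative sign-up dates, one per user (the entity's time column).
signup_dates = {
    42: date(2024, 1, 1),
    123: date(2024, 3, 15),
}

HORIZON_DAYS = 90  # matches COUNT(orders.*, 0, 90, days)

# Each user's window starts at their own sign-up date rather than a shared "now".
windows = {
    user_id: (signup, signup + timedelta(days=HORIZON_DAYS))
    for user_id, signup in signup_dates.items()
}

for user_id, (start, end) in windows.items():
    print(f"user {user_id}: {start} to {end}")
```

With a single shared anchor, both users would be scored over the same calendar window; here the windows differ because each one is tied to the user's own timestamp.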
Context anchor time:
By default, KumoRFM uses the same anchor time for both the prediction and for collecting the historical context examples it learns from. You can decouple these using context_anchor_time.
result = model.predict(
query,
anchor_time=pd.Timestamp("2024-06-01"),
context_anchor_time=pd.Timestamp("2024-03-01"),
)
When to use this: Suppose you want to generate predictions for June 2024, but you are running this job in March 2024 and only have data up to that point. Setting anchor_time to June 1 tells KumoRFM “predict from this future date”, while context_anchor_time set to March 1 tells it “build your context from data as it existed on March 1”. This is useful for forward-looking batch scoring where you want predictions for a future date using only data available today.
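The relationship between the two anchors can be sketched with toy events. The event names and timestamps are invented, and treating the context anchor as an inclusive cutoff is an assumption made for illustration rather than documented KumoRFM behavior.

```python
from datetime import datetime

context_anchor_time = datetime(2024, 3, 1)  # data as it existed on this date
anchor_time = datetime(2024, 6, 1)          # date the predictions are made for

# Toy events; names and timestamps are invented.
events = [
    ("order_a", datetime(2024, 2, 10)),
    ("order_b", datetime(2024, 3, 20)),
]

# Assumption: only events at or before the context anchor can serve as context.
visible_for_context = [name for name, ts in events if ts <= context_anchor_time]

# Gap between the data cutoff and the date being predicted for.
gap_days = (anchor_time - context_anchor_time).days

print(visible_for_context)
print(gap_days)
```

The gap is the stretch of time for which no data is available when the job runs, which is worth keeping in mind when interpreting the predictions.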
ASSUMING Clause
The ASSUMING clause lets you condition a prediction on a hypothetical state — asking “what would happen if this entity had a specific attribute value?”
-- Predict churn for user 42, assuming they are on the premium plan
PREDICT COUNT(orders.*, 0, 90, days) = 0
FOR users.user_id=42
ASSUMING users.plan = 'premium'
query = (
"PREDICT COUNT(orders.*, 0, 90, days) = 0 "
"FOR users.user_id=42 "
"ASSUMING users.plan = 'premium'"
)
result = model.predict(query)
When to use this: Use ASSUMING to answer counterfactual questions — for example, “would this customer churn if we upgraded them to premium?” or “how would revenue change if this item were marked as featured?”
ASSUMING relies entirely on patterns in your historical data. KumoRFM cannot invent behavior it has never seen. If the condition you are assuming (e.g., users.plan = 'premium') has never been true for any entity in your historical data, there are no examples to learn from and the prediction will be unreliable. Always verify that the assumed state has sufficient historical coverage before trusting the results.
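One simple way to check historical coverage before running an ASSUMING query is to count how often the assumed value occurs in the relevant column. The sketch below uses an invented list of plan values and an arbitrary threshold. has_coverage is a hypothetical helper, and what counts as "sufficient" depends on your data.

```python
from collections import Counter

# Invented historical values of users.plan, for illustration only.
historical_plans = ["free", "free", "premium", "basic", "free", "premium"]

MIN_EXAMPLES = 2  # arbitrary threshold; choose one appropriate for your data


def has_coverage(values, assumed_value, min_examples):
    """Return True if assumed_value appears at least min_examples times.

    Hypothetical helper for illustration only; not part of KumoRFM.
    """
    return Counter(values)[assumed_value] >= min_examples


print(has_coverage(historical_plans, "premium", MIN_EXAMPLES))     # True
print(has_coverage(historical_plans, "enterprise", MIN_EXAMPLES))  # False
```

A value that never appears, such as "enterprise" here, is exactly the case in which the prediction would be unreliable.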
Unsupported Features
KumoRFM currently has the following limitations compared with full PQL:
- Target columns must be numerical or categorical; other column types are not yet supported as targets.
- Filtering by column value is only supported for columns within the same table as the target or entity.