
# Filters and Operators

> Use WHERE clauses, temporal filters, IN operators, and anchor time

Filters and operators allow you to refine your predictions by controlling which data is used and which entities are included.

## WHERE Clause (Context Filters)

The `WHERE` clause controls which **historical rows** are used as context examples for the prediction. These filters are **not** applied to the entity list itself — they only affect the context data that KumoRFM uses to generate features.

### Temporal Filters

Filter context examples based on temporal aggregations. This is useful for focusing predictions on specific subsets of historical behavior.

```sql theme={null}
-- Predict whether user 1 will place an order in the next 30 days
-- (context limited to users who had an order in the last 30 days)
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id=1
    WHERE COUNT(orders.*, -30, 0, days) > 0
```

```python theme={null}
query = """
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id=1
    WHERE COUNT(orders.*, -30, 0, days) > 0
"""
result = model.predict(query)
```

The negative time offset (`-30, 0`) looks at the **past** 30 days relative to "now", providing a temporal filter on the context.
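To make the window arithmetic concrete, the following plain-Python sketch mimics what `COUNT(orders.*, -30, 0, days) > 0` checks for a single user. It does not call KumoRFM; the helper name `count_in_window`, the sample timestamps, and the boundary convention (start exclusive, end inclusive) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def count_in_window(timestamps, anchor, start_days, end_days):
    """Count events with anchor + start_days < t <= anchor + end_days.

    The boundary convention is an assumption of this sketch.
    """
    lo = anchor + timedelta(days=start_days)
    hi = anchor + timedelta(days=end_days)
    return sum(1 for t in timestamps if lo < t <= hi)

anchor = datetime(2024, 1, 31)  # hypothetical "now"
orders = [datetime(2023, 11, 20), datetime(2024, 1, 10), datetime(2024, 1, 25)]

# Past window (-30, 0): mirrors the WHERE filter in the example above.
recent_orders = count_in_window(orders, anchor, -30, 0)
passes_filter = recent_orders > 0
print(recent_orders, passes_filter)
```

Swapping the offsets to `0, 30` would instead count events in the 30 days after the anchor, which is the window the `PREDICT` target uses.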

### Static Filters

Filter context examples based on static column values.

```sql theme={null}
-- Predict spend on electronics category only
PREDICT SUM(transactions.value, 15, 45, days)
FOR EACH customers.customer_id
    WHERE transactions.category = "Electronics"
```

<Info>
  Static column-value filtering (e.g., `WHERE users.age > 21`) is currently only supported for columns **within the same table** as the target or entity.
</Info>

## IN Clause (Entity Specification)

The entity specification in the `FOR` clause determines **which entities** to make predictions for. Use `=` for a single entity or `IN` for a list of entities.

**Single entity:**

```sql theme={null}
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id=42
```

**Multiple entities:**

```sql theme={null}
PREDICT COUNT(orders.*, 0, 30, days) > 0
FOR users.user_id IN (1, 2, 3)
```

**Programmatic indices:**

You can also pass entity IDs programmatically via the `indices` parameter:

```python theme={null}
result = model.predict(
    "PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1",
    indices=[1, 2, 3, 4, 5],
)
```

When `indices` is provided, it overrides the entity specification in the query string. For large entity lists, use `KumoRFM.batch_mode()`.
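When entity IDs come from application code, the `IN` list can be assembled with ordinary string formatting. The helper below is a hypothetical convenience, not part of the KumoRFM API, and it assumes integer IDs.

```python
def build_in_query(target, entity_column, entity_ids):
    """Return a PQL string with a FOR ... IN (...) entity list.

    Hypothetical helper; assumes integer IDs so no quoting is needed.
    """
    if not entity_ids:
        raise ValueError("entity_ids must not be empty")
    id_list = ", ".join(str(int(i)) for i in entity_ids)
    return f"PREDICT {target} FOR {entity_column} IN ({id_list})"

query = build_in_query(
    "COUNT(orders.*, 0, 30, days) > 0",
    "users.user_id",
    [1, 2, 3],
)
print(query)
```

The resulting string can be passed to `model.predict(query)` in the same way as the hand-written queries above.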

## Logical Operators

Combine multiple conditions using `AND`, `OR`, and `NOT` operators.

### AND

Require **all** conditions to be true.

```sql theme={null}
-- Predict spend in electronics with completed transactions only
PREDICT SUM(transactions.value, 15, 45, days)
FOR EACH customers.customer_id
    WHERE transactions.category = "Electronics"
        AND transactions.status = "completed"
```

### OR

Require **at least one** condition to be true. When used in the `PREDICT` clause, both operands must return the same type (e.g., both boolean for binary classification).

```sql theme={null}
-- Predict high-value customer status:
-- (spend > $100 OR more than 10 transactions)
PREDICT SUM(transactions.value, 15, 45, days) > 100
        OR COUNT(transactions.*, 15, 45, days) > 10
FOR EACH customers.customer_id
```
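As a plain-Python illustration of what this boolean target evaluates to, the sketch below computes the label for one customer from a list of transaction values. The function name, the sample values, and the assumption that every listed transaction already falls inside the 15 to 45 day window are illustrative only.

```python
def high_value_label(values_in_window, spend_threshold=100, count_threshold=10):
    """True when total spend OR transaction count exceeds its threshold."""
    total_spend = sum(values_in_window)
    n_transactions = len(values_in_window)
    return total_spend > spend_threshold or n_transactions > count_threshold

big_spender = [80.0, 45.5]   # 2 transactions, 125.5 total
frequent_buyer = [5.0] * 11  # 11 transactions, 55.0 total
neither = [20.0, 30.0]       # 2 transactions, 50.0 total

print(
    high_value_label(big_spender),
    high_value_label(frequent_buyer),
    high_value_label(neither),
)
```

Both operands are boolean comparisons, so the combined target remains a binary classification label.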

### NOT

Negate a condition.

```sql theme={null}
-- Exclude customers in Alaska or California from context
PREDICT SUM(transactions.value, 15, 45, days) > 100
FOR EACH customers.customer_id
    WHERE customers.location NOT IN ("ALASKA", "CALIFORNIA")
```

### Combined Example

```sql theme={null}
-- Predict high-value customer status
-- For customers active in the last 40 days and not in Alaska/California
PREDICT SUM(transactions.value, 15, 45, days) > 100
        OR COUNT(transactions.*, 15, 45, days) > 10
FOR EACH customers.customer_id
    WHERE COUNT(transactions.*, -40, 0, days) > 0
        AND customers.location NOT IN ("ALASKA", "CALIFORNIA")
```

## Anchor Time

The **anchor time** defines "now" — the reference point in time from which the prediction horizon is measured. By default, KumoRFM uses the maximum timestamp found in the data.

You can customize the anchor time via the `anchor_time` parameter in `KumoRFM.predict()` and `KumoRFM.evaluate()`:

**Default (auto-detect):**

```python theme={null}
# Uses the maximum timestamp in the data as "now"
result = model.predict(query, anchor_time=None)
```

**Explicit timestamp:**

```python theme={null}
import pandas as pd

# Make predictions as if "now" were January 1, 2024
result = model.predict(
    query,
    anchor_time=pd.Timestamp("2024-01-01"),
)
```

This is useful for **backtesting** — evaluating what the model would have predicted at a past point in time using only data available then.
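A backtest usually repeats the same query at several past anchor times. The sketch below only generates the anchor dates with the standard library; `monthly_anchors` is a hypothetical helper, and the loop in the trailing comment assumes `model` and `query` are defined as in the examples above.

```python
from datetime import datetime

def monthly_anchors(start, periods):
    """Return `periods` first-of-month datetimes beginning at `start`'s month."""
    anchors = []
    year, month = start.year, start.month
    for _ in range(periods):
        anchors.append(datetime(year, month, 1))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return anchors

anchors = monthly_anchors(datetime(2023, 11, 15), periods=3)
print([a.date().isoformat() for a in anchors])

# Each anchor could then be passed to predict, for example:
#   for a in anchors:
#       result = model.predict(query, anchor_time=pd.Timestamp(a))
```

Comparing each run's predictions against what actually happened after that anchor gives a simple picture of how the model would have performed over time.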

**Per-entity timestamps:**

Use `anchor_time="entity"` when each entity has its own meaningful reference point in time, rather than a single shared "now". KumoRFM will use each entity's own time column value as its anchor.

```python theme={null}
result = model.predict(query, anchor_time="entity")
```

**When to use this:** Suppose you want to predict whether each user will churn in the 90 days following their sign-up date. User A signed up on January 1, user B on March 15 — you want the 90-day window to start from each user's own sign-up date, not from a single shared date. With `anchor_time="entity"`, KumoRFM uses the timestamp stored in the entity's time column as "now" for that entity.

```python theme={null}
# Each user's 90-day churn window starts from their own sign-up date
query = "PREDICT COUNT(orders.*, 0, 90, days) = 0 FOR users.user_id IN (42, 123)"
result = model.predict(query, anchor_time="entity")
```
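The per-entity windows can be pictured with a small standard-library sketch. The sign-up dates and the `signup_dates` mapping are made up for illustration; with `anchor_time="entity"`, KumoRFM reads the anchors from the entity's time column rather than from a mapping like this.

```python
from datetime import date, timedelta

# Hypothetical sign-up dates standing in for the users table's time column.
signup_dates = {42: date(2024, 1, 1), 123: date(2024, 3, 15)}

def prediction_window(anchor, start_days=0, end_days=90):
    """Return the (start, end) dates of the window measured from `anchor`."""
    return anchor + timedelta(days=start_days), anchor + timedelta(days=end_days)

windows = {uid: prediction_window(d) for uid, d in signup_dates.items()}
for uid, (start, end) in windows.items():
    print(uid, start.isoformat(), end.isoformat())
```

Each user gets a window of the same length, but the windows cover different calendar dates because they start from different anchors.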

**Context anchor time:**

By default, KumoRFM uses the same anchor time for both the prediction and for collecting the historical context examples it learns from. You can decouple these using `context_anchor_time`.

```python theme={null}
import pandas as pd

result = model.predict(
    query,
    anchor_time=pd.Timestamp("2024-06-01"),
    context_anchor_time=pd.Timestamp("2024-03-01"),
)
```

**When to use this:** Suppose you want to generate predictions for June 2024, but you are running this job in March 2024 and only have data up to that point. Setting `anchor_time` to June 1 tells KumoRFM "predict from this future date", while `context_anchor_time` set to March 1 tells it "build your context from data as it existed on March 1". This is useful for **forward-looking batch scoring** where you want predictions for a future date using only data available today.

## ASSUMING Clause

The `ASSUMING` clause lets you condition a prediction on a hypothetical state — asking "what would happen if this entity had a specific attribute value?"

```sql theme={null}
-- Predict churn for user 42, assuming they are on the premium plan
PREDICT COUNT(orders.*, 0, 90, days) = 0
FOR users.user_id=42
ASSUMING users.plan = 'premium'
```

```python theme={null}
query = (
    "PREDICT COUNT(orders.*, 0, 90, days) = 0 "
    "FOR users.user_id=42 "
    "ASSUMING users.plan = 'premium'"
)
result = model.predict(query)
```

**When to use this:** Use `ASSUMING` to answer counterfactual questions — for example, "would this customer churn if we upgraded them to premium?" or "how would revenue change if this item were marked as featured?"

<Warning>
  `ASSUMING` relies entirely on patterns in your historical data. KumoRFM cannot extrapolate to behavior it has never observed. If the condition you are assuming (e.g., `users.plan = 'premium'`) has never been true for any entity in your historical data, there are no examples to learn from and the prediction will be unreliable. Always verify that the assumed state has sufficient historical coverage before trusting the results.
</Warning>
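One way to check coverage before running an `ASSUMING` query is to count how often the assumed value occurs in the entity table. The sketch below uses plain dictionaries as stand-in rows; the `coverage` helper and the sample rows are assumptions for illustration, not KumoRFM functionality.

```python
from collections import Counter

def coverage(rows, column, assumed_value):
    """Return (count, share) of rows where `column` equals `assumed_value`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    matched = counts.get(assumed_value, 0)
    return matched, (matched / total if total else 0.0)

# Hypothetical slice of a users table.
users = [
    {"user_id": 1, "plan": "free"},
    {"user_id": 2, "plan": "premium"},
    {"user_id": 3, "plan": "free"},
    {"user_id": 4, "plan": "free"},
]

matched, share = coverage(users, "plan", "premium")
print(matched, share)
```

A count of zero, or a very small share, suggests the assumed state is too rare in the historical data for the prediction to be trusted.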

## Unsupported Features

KumoRFM currently has the following PQL limitations:

* Only numerical and categorical columns are valid target columns; other column types are not yet supported as targets.
* Filtering by static column value is only supported for columns within the same table as the target or entity.
