
# Evaluation

> Evaluate KumoRFM predictions with metrics and task tables

`KumoRFM` provides an evaluation mode that automatically measures prediction quality by performing a train/test split on context examples and computing relevant metrics.

## Running an Evaluation

Use `KumoRFM.evaluate()` with the same PQL syntax as `KumoRFM.predict()`:

```python
metrics = model.evaluate(
    "PREDICT COUNT(orders.*, 0, 30, days) > 0 FOR users.user_id=1",
    run_mode="fast",
)
print(metrics)
```

The evaluation collects context examples, splits them into in-context (training) and test sets, generates predictions for the test set, and computes metrics comparing predictions to actual outcomes.
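
The steps above can be sketched in plain Python. This is an illustrative sketch only, not KumoRFM's implementation: the random 80/20 split, the `split_context` helper, and the `majority_predictor` stand-in for the model are all assumptions made for the example.

```python
import random


def split_context(examples, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into in-context and test sets.

    The random split and the 20% test fraction are assumptions for this sketch.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_predictor(in_context):
    """Hypothetical stand-in for the model: always predict the majority label."""
    positives = sum(label for _, label in in_context)
    majority = int(positives * 2 >= len(in_context))
    return lambda entity_id: majority


def accuracy(y_true, y_pred):
    """Fraction of test examples whose prediction matches the actual outcome."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labeled context examples as (entity_id, label) pairs.
examples = [(i, int(i % 4 != 0)) for i in range(20)]

in_context, test = split_context(examples)      # 1. split
predict = majority_predictor(in_context)        # 2. condition on in-context set
y_true = [label for _, label in test]
y_pred = [predict(entity_id) for entity_id, _ in test]  # 3. predict test set
acc = accuracy(y_true, y_pred)                  # 4. compare to actual outcomes
print(f"in-context={len(in_context)} test={len(test)} acc={acc:.2f}")
```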

You can also use the `EVALUATE` keyword in the query string directly:

```python
metrics = model.evaluate(
    "EVALUATE PREDICT COUNT(orders.*, 0, 30, days) FOR users.user_id=1"
)
```

## Available Metrics

The metrics returned depend on the detected task type:

| Task Type                  | Supported Metrics                                                            |
| -------------------------- | ---------------------------------------------------------------------------- |
| Binary Classification      | `acc`, `precision`, `recall`, `f1`, `auroc`, `auprc`, `ap`                   |
| Multi-Class Classification | `acc`, `precision`, `recall`, `f1`, `mrr`                                    |
| Regression / Forecasting   | `mae`, `mape`, `mse`, `rmse`, `smape`, `r2`                                  |
| Temporal Link Prediction   | `map@k`, `ndcg@k`, `mrr@k`, `precision@k`, `recall@k`, `f1@k`, `hit_ratio@k` |

You can specify which metrics to compute:

```python
metrics = model.evaluate(
    "PREDICT SUM(orders.price, 0, 30, days) FOR items.item_id=42",
    metrics=["mae", "rmse", "r2"],
)
```
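
For reference, the three metrics requested above follow standard textbook definitions, which can be sketched as below. The `regression_metrics` helper is hypothetical and written for illustration; KumoRFM's own computation may differ in details such as edge-case handling.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R^2 from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # not defined when all targets are identical
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Toy targets and predictions, made up for this example.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
scores = regression_metrics(y_true, y_pred)
print(scores)
```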

## Evaluation Parameters

The `KumoRFM.evaluate()` method accepts the same parameters as `KumoRFM.predict()`, plus:

* `metrics`: A list of metric names to compute. If not specified, all applicable metrics for the task type are computed.

The `run_mode`, `anchor_time`, `num_hops`, and other parameters work identically to `KumoRFM.predict()`. See the configuration page for details on run modes.

## Evaluation with TaskTable

For advanced use cases, you can construct a `TaskTable` explicitly and use `KumoRFM.evaluate_task()`:

```python
from kumoai.rfm import TaskTable

task = TaskTable(
    task_type="binary_classification",
    context_df=context_dataframe,
    pred_df=prediction_dataframe,
    entity_table_name="users",
    entity_column="user_id",
    target_column="target",
    time_column="timestamp",
)

metrics = model.evaluate_task(task)
```

This gives you full control over the train/test split and context construction.
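
As one way to exercise that control, the sketch below splits labeled rows at a time cutoff so that context examples strictly precede the examples being scored. The row layout reuses the `user_id`, `target`, and `timestamp` column names from the example above, but the data, the cutoff, and the assumption that `TaskTable` expects exactly these columns are illustrative only.

```python
from datetime import datetime

# Toy labeled rows, made up for this example.
rows = [
    {"user_id": 1, "target": 1, "timestamp": datetime(2024, 1, 5)},
    {"user_id": 2, "target": 0, "timestamp": datetime(2024, 1, 20)},
    {"user_id": 3, "target": 1, "timestamp": datetime(2024, 2, 3)},
    {"user_id": 4, "target": 0, "timestamp": datetime(2024, 2, 18)},
    {"user_id": 5, "target": 1, "timestamp": datetime(2024, 3, 2)},
    {"user_id": 6, "target": 0, "timestamp": datetime(2024, 3, 15)},
]

# Hypothetical cutoff: everything before it is context, the rest is scored.
cutoff = datetime(2024, 3, 1)
context_rows = [r for r in rows if r["timestamp"] < cutoff]
pred_rows = [r for r in rows if r["timestamp"] >= cutoff]

print(f"context={len(context_rows)} prediction={len(pred_rows)}")
```

The two lists can be turned into DataFrames with `pandas.DataFrame(context_rows)` and `pandas.DataFrame(pred_rows)` before being passed as `context_df` and `pred_df`.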

## Interpreting Results

The evaluation returns a `pandas.DataFrame` with `metric` and `value` columns:

```python
>>> metrics = model.evaluate(query)
>>> print(metrics)
  metric  value
0    mae   12.5
1   rmse   15.3
2     r2   0.82
```

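Higher values are better for `r2`, `acc`, `auroc`, `auprc`, `ap`, `precision`, `recall`, and `f1`, as well as for `mrr` and the ranking metrics (`map@k`, `ndcg@k`, `mrr@k`, `precision@k`, `recall@k`, `f1@k`, `hit_ratio@k`). Lower values are better for the error metrics `mae`, `mape`, `mse`, `rmse`, and `smape`.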
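
When comparing two evaluation runs, these directions can be encoded once so that "better" is not decided by hand each time. The sketch below is a hypothetical helper, not part of the KumoRFM API, and it covers only the classification and regression metric names.

```python
# Metric directions for the classification and regression metrics.
HIGHER_IS_BETTER = {"r2", "acc", "auroc", "auprc", "ap", "precision", "recall", "f1"}
LOWER_IS_BETTER = {"mae", "mape", "mse", "rmse", "smape"}


def is_improvement(metric, baseline, candidate):
    """Return True if `candidate` is strictly better than `baseline`."""
    if metric in HIGHER_IS_BETTER:
        return candidate > baseline
    if metric in LOWER_IS_BETTER:
        return candidate < baseline
    raise ValueError(f"Unknown metric direction: {metric!r}")


# A lower MAE is an improvement; a lower R^2 is not.
print(is_improvement("mae", 12.5, 10.0))
print(is_improvement("r2", 0.82, 0.80))
```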
