RelBench is an open benchmark for machine learning over relational databases. It provides realistic, large-scale datasets with standardized train, validation, and test splits, along with unified task definitions and evaluation metrics.
KumoRFM was not trained on any RelBench data; the examples on this page therefore evaluate KumoRFM on RelBench tasks without any prior exposure to the benchmark.
For KumoRFM users, RelBench is useful in two ways:
  • it gives you benchmark datasets that are already structured for relational prediction
  • it provides task definitions that make it easy to compare KumoRFM against other methods

What the RelBench Notebook Covers

The RelBench notebook shows how to evaluate KumoRFM on RelBench tasks in two different ways:
  1. with a custom context table that turns a RelBench task into a missing value imputation problem
  2. with Predictive Query Language (PQL) that reproduces the task directly in KumoRFM
The notebook uses the rel-f1 dataset as the main worked example and the driver-top3 task as the primary classification task.

What RelBench Includes

RelBench includes a growing set of realistic relational databases across domains such as sports, e-commerce, medicine, social platforms, and scientific publishing. The RelBench project highlights that the benchmark includes diverse databases, standardized evaluators, and automated data loading. The notebook treats RelBench as an evaluation framework rather than a single dataset, demonstrating how the same KumoRFM workflow transfers across multiple RelBench tasks.

Notebook Workflow

At a high level, the notebook does the following:
  1. Installs the required packages, including kumoai and RelBench.
  2. Authenticates with KumoRFM and initializes the SDK.
  3. Imports RelBench dataset helpers.
  4. Defines a helper function that converts any RelBench dataset into a LocalGraph.
  5. Loads the rel-f1 dataset and builds its graph.
  6. Prints metadata, links, and a visualization to verify the graph.
  7. Demonstrates the custom context table approach for the driver-top3 task.
  8. Builds a context table by concatenating split-specific task tables and masking test labels.
  9. Adds the context table to the graph and links it to the task entity table.
  10. Runs a KumoRFM prediction query against that context table.
  11. Evaluates the predictions with AUROC.
  12. Demonstrates the PQL approach for the same task.
  13. Uses model.get_train_table(...) to debug and validate label generation.
  14. Groups test entities by anchor timestamp and predicts in batches.
  15. Evaluates the final predictions and compares the setup with the context table approach.
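The AUROC evaluation in step 11 does not depend on any KumoRFM-specific API. As a reference for what the metric computes, here is a minimal pure-Python version (the probability that a random positive is ranked above a random negative, with ties counted as half); the labels and scores below are illustrative, not notebook output:

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive is
    scored above a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))

# toy predictions: higher score should mean "more likely top-3 finish"
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.3, 0.4]
print(auroc(labels, scores))  # → 0.75
```

In practice the notebook uses a standard metrics library for this; the sketch above just makes the evaluation target concrete.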

Two Evaluation Approaches

The notebook explicitly compares two ways to evaluate KumoRFM on RelBench tasks.

Custom Context Table

This approach creates a dedicated context table that contains:
  • the entity identifier
  • the anchor timestamp
  • the target label
Training labels are kept, while test labels are masked. KumoRFM then solves the task as a missing value imputation problem. This is useful when:
  • you want a general recipe that works across many task types
  • you want direct control over the context rows and labels
  • you are evaluating a custom task that does not already have a clean PQL form
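The construction itself (steps 8-9 of the workflow) can be sketched with pandas: concatenate the per-split task tables, then mask the labels of test rows so KumoRFM must impute them. The column names and values below are illustrative stand-ins, not the real rel-f1 schema:

```python
import pandas as pd

# toy stand-ins for the RelBench train/val/test task tables
# (column names here are illustrative, not the real rel-f1 schema)
train = pd.DataFrame({"driverId": [1, 2], "timestamp": ["2005-01-01"] * 2, "label": [1, 0]})
val = pd.DataFrame({"driverId": [3], "timestamp": ["2006-01-01"], "label": [1]})
test = pd.DataFrame({"driverId": [4, 5], "timestamp": ["2007-01-01"] * 2, "label": [0, 1]})

# hide the test labels so the task becomes missing value imputation
test_masked = test.assign(label=pd.NA)

context = pd.concat([train, val, test_masked], ignore_index=True)
context["ID"] = context.index  # primary key for the new context table
```

The resulting table is then added to the graph and linked to the task's entity table, and the prediction query targets the masked label column.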

Predictive Query Language

The notebook also shows how to express RelBench tasks directly in PQL. For the driver-top3 task on rel-f1, the notebook uses:
query = (
    "PREDICT MIN(qualifying.position, 0, 30, days)<=3 "
    "FOR drivers.driverId IN ({indices})"
)
This form is especially useful when a task has a natural temporal definition, because it keeps the task specification close to the business question itself.
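The {indices} placeholder is filled in per batch of entity IDs before the query is issued. A minimal sketch of that substitution, with made-up driver IDs:

```python
query_template = (
    "PREDICT MIN(qualifying.position, 0, 30, days)<=3 "
    "FOR drivers.driverId IN ({indices})"
)

# a batch of test-set driver IDs (illustrative values)
batch_ids = [1, 4, 17]
query = query_template.format(indices=", ".join(str(i) for i in batch_ids))
print(query)
# → PREDICT MIN(qualifying.position, 0, 30, days)<=3 FOR drivers.driverId IN (1, 4, 17)
```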

Practical Notes from the Notebook

The RelBench notebook highlights a few operational details that matter in practice:
  • run_mode='best' can improve benchmark performance when runtime is less important than accuracy
  • anchor_time='entity' is useful for context-table workflows where each row carries its own timestamp
  • batching by shared anchor timestamp helps KumoRFM reuse context efficiently
  • max_pq_iterations may need to be increased for more restrictive queries so that enough valid context examples are found
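The batching note above (step 14 of the workflow) amounts to grouping test entities by their anchor timestamp and issuing one prediction call per group. A minimal pandas sketch, with made-up IDs and dates:

```python
import pandas as pd

# illustrative test entities with their anchor timestamps
test_table = pd.DataFrame({
    "driverId": [1, 2, 3, 4, 5],
    "timestamp": pd.to_datetime(
        ["2007-01-01", "2007-01-01", "2007-04-01", "2007-04-01", "2007-04-01"]
    ),
})

# one prediction call per distinct anchor time lets KumoRFM reuse the
# in-context examples assembled for that timestamp across the whole batch
batches = {
    ts: group["driverId"].tolist()
    for ts, group in test_table.groupby("timestamp")
}
```

Each batch's ID list would then be substituted into the {indices} placeholder of the prediction query.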

More Reading