The SALT notebook shows how to evaluate KumoRFM on a realistic enterprise dataset that has now been merged into RelBench. SALT, short for Sales Autocompletion Linked Business Tables, was introduced by SAP, and this dataset page builds on SAP’s original research and dataset release.
SALT was not part of KumoRFM's training data, so the examples on this page evaluate KumoRFM on SALT tasks without any prior training on the dataset.

What SALT Is

SALT is a real enterprise resource planning (ERP) dataset built around sales-order workflows. It contains four linked business tables:
  • sales documents
  • sales document items
  • customers
  • addresses
Together, these tables contain roughly 5 million anonymized records and are designed to reflect real enterprise data problems such as missing fields, temporal drift, label imbalance, and heterogeneous relational structure.

What KumoRFM Predicts on SALT

SALT is not primarily a regression or binary classification benchmark. Instead, the main SALT tasks are multi-class classification problems where KumoRFM fills in missing business attributes. The notebook focuses on eight categorical prediction targets:
  • SALESOFFICE
  • SALESGROUP
  • CUSTOMERPAYMENTTERMS
  • SHIPPINGCONDITION
  • HEADERINCOTERMSCLASSIFICATION
  • PLANT
  • SHIPPINGPOINT
  • ITEMINCOTERMSCLASSIFICATION
In practice, this means KumoRFM can help predict missing values for sales-order headers and line items, including sales office and sales group assignments, payment terms, shipping settings, plant assignments, and Incoterms classifications.

What the SALT Notebook Does

The SALT notebook is an evaluation walkthrough. It shows how to take a realistic enterprise dataset, convert it into a Kumo graph, and run classification tasks using Predictive Query Language. At a high level, the notebook does the following:
  1. Installs the required packages, including kumoai and Hugging Face dataset tooling.
  2. Authenticates with KumoRFM and initializes the SDK.
  3. Authenticates with Hugging Face to access the SALT dataset.
  4. Loads the SALT tables into pandas DataFrames.
  5. Reconstructs the raw relational dataset by concatenating train and test splits where needed.
  6. Cleans and normalizes the data:
    • merges CREATIONDATE and CREATIONTIME into a single datetime field
    • propagates timestamps to item rows
    • removes auto-generated index columns
    • adds a stable primary key for item-level tasks
    • renames overlapping target columns for clarity
  7. Selects one target task at a time and removes the remaining target columns from the feature set.
  8. Masks the test labels to prevent temporal leakage.
  9. Builds a LocalGraph with explicit primary keys, time columns, and links across sales, items, customers, and addresses.
  10. Initializes KumoRFM on that graph.
  11. Defines a Predictive Query Language statement for the chosen target.
  12. Runs batched predictions on the held-out entities.
  13. Evaluates the predictions using Mean Reciprocal Rank (MRR).
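
The data-preparation and scoring steps above (6 to 8 and 13) can be sketched on toy data with pandas. This is a minimal illustration, not the notebook's actual code: the toy rows, the split column, the __index_level_0__ column, the SALESDOCUMENTITEM column, the CREATION_TS name, the way ID is built, and the ranked predictions are all assumptions made for the example. Steps that need the Kumo SDK (graph construction and prediction) are left out.

```python
import pandas as pd

# Toy stand-ins for two SALT tables (assumed columns; real tables are wider).
sales = pd.DataFrame({
    "SALESDOCUMENT": ["D1", "D2", "D3"],
    "CREATIONDATE": ["2020-01-05", "2020-02-10", "2020-03-15"],
    "CREATIONTIME": ["08:30:00", "14:45:10", "23:59:59"],
    "SALESOFFICE": ["A", "B", "A"],
    "SALESGROUP": ["G1", "G2", "G1"],
    "split": ["train", "train", "test"],   # hypothetical split marker
})
items = pd.DataFrame({
    "__index_level_0__": [0, 1, 2],         # hypothetical auto-generated index
    "SALESDOCUMENT": ["D1", "D1", "D3"],
    "SALESDOCUMENTITEM": [10, 20, 10],
    "PLANT": ["P1", "P2", "P1"],
})

# Step 6: merge date and time into one datetime column.
sales["CREATION_TS"] = pd.to_datetime(
    sales["CREATIONDATE"] + " " + sales["CREATIONTIME"]
)
sales = sales.drop(columns=["CREATIONDATE", "CREATIONTIME"])

# Step 6: propagate the header timestamp to the item rows.
items = items.merge(
    sales[["SALESDOCUMENT", "CREATION_TS"]], on="SALESDOCUMENT", how="left"
)

# Step 6: drop the auto-generated index column and add a stable item key.
items = items.drop(columns=["__index_level_0__"], errors="ignore")
items["ID"] = (
    items["SALESDOCUMENT"].astype(str) + "_" + items["SALESDOCUMENTITEM"].astype(str)
)

# Step 7: keep one target and drop the other target columns.
HEADER_TARGETS = ["SALESOFFICE", "SALESGROUP"]  # subset present in the toy table
task = "SALESOFFICE"
sales = sales.drop(columns=[t for t in HEADER_TARGETS if t != task])

# Step 8: store the test labels, then mask them in the table.
is_test = sales["split"] == "test"
held_out = sales.loc[is_test, task].tolist()
sales.loc[is_test, task] = None


def mean_reciprocal_rank(ranked_predictions, true_labels):
    """Average of 1/rank of the true label; 0 when it is not in the ranking."""
    total = 0.0
    for ranked, truth in zip(ranked_predictions, true_labels):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(true_labels)


# Step 13: score a hypothetical ranking for the single masked document D3.
ranked = [["B", "A"]]  # made-up model output, best guess first
mrr = mean_reciprocal_rank(ranked, held_out)
```

In this toy run the true label for D3 is ranked second, so its reciprocal rank is 1/2. MRR rewards a model for placing the correct class near the top of its ranking even when the top guess is wrong, which suits targets with many possible classes.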

KumoRFM Prediction Pattern

For SALT-style tasks, KumoRFM predicts a categorical target for each entity in the relevant table. In the notebook, the PQL takes one of these forms:
query = f"PREDICT sales.{task} FOR EACH sales.SALESDOCUMENT"
or:
query = f"PREDICT items.{task} FOR EACH items.ID"
This pattern is useful whenever you want to model enterprise data completion tasks as relational prediction problems.
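
The two query forms can be wrapped in a small helper that picks the right one for a given target. This is a sketch under assumptions: the split of the eight targets into header-level (sales) and item-level (items) groups is inferred from the target names and the description above, and build_query is a hypothetical helper, not part of the Kumo SDK.

```python
# Assumed grouping of the eight SALT targets by the table they live on.
HEADER_TARGETS = {
    "SALESOFFICE",
    "SALESGROUP",
    "CUSTOMERPAYMENTTERMS",
    "SHIPPINGCONDITION",
    "HEADERINCOTERMSCLASSIFICATION",
}
ITEM_TARGETS = {
    "PLANT",
    "SHIPPINGPOINT",
    "ITEMINCOTERMSCLASSIFICATION",
}


def build_query(task: str) -> str:
    """Return the PQL string for one SALT target (hypothetical helper)."""
    if task in HEADER_TARGETS:
        # Header-level targets are predicted per sales document.
        return f"PREDICT sales.{task} FOR EACH sales.SALESDOCUMENT"
    if task in ITEM_TARGETS:
        # Item-level targets are predicted per item primary key.
        return f"PREDICT items.{task} FOR EACH items.ID"
    raise ValueError(f"Unknown SALT target: {task}")


queries = {task: build_query(task) for task in sorted(HEADER_TARGETS | ITEM_TARGETS)}
```

Rejecting unknown target names early keeps a typo from reaching the prediction call as a malformed query.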

More Reading

The SALT repository notes that the dataset was integrated into RelBench in July 2025, which is why SALT now appears both as a standalone reference notebook and as part of the broader RelBench ecosystem.