KumoRFM vs TabPFN: Relational Foundation Model vs Single-Table Foundation Model

KumoRFM 2.0 scores 91% vs 75% on SAP SALT and 76.71 vs 62.44 AUROC on RelBench. TabPFN operates on a single flat table. KumoRFM 2.0 operates on both single tables and multiple relational tables - a strict superset. Enterprise data lives in 5-50 connected tables. When you flatten it for TabPFN, you hit a hard accuracy ceiling that no algorithm can overcome.

TL;DR

  1. On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML - with zero feature engineering and zero training time.
  2. TabPFN (PriorLabs) is a Nature-published foundation model for single flat tables. It uses in-context learning to make predictions without training, matching tuned XGBoost in approximately 2.8 seconds. It is open-source, fast, and effective on single-table problems.
  3. KumoRFM is a foundation model for multiple relational tables. It reads 5-50 connected tables directly, using a graph transformer to discover multi-hop predictive patterns across table boundaries without any feature engineering or flattening.
  4. On RelBench tasks involving 5+ tables, the accuracy gap between relational and flattened approaches widens to 15-20+ AUROC points. This is the 'flattening tax' - the cost of forcing relational data into a single-table format.
  5. KumoRFM 2.0 also supports single-table prediction, making it a superset: it does everything TabPFN does on single tables, plus handles the multi-table relational problems that TabPFN cannot address.

TabPFN is one of the most impressive recent developments in tabular machine learning. Built by PriorLabs (EUR 9M pre-seed led by Balderton Capital), published in Nature, and open-sourced on Hugging Face, it represents a genuine breakthrough: a foundation model that can make accurate predictions on a new dataset in seconds, without any training, by using in-context learning. On single-table benchmarks, it matches or beats carefully tuned XGBoost models in approximately 2.8 seconds.

KumoRFM is also a foundation model for tabular data. But it solves a different structural problem. Where TabPFN reads one flat table, KumoRFM reads multiple relational tables connected by foreign keys and discovers predictive patterns across the full relational graph. This is not a marginal difference in architecture - it is a fundamental difference in what data the model can see.

The question is not which model is better. The question is: does your data fit in one table? If it does, both models are strong options. If it does not - and enterprise data almost never does - then forcing it into a single table for TabPFN means paying a steep accuracy tax.

The headline result: SAP SALT benchmark

Before diving into detailed comparisons, here is the result that matters most. The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes (customer behavior, demand patterns, operational metrics) on production-quality enterprise databases with multiple related tables.

| Approach | Accuracy | What it means |
| --- | --- | --- |
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |

SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points. Zero feature engineering. Zero training. The model reads raw enterprise tables and predicts.

This is not a marginal improvement. KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.

| Dimension | TabPFN (PriorLabs) | Kumo (KumoRFM) |
| --- | --- | --- |
| Data input | Single flat table | Multiple relational tables connected by foreign keys |
| Architecture | Transformer with in-context learning | Graph transformer over relational structure |
| Training data | Synthetic single-table datasets | 10,000s of diverse relational datasets |
| Multi-table support | None - requires pre-flattened single table | Native - reads 5-50 connected tables directly |
| Multi-hop pattern discovery | Not possible - single table only | Native - captures 2-hop, 3-hop, 4+ hop signals across tables |
| Training required | None (in-context learning) | None for zero-shot; optional fine-tuning for maximum accuracy |
| Inference speed | ~2.8 seconds | ~1 second (zero-shot) |
| Scale (rows) | ~50K (open-source), 10M (enterprise) | Hundreds of millions of rows across dozens of tables |
| Open-source | Yes (Hugging Face) | No (enterprise SaaS) |
| Data warehouse integration | None - export data to use | Native Snowflake/Databricks - no data movement |
| Single-table performance | Strong - matches tuned XGBoost | Strong - competitive on single-table tasks (KumoRFM 2.0) |
| Relational data performance | Limited by flattening - loses multi-hop signals | State-of-the-art - 76.71 AUROC zero-shot on RelBench |

Both are foundation models for tabular data. The structural difference is what they can read: one table vs. many tables. For enterprise data that spans 5-50 tables, this distinction determines what signals the model can discover.

What TabPFN does well

TabPFN is a genuine advance in tabular ML, and a fair comparison requires acknowledging its real strengths.

  • No training required. TabPFN uses in-context learning: you pass your data as context, and the model makes predictions immediately. No hyperparameter tuning, no cross-validation, no training loop. This is a real simplification of the ML workflow for single-table problems.
  • Fast inference. Approximately 2.8 seconds to produce predictions on a new dataset. This makes it practical for rapid prototyping, exploratory analysis, and situations where you need a quick baseline before investing in a full pipeline.
  • Published in Nature. TabPFN's approach is rigorously validated. The Nature publication demonstrates that a pre-trained transformer can match or beat tuned tree-based models on a wide range of single-table benchmarks. This is a credible, peer-reviewed result.
  • Open-source. Available on Hugging Face, TabPFN can be used freely for experimentation and production on single-table tasks. The open-source model supports datasets up to approximately 50,000 samples (version 2.5), with PriorLabs' enterprise offering scaling to 10 million rows.
  • No feature engineering on single tables. For problems where all predictive signals exist in a single table, TabPFN eliminates the need for manual feature engineering. You provide the raw table, and the model handles the rest.

The flattening ceiling: what you lose when you force relational data into a single table

Enterprise data does not live in a single table. A typical prediction task - churn prediction, fraud detection, lead scoring, demand forecasting - requires data from 5 to 50 connected tables. To use TabPFN on this data, you must flatten it: join the tables, compute aggregations, and collapse everything into one row per entity. This is not just tedious. It permanently destroys information that no model can recover.

Think about what this actually means. Flattening a relational database into one table is like reducing a large company's org chart into a single flat list of employee names. You keep the names. But you lose who reports to whom, which departments exist, who has dotted-line relationships, and how deep the organization goes. A small startup with 10 people? The flat list is fine. A Fortune 500 with 50,000 employees across 200 departments? The flat list is useless for any question that depends on organizational structure. Enterprise databases are the same: with billions of rows across dozens of connected tables, flattening into a single table throws away exactly the relationships that predict business outcomes.

Consider a concrete example. You want to predict customer churn. The predictive signal you need follows a 4-hop path through your relational database:

  1. Customer → Orders. Which products has this customer bought, when, and how frequently?
  2. Orders → Products. What categories and price points characterize their purchase history?
  3. Products → Reviews. How are other customers rating the same products? Are satisfaction scores declining for the products this customer relies on?
  4. Reviews → Other customers who bought the same products → Their churn patterns. Did customers with similar product portfolios and review sentiment churn recently? At what rate?

This 4-hop signal is one of the strongest churn predictors in relational data. It captures a structural pattern: when customers with similar purchasing behavior start churning, it is a leading indicator for the remaining customers in that cohort. KumoRFM's graph transformer discovers this pattern automatically by traversing the relational graph. TabPFN never sees it because the signal does not exist in any single flat table - it exists in the connections between tables.
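To make the 4-hop path concrete, here is a minimal sketch in plain Python - toy tables with hypothetical contents, not Kumo's implementation - of the similar-customer churn signal that a relational model can compute but a pre-flattened table cannot:

```python
# Toy relational data (hypothetical) illustrating the 4-hop signal:
# Customer -> Orders -> Products -> other buyers -> their churn status.
orders = [  # (customer_id, product_id)
    ("C1", "P1"), ("C1", "P2"),
    ("C2", "P1"), ("C2", "P3"),
    ("C3", "P2"), ("C3", "P3"),
    ("C4", "P4"),
]
churned = {"C2", "C3"}  # customers who already churned

def similar_customer_churn_rate(target: str) -> float:
    """Churn rate among customers who bought any product the target bought."""
    bought = {p for c, p in orders if c == target}                   # hops 1-2
    cohort = {c for c, p in orders if p in bought and c != target}   # hop 3
    if not cohort:
        return 0.0
    return len(cohort & churned) / len(cohort)                       # hop 4

# C1 shares products P1/P2 with C2 and C3, who both churned -> rate 1.0.
# C4's product P4 has no other buyers -> rate 0.0.
print(similar_customer_churn_rate("C1"))  # 1.0
print(similar_customer_churn_rate("C4"))  # 0.0
```

Once the tables are collapsed to one row per customer, `cohort` can no longer be computed: the per-order rows that link customers through shared products are gone.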

What flattening destroys (churn prediction example)

| Signal type | Available in flat table (TabPFN) | Available in relational graph (Kumo) |
| --- | --- | --- |
| Customer purchase count | Yes - orders_count = 23 | Yes - plus full temporal sequence and recency patterns |
| Average order value | Yes - avg_order_value = $142 | Yes - plus trend (declining from $180 to $95 over 6 months) |
| Product category distribution | Partially - top_category = 'electronics' | Yes - full distribution across 8 categories with temporal shifts |
| Product review sentiment (for purchased items) | No - requires Product-to-Review join | Yes - reviews for purchased products averaging 2.1 stars (declining) |
| Similar-customer churn signal | No - requires 4-hop traversal | Yes - 67% of customers with similar product portfolio churned in last 90 days |
| Cross-table temporal patterns | No - flattening collapses time | Yes - support ticket spike followed by order frequency drop detected |
| Graph-structural position | No - flat table has no graph structure | Yes - customer is in a weakly-connected component with high churn density |

The first two rows are the only signals TabPFN can access. The remaining five rows represent the multi-hop, temporal, and structural patterns that only exist in the relational graph. On relational datasets, these hidden signals account for the 15-20+ AUROC point gap.
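The "average order value" row is easy to verify for yourself. In this self-contained illustration (toy numbers, not the benchmark data), a customer's spend declines sharply, but the static aggregate that survives flattening hides it:

```python
# Six months of order values for one customer (toy data).
# The sequence shows a clear decline - a classic churn precursor.
order_values = [180, 165, 150, 130, 110, 95]

# What survives flattening: a single static aggregate per customer.
avg_order_value = sum(order_values) / len(order_values)
print(round(avg_order_value))  # 138 - looks like a healthy mid-range customer

# What the temporal view retains: the direction of change.
trend = order_values[-1] - order_values[0]
print(trend)  # -85 - spend nearly halved over six months

# A recovering customer with the SAME average is indistinguishable after flattening:
rising = list(reversed(order_values))
assert sum(rising) / len(rising) == avg_order_value
```

Two customers on opposite trajectories produce identical flat-table rows; no downstream model can tell them apart.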

Both are foundation models - but pre-trained on different structures

TabPFN and KumoRFM are both pre-trained foundation models that generalize to new datasets without task-specific training. The critical difference is what they were pre-trained on.

  • TabPFN was pre-trained on synthetic single-table datasets. It learned the statistical patterns common to flat tabular data: feature correlations, nonlinear decision boundaries, class distributions, missing value patterns. This makes it excellent at single-table prediction - it has seen millions of synthetic tables and learned general patterns that transfer to real single-table data.
  • KumoRFM was pre-trained on tens of thousands of diverse relational datasets. It learned patterns that exist specifically in relational structures: how entities relate across tables, how multi-hop connections carry predictive signal, how temporal patterns propagate across table boundaries, and how graph-structural properties predict entity behavior. These patterns do not exist in single-table data.

This pre-training difference has a direct consequence. TabPFN generalizes well to new single tables. KumoRFM generalizes well to new relational databases. For enterprise data, which is inherently relational, KumoRFM's pre-training is more aligned with the actual structure of the data.

TabPFN workflow (relational data)

  • Export data from your database into a flat format
  • Write SQL joins to combine 5-50 tables (2-8 hours)
  • Compute aggregations, handle temporal features manually
  • Lose multi-hop relationships, temporal sequences, and graph structure
  • Feed the flattened table to TabPFN (~2.8 seconds inference)
  • Get predictions limited by the signals that survived flattening
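The join-and-aggregate step in the middle of that workflow amounts to something like the following (a schematic sketch with toy tables; real pipelines do this in SQL, but the collapse to one row per entity is the same):

```python
# Toy source tables (hypothetical data).
customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]
orders = [
    {"customer_id": "C1", "amount": 180, "month": 1},
    {"customer_id": "C1", "amount": 95,  "month": 6},
    {"customer_id": "C2", "amount": 140, "month": 3},
]

# Flattening: join orders onto customers, then collapse to one row per entity.
flat = []
for c in customers:
    cust_orders = [o for o in orders if o["customer_id"] == c["customer_id"]]
    flat.append({
        "customer_id": c["customer_id"],
        "orders_count": len(cust_orders),
        "avg_order_value": sum(o["amount"] for o in cust_orders) / len(cust_orders),
        # The individual order rows - and their ordering in time - are
        # discarded from this point on; only the aggregates remain.
    })

print(flat[0])  # {'customer_id': 'C1', 'orders_count': 2, 'avg_order_value': 137.5}
```

Everything downstream of this step, including TabPFN, sees only the aggregate columns in `flat`.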

Kumo workflow

  • Connect Kumo to your data warehouse (one-time setup)
  • Write a PQL query defining what you want to predict
  • KumoRFM reads all relational tables, discovers multi-hop patterns
  • Zero flattening, zero feature engineering, zero information loss
  • Time to first prediction: ~1 second (zero-shot)
  • Get predictions powered by the full relational structure

Benchmark results: single-table vs relational

On single-table benchmarks, TabPFN performs well - matching or beating tuned XGBoost. This is its designed operating range. The divergence appears when the data is relational.

AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes between positive and negative outcomes. An AUROC of 50 means random guessing. An AUROC of 100 means perfect prediction. In practice, moving from 65 to 77 AUROC is a significant improvement - it means the model correctly ranks a true positive above a true negative 77% of the time instead of 65%. For fraud detection, that difference can mean catching 40% more fraud with the same false positive rate. For churn prediction, it means identifying at-risk customers weeks earlier.
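The ranking interpretation above can be checked directly: AUROC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. A minimal sketch with made-up scores:

```python
def auroc(scores_pos, scores_neg):
    """Fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for 4 churners (positives) and 4 non-churners.
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.2]

print(auroc(pos, neg))      # 0.8125 -> 81.25 AUROC on the 0-100 scale
print(auroc(neg, pos))      # 0.1875 -> worse than random: ranking is inverted
print(auroc([1.0], [1.0]))  # 0.5    -> a constant model is a coin flip
```

This pairwise view is why a 62-to-77 jump matters: it is the probability that a true positive outranks a true negative, measured over every such pair.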

| Scenario | TabPFN | KumoRFM | Gap |
| --- | --- | --- | --- |
| Single-table classification (standard benchmarks) | Strong - matches tuned XGBoost | Strong - competitive (KumoRFM 2.0) | Comparable |
| Relational data, 2-3 tables | Moderate - loses some cross-table signal | 76+ AUROC zero-shot | 5-10 AUROC points |
| Relational data, 5+ tables | Weak - severe flattening tax | 76.71 AUROC zero-shot (RelBench avg) | 15-20+ AUROC points |
| Relational data, 5+ tables (fine-tuned) | Cannot improve - limited by flat input | 81.14 AUROC (RelBench avg) | 20-25+ AUROC points |

Highlighted: the accuracy gap scales with relational complexity. On single tables, both models are strong. As the number of tables increases, the flattening tax compounds - each additional table represents more multi-hop signals that TabPFN cannot access.

The widening gap is not about model sophistication. TabPFN's transformer architecture is powerful. But it operates on a flat table - and a flat table derived from 5+ joined tables has lost the very patterns that differentiate accurate predictions from mediocre ones. No model, no matter how advanced, can recover signals that were destroyed in the flattening step.

PQL Query

```
PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.segment = 'enterprise'
```

One PQL query replaces the entire flattening pipeline. KumoRFM reads the raw customers, orders, products, reviews, and support_tickets tables directly. The 4-hop signal (Customer to Orders to Products to Reviews to Similar customers' churn) is discovered automatically - no joins, no aggregations, no information loss.

Output

| customer_id | churn_prob_kumo | churn_prob_flat_table | delta |
| --- | --- | --- | --- |
| C-7201 | 0.91 | 0.64 | +27 points (Kumo detects similar-customer churn wave) |
| C-7202 | 0.15 | 0.42 | Kumo correctly lower (strong cross-table engagement signals) |
| C-7203 | 0.88 | 0.55 | +33 points (Kumo sees product review decline + support escalation) |
| C-7204 | 0.07 | 0.09 | Both correctly low (healthy account, no relational risk signals) |

Scale: open-source research vs enterprise production

TabPFN's open-source version (v2.5) scales to approximately 50,000 samples - suitable for research, prototyping, and smaller datasets. PriorLabs' enterprise offering extends this to 10 million rows, which covers many production use cases on single tables.

KumoRFM is designed for enterprise relational data at scale: hundreds of millions of rows across dozens of connected tables, with billions of relationship edges in the relational graph. It runs natively inside Snowflake and Databricks - no data export, no data movement, no external processing. For organizations with large relational databases, this is a material infrastructure difference.

| Dimension | TabPFN open-source | TabPFN enterprise | Kumo (KumoRFM) |
| --- | --- | --- | --- |
| Max rows | ~50,000 | ~10 million | Hundreds of millions |
| Max tables | 1 | 1 | 50+ |
| Data warehouse integration | None | Limited | Native Snowflake/Databricks |
| Data movement required | Yes - export to Python | Yes - connect via API | No - runs inside your warehouse |
| Enterprise security | Self-hosted | PriorLabs managed | Data never leaves your warehouse |

For single-table research and prototyping, TabPFN's open-source model is a strong choice. For enterprise relational data at production scale, Kumo's warehouse-native architecture avoids the data movement and scale limitations.

When to choose TabPFN

TabPFN is an excellent tool in specific scenarios. Choose TabPFN when:

  • Your data fits in a single table. If all predictive signals exist in one table with no multi-table joins required, TabPFN delivers strong accuracy with zero training time. This is its core strength and designed operating range.
  • You need a fast baseline. TabPFN's 2.8-second inference makes it ideal for rapid prototyping, exploratory analysis, and quick comparisons before investing in a full production pipeline.
  • You want open-source. TabPFN is freely available on Hugging Face. For teams that prefer open-source tools, want to inspect the model, or need to self-host, TabPFN provides that flexibility.
  • Your dataset is small to medium. For datasets under 50,000 rows (open-source) or 10 million rows (enterprise), TabPFN handles the scale comfortably on single-table tasks.
  • You are in a research or academic setting. TabPFN's Nature publication, open-source availability, and strong single-table benchmarks make it a natural choice for research comparisons and academic work.

When to choose Kumo

Kumo solves a different structural problem. Choose Kumo when:

  • Your data lives in multiple relational tables. Customers, orders, products, reviews, interactions, support tickets - if your predictive signals span table boundaries, KumoRFM discovers them automatically. TabPFN requires you to flatten them first, losing multi-hop patterns in the process.
  • Multi-hop patterns matter for your prediction. If churn depends on what similar customers experienced, if fraud depends on transaction network structure, if recommendations depend on purchase-graph similarity - these are patterns that only exist in relational structure and cannot survive flattening.
  • You need enterprise scale. Hundreds of millions of rows across dozens of tables, running natively in your data warehouse without data export. KumoRFM's architecture is designed for this operating range.
  • You want to avoid the flattening pipeline. The SQL joins, aggregation computation, and temporal feature engineering required to flatten relational data for TabPFN take hours of data science time per task and create brittle pipelines. Kumo eliminates this entirely.
  • You need maximum accuracy on relational data. The 15-20+ AUROC point gap on relational benchmarks translates directly to business outcomes. In fraud detection, this means millions more in caught fraud. In churn prediction, it means significantly more customers retained. In lead scoring, it means higher conversion rates.

They are not competitors - they solve different data structures

The most accurate way to frame this comparison is not TabPFN vs. KumoRFM, but single-table vs. relational. TabPFN is the best foundation model for single flat tables. KumoRFM is the best foundation model for relational tables. The question is which description matches your data.

For most enterprise prediction tasks, the answer is relational. Customer churn, fraud detection, recommendation, lead scoring, demand forecasting, supply chain optimization - these tasks inherently involve multiple connected entities across multiple tables. Forcing this data into a single table is possible, but the flattening tax is real: 15-20+ AUROC points on complex relational tasks, plus the engineering cost of building and maintaining the flattening pipeline.

KumoRFM 2.0 makes this choice simpler by supporting both single-table and multi-table tasks. On single-table problems, it is competitive with TabPFN. On multi-table problems, it captures patterns that no single-table model can access. It is a superset, not a replacement.

Frequently asked questions

How does KumoRFM compare to TabPFN?

KumoRFM 2.0 is a superset of TabPFN. On single flat tables, both models are competitive - TabPFN matches tuned XGBoost with zero tuning, and KumoRFM delivers comparable zero-shot accuracy. The difference appears on multi-table relational data, where KumoRFM dramatically outperforms TabPFN. On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost (TabPFN cannot participate because it requires a single flat table). On RelBench (7 databases, 30 tasks, 103M rows), KumoRFM zero-shot scores 76.71 AUROC vs 62.44 for LightGBM on flattened data. KumoRFM is pre-trained on tens of thousands of real relational datasets. TabPFN is pre-trained on synthetic single-table data. For enterprise data spanning 5-50 tables, KumoRFM captures multi-hop patterns that TabPFN structurally cannot see.

What is the main difference between KumoRFM and TabPFN?

TabPFN is a foundation model for single flat tables. It uses in-context learning to make predictions on one table at a time without training. KumoRFM is a foundation model for multiple relational tables. It reads 5-50 connected tables directly and discovers multi-hop predictive patterns across table boundaries using a graph transformer pre-trained on tens of thousands of real relational datasets. Both are foundation models, but they operate on fundamentally different data structures. TabPFN assumes your data fits in one table. KumoRFM 2.0 handles both single-table and multi-table data - making it a strict superset.

Can TabPFN handle multi-table relational data?

No. TabPFN requires a single flat table as input. If your data spans multiple relational tables (customers, orders, products, reviews, support tickets), you must manually join and flatten everything into one table before TabPFN can use it. This flattening permanently destroys 3rd-degree and 4th-degree connections, temporal sequences across tables, and graph-structural patterns - creating a hard accuracy ceiling, not just a penalty. On RelBench tasks with 5+ tables, this flattening tax costs 15-20+ AUROC points compared to KumoRFM, which reads the relational structure natively.

Is TabPFN accurate on single-table problems?

Yes. TabPFN is highly effective on single-table classification and regression tasks. It was published in Nature and demonstrated performance that matches or beats tuned XGBoost on many benchmarks, with inference in approximately 2.8 seconds. For problems where all predictive signals exist in a single table, TabPFN is a strong and well-validated choice. However, enterprise prediction tasks rarely fit in a single table.

Does KumoRFM also work on single-table problems?

Yes. KumoRFM 2.0 supports both single-table and multi-table prediction tasks. On single-table problems, KumoRFM is competitive with TabPFN and XGBoost. On multi-table relational problems, KumoRFM significantly outperforms any single-table model - scoring 91% vs 75% on SAP SALT and 76.71 vs 62.44 AUROC on RelBench - because it captures cross-table patterns that flat-table models structurally cannot access. KumoRFM 2.0 is a superset: it does everything TabPFN does on single tables, plus handles the relational problems that represent most enterprise use cases.

What is the flattening tax?

The flattening tax is the permanent accuracy ceiling created when you join and aggregate multi-table relational data into a single flat table for use with single-table models like TabPFN. Flattening destroys multi-hop relationships (e.g., Customer to Orders to Products to Reviews to Other customers' churn patterns), collapses temporal sequences into static aggregates, and eliminates graph-structural signals. This is not a penalty a better algorithm can overcome - it is information loss. On RelBench tasks involving 5+ tables, the flattening tax costs 15-20+ AUROC points compared to KumoRFM, which reads the relational structure natively.

How does TabPFN scale compared to KumoRFM?

TabPFN 2.5 scales to approximately 50,000 samples in its open-source version, and PriorLabs' enterprise offering supports up to 10 million rows on a single table. KumoRFM is designed for enterprise-scale relational data: hundreds of millions of rows across dozens of tables, running natively inside Snowflake or Databricks without data movement. For enterprise datasets with billions of relationship edges, KumoRFM's graph transformer architecture is purpose-built for scale.

What benchmarks compare KumoRFM and TabPFN?

The SAP SALT enterprise benchmark shows KumoRFM at 91% accuracy vs 75% for PhD data scientists with hand-tuned XGBoost vs 63% for LLM+AutoML. TabPFN cannot participate in SAP SALT because the benchmark requires reading multi-table enterprise data. On the RelBench benchmark (7 databases, 30 tasks, 103M rows), KumoRFM zero-shot scores 76.71 AUROC vs 62.44 for LightGBM with expert-engineered features on flattened data. Fine-tuned KumoRFM reaches 81.14 AUROC. On single-table tasks, both models are competitive with tuned XGBoost. The divergence appears exclusively on relational data, where KumoRFM's graph transformer and pre-training on tens of thousands of real relational datasets give it access to signals that flat-table models cannot see.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.