
Graph ML vs Tabular ML: When to Use Which

The internet will tell you that graph ML is always better, or that XGBoost is all you need. Both are wrong. The right answer depends on a single question: how much of your predictive signal lives in the relationships between tables?

TL;DR

  • Tabular ML (XGBoost/LightGBM) excels on pre-engineered flat data with sub-millisecond inference and mature interpretability. It is the right choice for single-table tasks.
  • Graph ML captures multi-hop patterns across 3-4 tables automatically, scoring 75.83 AUROC on RelBench vs 62.44 for flat models. The advantage grows with data complexity.
  • The decision depends on one question: how much predictive signal lives in the relationships between your tables? For enterprise data with 5-50 tables, the answer is usually substantial.
  • KumoRFM delivers graph ML accuracy (76.71 zero-shot, 81.14 fine-tuned) without requiring graph ML infrastructure. One line of PQL, predictions in seconds.
  • For organizations running multiple prediction tasks on relational data, graph-based approaches amortize cost across tasks. Flat models require re-engineering features for each new question.

Tabular ML (XGBoost, LightGBM, CatBoost) dominates production ML. It is fast, interpretable, and has won the vast majority of Kaggle competitions on structured data since 2016. When people say "ML on tabular data is a solved problem," they mean gradient boosted trees.

Graph ML (graph neural networks) is newer, less well-known, and often associated with social networks and molecular chemistry. The idea that it belongs in enterprise ML on customer data, financial data, and operational data is still controversial in many data science teams.

This article is an honest comparison. Both approaches have strengths. Both have limitations. The decision is not about which is universally better. It is about which is better for your data and your prediction task.

When tabular ML wins

Tabular ML has four genuine advantages.

1. On pre-engineered flat data

If a skilled data scientist has already spent 12 hours engineering features from your relational database and produced a well-curated flat table, XGBoost will perform extremely well on it. The features encode domain knowledge. The gradient boosted trees learn non-linear interactions between those features. This combination is powerful.

On Kaggle competitions, where the data arrives pre-flattened and feature engineering is part of the competition, XGBoost and LightGBM are dominant. In 2023, gradient boosted trees won or placed in the top 3 of 85% of tabular data competitions.

2. Speed

XGBoost trains in minutes on millions of rows. LightGBM is even faster, with training times measured in seconds for many production datasets. Inference is sub-millisecond per sample. GNNs are slower: training involves multiple rounds of message passing over the graph, and inference requires neighborhood sampling. For latency-sensitive applications where predictions must be served in under 10 milliseconds, tabular models have an inherent advantage.

3. Interpretability

Gradient boosted trees produce feature importance rankings. SHAP values explain individual predictions. Decision paths can be traced. For regulated industries (credit decisioning, insurance pricing) where model explainability is a legal requirement, tabular models have mature interpretability tools. GNN interpretability is an active research area but less developed.
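As a quick illustration, here is a minimal SHAP sketch, assuming a trained tree model named model and its feature matrix X (both placeholders):

import shap

# TreeExplainer computes exact SHAP values for tree ensembles quickly
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Attribution for a single prediction: which features pushed the score up or down
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])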

4. Simplicity

Training XGBoost requires a flat table and 10 lines of Python. No graph construction. No message passing architecture. No GPU memory management for large graphs. The tooling is mature (scikit-learn API), the failure modes are well-understood, and every data scientist knows how to use it.
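For reference, those 10 lines look roughly like this (a minimal sketch; the file and column names are placeholders):

import pandas as pd
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("features.csv")  # the pre-engineered flat table
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = xgb.XGBClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)
print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))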

When graph ML wins

Graph ML's advantages emerge in specific conditions, and those conditions are more common in enterprise data than most people realize.

1. On raw relational data (no feature engineering)

The key comparison is not "XGBoost on engineered features vs. GNN on a graph." It is "what can each approach achieve starting from the raw relational database?" On RelBench, the flat-model pipeline needed an average of 12.3 hours of feature engineering per task before training could even begin. GNNs operate directly on the relational structure.

On the RelBench benchmark, starting from raw relational data: LightGBM with manual features achieves 62.44 AUROC. A supervised GNN achieves 75.83 AUROC. The GNN outperforms despite zero feature engineering effort because it captures patterns that humans do not think to encode.

2. When multi-hop patterns matter

If the predictive signal in your data spans multiple tables and multiple hops, graph ML has a structural advantage. Consider predicting whether a user will engage with a product recommendation:

users

user_id | segment     | city
U-301   | Casual      | Seattle
U-302   | Power Buyer | Portland
U-303   | Power Buyer | Seattle

interactions

user_id | product_id | action    | date
U-301   | P-50       | Purchased | 2025-02-10
U-301   | P-51       | Purchased | 2025-02-15
U-303   | P-50       | Purchased | 2025-02-12
U-303   | P-52       | Purchased | 2025-02-20
U-302   | P-50       | Browsed   | 2025-03-01

U-302 browsed P-50. Should we recommend P-51 or P-52? The answer requires traversing: U-302 browsed P-50 (1-hop) -> U-301 and U-303 also bought P-50 (2-hop) -> U-301 then bought P-51, U-303 bought P-52 (3-hop) -> U-303 is in the same segment as U-302 (4-hop). P-52 is the better recommendation because it came from U-303, who shares U-302's 'Power Buyer' segment.

flat_feature_table (what XGBoost sees for U-302)

user_id | browse_count | purchase_count | segment     | candidate_product
U-302   | 1            | 0              | Power Buyer | P-51
U-302   | 1            | 0              | Power Buyer | P-52

Both candidate products look identical in the flat table. The multi-hop signal (which similar users bought which product) is gone.

A tabular model would need all of these to be explicitly engineered as features. A graph model traverses the connections automatically. On tasks where multi-hop patterns are strong (recommendation, fraud detection, entity resolution), the accuracy gap between flat and graph approaches often exceeds 15 AUROC points.
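To make the contrast concrete, here is roughly what hand-building just that one 4-hop feature looks like in pandas, using the toy tables above (a sketch, not a production pipeline):

import pandas as pd

users = pd.DataFrame({
    "user_id": ["U-301", "U-302", "U-303"],
    "segment": ["Casual", "Power Buyer", "Power Buyer"],
})
inter = pd.DataFrame({
    "user_id":    ["U-301", "U-301", "U-303", "U-303", "U-302"],
    "product_id": ["P-50",  "P-51",  "P-50",  "P-52",  "P-50"],
    "action":     ["Purchased", "Purchased", "Purchased", "Purchased", "Browsed"],
})

# Hops 1-2: other users who purchased something U-302 browsed
browsed = inter.query("user_id == 'U-302' and action == 'Browsed'")["product_id"]
co_buyers = inter[inter["product_id"].isin(browsed)
                  & (inter["action"] == "Purchased")]["user_id"].unique()

# Hops 3-4: what those users bought, scored by whether they share U-302's segment
candidates = inter[inter["user_id"].isin(co_buyers)
                   & ~inter["product_id"].isin(browsed)].merge(users, on="user_id")
seg = users.loc[users["user_id"] == "U-302", "segment"].iloc[0]
print(candidates.assign(same_seg=candidates["segment"] == seg)
      .groupby("product_id")["same_seg"].sum())  # P-51 -> 0, P-52 -> 1

And that is one feature for one question. A graph model learns this kind of traversal on its own.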

3. When the graph structure itself is the signal

In fraud detection, the structure of the transaction network is more predictive than any individual transaction attribute. Circular fund flows, star-shaped disbursement patterns, and chains of shell companies are graph topology patterns. No amount of feature engineering on a flat table can capture the full topology of a node's 3-hop neighborhood.
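A toy illustration of topology as signal, using networkx (the accounts and edges are invented for the example):

import networkx as nx

# A circular fund flow: A pays B, B pays C, C pays back A
G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

# The cycle exists only in the topology; no single edge's attributes reveal it
print(list(nx.simple_cycles(G)))  # one 3-node cycle: A -> B -> C -> A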

Similarly, in social networks, influence propagation is a graph phenomenon. In supply chains, cascading disruptions follow graph paths. In knowledge graphs, link prediction is inherently a graph task. For these problems, graph ML is not just better. It is the only approach that matches the problem structure.

4. When you have many prediction tasks on the same data

A flat-table model requires re-engineering features for every new prediction task. If you switch from predicting churn to predicting upsell, the feature SQL changes entirely. A graph model on the same relational database needs only a new target variable. The same graph structure, the same learned representations, serve multiple prediction tasks.

For organizations that run 10, 20, or 50 prediction tasks on the same relational data, the amortized cost of graph ML is far lower than the cost of engineering features 50 times.

Tabular ML strengths

  • Excels on pre-engineered flat data with expert features
  • Sub-millisecond inference latency
  • Mature interpretability tools (SHAP, feature importance)
  • Simple to implement: 10 lines of Python
  • Every data scientist knows XGBoost/LightGBM

Graph ML strengths

  • Excels on raw relational data without feature engineering
  • Captures multi-hop patterns across 3-4 tables automatically
  • Graph topology as a first-class predictive signal
  • Same model serves multiple prediction tasks on same data
  • 13-19 point AUROC advantage on RelBench benchmark

RelBench benchmark — classification AUROC by method

Method                     | AUROC | Feature Engineering  | Multi-hop Patterns | Training Time
LightGBM + manual features | 62.44 | 12.3 hrs / 878 lines | 1-2 hops max       | Minutes
LLM on serialized tables   | 68.06 | None                 | None (text-based)  | Hours
Supervised GNN             | 75.83 | None (automatic)     | 3-4 hops           | 30-60 min
KumoRFM (zero-shot)        | 76.71 | None (pre-trained)   | 3-4+ hops          | ~1 second
KumoRFM (fine-tuned)       | 81.14 | None (pre-trained)   | 3-4+ hops          | Minutes

7 databases, 30 tasks, 103M+ rows. The foundation model outperforms all approaches including task-specific GNNs, with zero feature engineering effort.

RelBench — AUROC gap by task type

Task Category                   | Flat (LightGBM) | Graph (GNN) | KumoRFM | Gap (Flat vs Best)
Single-table dominant           | 71.2            | 74.8        | 76.3    | +5.1
Multi-hop patterns (2-3 tables) | 60.1            | 76.4        | 81.7    | +21.6
Temporal + graph topology       | 55.8            | 75.3        | 82.1    | +26.3
Cold-start / sparse labels      | 52.3            | 72.1        | 78.9    | +26.6

The more relational the data, the larger the advantage. Tasks with strong multi-hop and temporal patterns show 20+ point AUROC gaps.

The benchmark evidence

The most comprehensive comparison of tabular and graph ML on relational data is the RelBench benchmark: 7 databases, 30 tasks, 103 million rows. The benchmark includes e-commerce, social networks, clinical trials, and academic datasets, ranging from 3 tables to 15 tables.

Method                                  | AUROC (classification) | Feature engineering
LightGBM + manual features              | 62.44                  | 12.3 hours / 878 lines per task
LLM on serialized tables (Llama 3.2 3B) | 68.06                  | None (but accuracy is poor)
Supervised GNN                          | 75.83                  | None (automatic)
KumoRFM (zero-shot)                     | 76.71                  | None (pre-trained)
KumoRFM (fine-tuned)                    | 81.14                  | None (pre-trained + tuned)

The pattern is consistent across databases and task types. The more relational the data (more tables, deeper join paths, more temporal dynamics), the larger the graph ML advantage. On tasks where the signal lives primarily in a single table, the gap narrows to 3-5 points. On tasks where multi-hop patterns dominate, the gap exceeds 15 points.

The practical decision framework

Use this framework to decide which approach fits your situation.

Choose tabular ML when: your data is already flat or lives in 1-2 tables; you have a strong data science team with time for feature engineering; interpretability is a hard requirement; latency requirements are under 10ms; and you have a single, well-defined prediction task.

Choose graph ML when: your data spans 5 or more connected tables; multi-hop relationships are likely predictive (recommendations, fraud, risk); temporal sequences matter and aggregation loses signal; you have multiple prediction tasks on the same data; and your team is spending months on feature engineering with diminishing returns.

PQL Query

PREDICT next_purchase_category
FOR EACH customers.customer_id

The same model handles classification, regression, and recommendation tasks. Each new prediction question is a query, not a project.

Output

customer_id | predicted_category | confidence | reasoning_depth
C-5001      | Electronics        | 0.87       | 3-hop: similar customer clusters
C-5002      | Home & Garden      | 0.72       | 2-hop: brand affinity transfer
C-5003      | Sports             | 0.91       | 4-hop: seasonal + cohort pattern
C-5004      | Books              | 0.65       | 2-hop: category co-occurrence
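Each new question really is just a new query. A churn variant of the query above might look like this (a sketch: the interactions table name, the 90-day window, and the exact aggregate syntax are assumptions, not confirmed PQL):

PREDICT COUNT(interactions.*, 0, 90, days) = 0
FOR EACH customers.customer_id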

Choose a foundation model (KumoRFM) when: you want graph ML accuracy without building graph ML infrastructure; you need predictions fast (seconds, not months); you have many prediction tasks across the same relational data; and you want to eliminate feature engineering and pipeline maintenance entirely.

Where this is heading

The trajectory of the field is toward models that match the data structure. Text models operate on sequences. Image models operate on pixel grids. Enterprise data is relational, and the models are catching up.

Tabular ML will remain important for single-table tasks and latency-sensitive applications. But the assumption that all ML starts with a flat CSV is fading. The data has always been relational. The models are finally learning to work with it rather than against it.

The question is not whether graph ML or tabular ML is better in the abstract. The question is how much signal lives in the relationships between your tables. If the answer is "a lot" (and for most enterprise data, it is), then the approach that preserves those relationships will outperform the one that destroys them. The benchmark evidence is unambiguous on this point.

Frequently asked questions

Is graph ML always better than tabular ML?

No. On pre-engineered flat tables where a skilled data scientist has already captured the important cross-table patterns, XGBoost and LightGBM are highly competitive and often preferred for their speed, interpretability, and simplicity. Graph ML's advantage emerges when the data is naturally relational (multiple connected tables) and the cross-table, multi-hop patterns have not been manually engineered into features. The more tables and relationships in your data, the larger graph ML's advantage.

What is tabular ML?

Tabular ML refers to models that operate on flat tables: one row per sample, one column per feature. The dominant algorithms are gradient boosted decision trees (XGBoost, LightGBM, CatBoost), which have won the majority of tabular prediction competitions on Kaggle since 2016. These models are fast to train, produce interpretable feature importance rankings, and handle mixed data types well. Their limitation is that they require pre-engineered flat input.

What is graph ML?

Graph ML refers to models that operate on graph-structured data: nodes connected by edges. Graph neural networks (GNNs) learn by passing messages between connected nodes, allowing each node to incorporate information from its neighborhood. For relational databases, rows become nodes and foreign keys become edges. GNNs automatically discover cross-table patterns without manual feature engineering, including multi-hop relationships that span 3-4 tables.
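A minimal sketch of that rows-to-nodes, foreign-keys-to-edges mapping, using PyTorch Geometric's HeteroData (the feature dimensions and indices are invented, and this is illustrative rather than any particular product's internal representation):

import torch
from torch_geometric.data import HeteroData

data = HeteroData()

# Rows become nodes: one feature vector per user and per product
data["user"].x = torch.randn(3, 8)     # 3 users, 8-dim features
data["product"].x = torch.randn(3, 8)  # 3 products

# Foreign keys become edges: each interaction row links a user to a product
# (user indices 0-2 stand in for U-301..U-303, products for P-50..P-52)
data["user", "interacts", "product"].edge_index = torch.tensor(
    [[0, 0, 2, 2, 1],   # source: user index
     [0, 1, 0, 2, 0]])  # target: product index

A heterogeneous GNN then passes messages over this structure, so each node's representation absorbs information from its multi-hop neighborhood.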

How much better is graph ML on relational data?

On the RelBench benchmark (7 databases, 30 tasks, 103M+ rows), graph ML (supervised GNN) scored 75.83 AUROC compared to 62.44 for LightGBM with manual features. KumoRFM, a foundation model that uses graph ML internally, scored 76.71 zero-shot and 81.14 with fine-tuning. The gap varies by task: tasks with strong multi-hop patterns show 15-20 point gaps, while tasks where most signal is in the primary table show smaller gaps of 3-5 points.

Can I use both approaches together?

Yes. A common production pattern is to use graph ML to generate embeddings (learned representations) for each entity, then add those embeddings as features to a tabular model alongside manually engineered features. This combines the multi-hop pattern discovery of graph ML with the feature importance and speed of XGBoost. However, foundation models like KumoRFM typically make this hybrid approach unnecessary by capturing both types of patterns in a single model.
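A minimal sketch of that hybrid, assuming gnn_embeddings and manual_features already exist (both are placeholders here, filled with random numbers just so the snippet runs):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
gnn_embeddings = rng.random((1000, 64))   # per-entity embeddings from a trained GNN
manual_features = rng.random((1000, 20))  # hand-engineered features
labels = rng.integers(0, 2, 1000)

# Concatenate graph-derived signal with domain features into one flat matrix
X = np.hstack([manual_features, gnn_embeddings])
model = xgb.XGBClassifier(n_estimators=300)
model.fit(X, labels)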

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.