Tabular ML (XGBoost, LightGBM, CatBoost) dominates production ML. It is fast, interpretable, and has won the vast majority of Kaggle competitions on structured data since 2016. When people say "ML on tabular data is a solved problem," they mean gradient boosted trees.
Graph ML (graph neural networks) is newer, less well-known, and often associated with social networks and molecular chemistry. The idea that it belongs in enterprise ML on customer data, financial data, and operational data is still controversial in many data science teams.
This article is an honest comparison. Both approaches have strengths. Both have limitations. The decision is not about which is universally better. It is about which is better for your data and your prediction task.
When tabular ML wins
Tabular ML has four genuine advantages.
1. On pre-engineered flat data
If a skilled data scientist has already spent 12 hours engineering features from your relational database and produced a well-curated flat table, XGBoost will perform extremely well on it. The features encode domain knowledge. The gradient boosted trees learn non-linear interactions between those features. This combination is powerful.
On Kaggle competitions, where the data arrives pre-flattened and feature engineering is part of the competition, XGBoost and LightGBM are dominant. In 2023, gradient boosted trees won or placed in the top 3 of 85% of tabular data competitions.
2. Speed
XGBoost trains in minutes on millions of rows. LightGBM is even faster, with training times measured in seconds for many production datasets. Inference is sub-millisecond per sample. GNNs are slower: training involves multiple rounds of message passing over the graph, and inference requires neighborhood sampling. For latency-sensitive applications where predictions must be served in under 10 milliseconds, tabular models have an inherent advantage.
3. Interpretability
Gradient boosted trees produce feature importance rankings. SHAP values explain individual predictions. Decision paths can be traced. For regulated industries (credit decisioning, insurance pricing) where model explainability is a legal requirement, tabular models have mature interpretability tools. GNN interpretability is an active research area but less developed.
4. Simplicity
Training XGBoost requires a flat table and 10 lines of Python. No graph construction. No message passing architecture. No GPU memory management for large graphs. The tooling is mature (scikit-learn API), the failure modes are well-understood, and every data scientist knows how to use it.
When graph ML wins
Graph ML's advantages emerge in specific conditions, and those conditions are more common in enterprise data than most people realize.
1. On raw relational data (no feature engineering)
The key comparison is not "XGBoost on engineered features vs. GNN on a graph." It is "what can each approach achieve starting from the raw relational database?" A tabular model needs substantial feature engineering (12.3 hours per task, on average, in the benchmark below) before training can even start. GNNs operate directly on the relational structure.
On the RelBench benchmark, starting from raw relational data: LightGBM with manual features achieves 62.44 AUROC, while a supervised GNN achieves 75.83. The GNN wins by more than 13 points with zero feature engineering effort, because it captures patterns that humans do not think to encode.
2. When multi-hop patterns matter
If the predictive signal in your data spans multiple tables and multiple hops, graph ML has a structural advantage. Consider predicting whether a user will engage with a product recommendation:
users
| user_id | segment | city |
|---|---|---|
| U-301 | Power Buyer | Seattle |
| U-302 | Casual | Portland |
| U-303 | Casual | Seattle |
interactions
| user_id | product_id | action | date |
|---|---|---|---|
| U-301 | P-50 | Purchased | 2025-02-10 |
| U-301 | P-51 | Purchased | 2025-02-15 |
| U-303 | P-50 | Purchased | 2025-02-12 |
| U-303 | P-52 | Purchased | 2025-02-20 |
| U-302 | P-50 | Browsed | 2025-03-01 |
U-302 browsed P-50. Should we recommend P-51 or P-52? The answer requires traversing: U-302 browsed P-50 (1-hop) -> U-301 and U-303 also bought P-50 (2-hop) -> U-301 then bought P-51, and U-303 bought P-52 (3-hop) -> U-303 is in the same 'Casual' segment as U-302 (4-hop). P-52 is the better recommendation because it came from U-303, the co-purchaser who shares U-302's segment.
flat_feature_table (what XGBoost sees for U-302)
| user_id | browse_count | purchase_count | segment | candidate_product |
|---|---|---|---|---|
| U-302 | 1 | 0 | Casual | P-51 |
| U-302 | 1 | 0 | Casual | P-52 |
Both candidate products look identical in the flat table. The multi-hop signal (which similar users bought which product) is gone.
A tabular model would need all of these to be explicitly engineered as features. A graph model traverses the connections automatically. On tasks where multi-hop patterns are strong (recommendation, fraud detection, entity resolution), the accuracy gap between flat and graph approaches often exceeds 15 AUROC points.
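The traversal can be sketched in plain Python. The toy data below mirrors the example tables, with U-303 assumed to share U-302's segment; that shared-segment path is what makes P-52 the better candidate. A GNN learns this kind of weighting from data rather than from hand-written rules.

```python
# Toy sketch of the 4-hop traversal. Data mirrors the example tables;
# U-303 is assumed to share U-302's 'Casual' segment.
users = {"U-301": "Power Buyer", "U-302": "Casual", "U-303": "Casual"}
purchases = {  # user -> products purchased
    "U-301": ["P-50", "P-51"],
    "U-303": ["P-50", "P-52"],
}

def recommend(target, browsed):
    """Score candidates by traversing co-purchaser links (hops 2-4)."""
    scores = {}
    for other, bought in purchases.items():      # hop 2: co-purchasers
        if other == target or browsed not in bought:
            continue
        for product in bought:                   # hop 3: their other buys
            if product == browsed:
                continue
            # hop 4: weight by segment match with the target user
            weight = 2 if users[other] == users[target] else 1
            scores[product] = scores.get(product, 0) + weight
    return max(scores, key=scores.get)

print(recommend("U-302", "P-50"))  # P-52, via the shared-segment path
```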
3. When the graph structure itself is the signal
In fraud detection, the structure of the transaction network is more predictive than any individual transaction attribute. Circular fund flows, star-shaped disbursement patterns, and chains of shell companies are graph topology patterns. No amount of feature engineering on a flat table can capture the full topology of a node's 3-hop neighborhood.
Similarly, in social networks, influence propagation is a graph phenomenon. In supply chains, cascading disruptions follow graph paths. In knowledge graphs, link prediction is inherently a graph task. For these problems, graph ML is not just better. It is the only approach that matches the problem structure.
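To make the topology point concrete, here is a minimal sketch of a circular-flow check: a depth-first search for cycles in a toy directed transaction graph (account names hypothetical). The cycle is a property of the graph as a whole, not of any single row, which is why no per-row feature can encode it.

```python
# Sketch: circular fund flows are a topology pattern, not a row attribute.
# DFS three-color cycle detection over a toy transaction graph.
def has_cycle(edges):
    """Return True if the directed transaction graph contains a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:                 # back edge -> circular flow
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)

transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(has_cycle(transfers))                  # True: funds return to origin
print(has_cycle([("A", "B"), ("B", "C")]))   # False: no cycle
```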
4. When you have many prediction tasks on the same data
A flat-table model requires re-engineering features for every new prediction task. If you switch from predicting churn to predicting upsell, the feature SQL changes entirely. A graph model on the same relational database needs only a new target variable. The same graph structure, the same learned representations, serve multiple prediction tasks.
For organizations that run 10, 20, or 50 prediction tasks on the same relational data, the amortized cost of graph ML is far lower than the cost of engineering features 50 times.
Tabular ML strengths
- Excels on pre-engineered flat data with expert features
- Sub-millisecond inference latency
- Mature interpretability tools (SHAP, feature importance)
- Simple to implement: 10 lines of Python
- Every data scientist knows XGBoost/LightGBM
Graph ML strengths
- Excels on raw relational data without feature engineering
- Captures multi-hop patterns across 3-4 tables automatically
- Graph topology as a first-class predictive signal
- Same model serves multiple prediction tasks on same data
- 13-19 point AUROC advantage on RelBench benchmark
RelBench benchmark — classification AUROC by method
| Method | AUROC | Feature Engineering | Multi-hop Patterns | Training Time |
|---|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hrs / 878 lines | 1-2 hops max | Minutes |
| LLM on serialized tables | 68.06 | None | None (text-based) | Hours |
| Supervised GNN | 75.83 | None (automatic) | 3-4 hops | 30-60 min |
| KumoRFM (zero-shot) | 76.71 | None (pre-trained) | 3-4+ hops | ~1 second |
| KumoRFM (fine-tuned) | 81.14 | None (pre-trained) | 3-4+ hops | Minutes |
7 databases, 30 tasks, 103M+ rows. The foundation model outperforms all approaches including task-specific GNNs, with zero feature engineering effort.
RelBench — AUROC gap by task type
| Task Category | Flat (LightGBM) | Graph (GNN) | KumoRFM | Gap (Flat vs Best) |
|---|---|---|---|---|
| Single-table dominant | 71.2 | 74.8 | 76.3 | +5.1 |
| Multi-hop patterns (2-3 tables) | 60.1 | 76.4 | 81.7 | +21.6 |
| Temporal + graph topology | 55.8 | 75.3 | 82.1 | +26.3 |
| Cold-start / sparse labels | 52.3 | 72.1 | 78.9 | +26.6 |
The more relational the data, the larger the advantage. Tasks with strong multi-hop and temporal patterns show 20+ point AUROC gaps.
The benchmark evidence
The most comprehensive comparison of tabular and graph ML on relational data is the RelBench benchmark: 7 databases, 30 tasks, 103 million rows. The benchmark includes e-commerce, social networks, clinical trials, and academic datasets, ranging from 3 tables to 15 tables.
| Method | AUROC (classification) | Feature engineering |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours / 878 lines per task |
| LLM on serialized tables (Llama 3.2 3B) | 68.06 | None (but accuracy trails GNNs) |
| Supervised GNN | 75.83 | None (automatic) |
| KumoRFM (zero-shot) | 76.71 | None (pre-trained) |
| KumoRFM (fine-tuned) | 81.14 | None (pre-trained + tuned) |
The pattern is consistent across databases and task types. The more relational the data (more tables, deeper join paths, more temporal dynamics), the larger the graph ML advantage. On tasks where the signal lives primarily in a single table, the gap narrows to 3-5 points. On tasks where multi-hop patterns dominate, the gap exceeds 15 points.
The practical decision framework
Use this framework to decide which approach fits your situation.
Choose tabular ML when: your data is already flat or lives in 1-2 tables; you have a strong data science team with time for feature engineering; interpretability is a hard requirement; latency requirements are under 10ms; and you have a single, well-defined prediction task.
Choose graph ML when: your data spans 5 or more connected tables; multi-hop relationships are likely predictive (recommendations, fraud, risk); temporal sequences matter and aggregation loses signal; you have multiple prediction tasks on the same data; and your team is spending months on feature engineering with diminishing returns.
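As an illustration only, the framework can be encoded as a small helper. The 5-table threshold and sub-10ms latency cutoff come from the criteria above; the function name, signature, and tie-breaking order are hypothetical.

```python
# Hypothetical helper encoding the decision framework above.
# Thresholds (5+ tables, <10 ms latency) come from the text; the rest
# is illustrative, not a definitive policy.
def choose_approach(n_tables, multi_hop_signal, needs_sub_10ms,
                    hard_interpretability, n_prediction_tasks):
    # Hard constraints favor tabular ML outright.
    if needs_sub_10ms or hard_interpretability or n_tables <= 2:
        return "tabular ML"
    # Relational structure and task reuse favor graph ML.
    if n_tables >= 5 or multi_hop_signal or n_prediction_tasks > 1:
        return "graph ML"
    return "either (prototype both)"

# Fraud detection on an 8-table database with multi-hop signal:
print(choose_approach(8, True, False, False, 3))   # graph ML
# Credit decisioning with a legal explainability requirement:
print(choose_approach(8, True, False, True, 1))    # tabular ML
```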
PQL Query
PREDICT next_purchase_category FOR EACH customers.customer_id
The same model handles classification, regression, and recommendation tasks. Each new prediction question is a query, not a project.
Output
| customer_id | predicted_category | confidence | reasoning |
|---|---|---|---|
| C-5001 | Electronics | 0.87 | 3-hop: similar customer clusters |
| C-5002 | Home & Garden | 0.72 | 2-hop: brand affinity transfer |
| C-5003 | Sports | 0.91 | 4-hop: seasonal + cohort pattern |
| C-5004 | Books | 0.65 | 2-hop: category co-occurrence |
Choose a foundation model (KumoRFM) when: you want graph ML accuracy without building graph ML infrastructure; you need predictions fast (seconds, not months); you have many prediction tasks across the same relational data; and you want to eliminate feature engineering and pipeline maintenance entirely.
Where this is heading
The trajectory of the field is toward models that match the data structure. Text models operate on sequences. Image models operate on pixel grids. Enterprise data is relational, and the models are catching up.
Tabular ML will remain important for single-table tasks and latency-sensitive applications. But the assumption that all ML starts with a flat CSV is fading. The data has always been relational. The models are finally learning to work with it rather than against it.
The question is not whether graph ML or tabular ML is better in the abstract. The question is how much signal lives in the relationships between your tables. If the answer is "a lot," and for most enterprise data the answer is "a lot," then the approach that preserves those relationships will outperform the one that destroys them. The benchmark evidence is unambiguous on this point.