
Tabular Data vs Graph Data: When Flat Tables Work and When You Need Graph Structure

Tabular data treats every row as independent. Graph data captures the connections between rows. The choice depends on where the prediction signal lives: in the entity's own features or in its relationships.


TL;DR

  1. Tabular data: each row is an independent observation with features as columns. Works well for single-table predictions where the target depends only on the entity's own attributes (house price from square footage).
  2. Graph data: entities are connected by relationships. The prediction signal is in the connections: fraud rings, social influence, supply chain cascades. These signals are invisible in flat tables.
  3. The decision criterion: does the prediction depend on relationships between entities? If yes, graph. If only on the entity's own features, tabular may suffice.
  4. On single-table data, XGBoost/LightGBM compete with or beat GNNs. On multi-table relational data, GNNs outperform by 13+ AUROC points. The data type determines the model advantage, not any universal superiority.
  5. Engineering graph features (degree, PageRank) into flat tables captures some graph signal but cannot replace full GNN message passing, which learns relevant multi-hop patterns automatically.

The distinction between tabular and graph data is about where the prediction signal lives. If the signal is in each entity's own features (predict house price from square footage and bedrooms), a flat table is the right representation. If the signal is in the relationships between entities (predict fraud from transaction network topology), you need a graph.

Tabular data: when rows are independent

A tabular dataset treats each row as an independent observation:

  • Each row has a fixed set of features (columns)
  • Rows do not depend on each other
  • The order of rows does not matter
  • Prediction uses only the features of the target row

Classic tabular problems: predicting house prices (features: square footage, bedrooms, location), classifying iris species (features: petal length, width), credit scoring from application data (features: income, employment, credit history).

Gradient-boosted trees (XGBoost, LightGBM, CatBoost) are the state of the art for tabular data. They handle heterogeneous feature types, missing values, and non-linear relationships efficiently. For genuinely single-table problems, they remain extremely competitive.
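The defining property above can be made concrete: a tabular prediction reads only the target row's own columns and never consults another row. A minimal pure-Python sketch (the feature names, rows, and the $200-per-square-foot rule are illustrative, not a real model):

```python
# Each row is an independent observation: a dict of column -> value.
houses = [
    {"sqft": 1200, "bedrooms": 2, "price": 250_000},
    {"sqft": 2400, "bedrooms": 4, "price": 480_000},
    {"sqft": 1800, "bedrooms": 3, "price": 360_000},
]

def predict_price(row):
    """Toy per-square-foot model: uses ONLY the target row's features.

    No other row is consulted -- the defining property of tabular data.
    """
    return row["sqft"] * 200  # assumed $200/sqft, purely illustrative

predictions = [predict_price(row) for row in houses]
```

A gradient-boosted tree learns a far richer function of the row's columns, but the access pattern is the same: one row in, one prediction out.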

Graph data: when relationships carry signal

Graph data explicitly represents relationships between entities:

  • Entities are nodes with features
  • Relationships are edges connecting nodes
  • The prediction depends on the entity's connections, not just its own features
  • Structural patterns (clusters, paths, hubs) carry signal

Graph problems: fraud detection (transaction networks), recommendation (user-item interactions), drug discovery (molecular structure), social analysis (influence networks), supply chain (supplier-manufacturer relationships).
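A minimal way to hold such a graph in plain Python, before reaching for a library (the node IDs, features, and edges are illustrative):

```python
from collections import defaultdict

# Nodes with features (e.g. accounts in a transaction network).
node_features = {
    "a": {"balance": 500.0},
    "b": {"balance": 120.0},
    "c": {"balance": 980.0},
    "d": {"balance": 45.0},
}

# Edges as (source, target) pairs -- relationships between entities.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]

# Undirected adjacency list: the structure a flat table has no column for.
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

neighbours_of_c = sorted(adj["c"])
```

In PyTorch Geometric the same information lives in a `Data` object (node feature matrix plus an edge index), but the conceptual split is identical: features on nodes, signal in edges.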

The hidden graph in enterprise data

Most enterprise data looks tabular but is actually relational. A “customer churn” table with 200 features was derived from 10+ source tables through JOINs and aggregations. The flat table is the result of flattening a graph.

The question is not “is my data tabular or graph?” but “am I losing information by flattening my naturally relational data into a flat table?” When the source data has foreign key relationships between tables, the answer is usually yes.
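The information loss is easy to demonstrate. In the hypothetical sketch below, two accounts produce identical per-account aggregates (three transactions each), yet one sits in a ring whose counterparties also trade with each other, a classic fraud pattern the flat table cannot see (all account names and edges are made up):

```python
# Two accounts, each with three outgoing transactions.
edges = {
    "ring_member": [("ring_member", "r1"), ("ring_member", "r2"),
                    ("ring_member", "r3"),
                    # counterparties also trade with each other:
                    ("r1", "r2"), ("r2", "r3"), ("r3", "r1")],
    "normal_user": [("normal_user", "n1"), ("normal_user", "n2"),
                    ("normal_user", "n3")],
}

def flat_features(account, account_edges):
    """What a GROUP BY + COUNT aggregation would keep after flattening."""
    return {"n_transactions": sum(1 for u, _ in account_edges if u == account)}

def counterparty_links(account, account_edges):
    """Edges among the account's counterparties -- visible only in the graph."""
    partners = {v for u, v in account_edges if u == account}
    return sum(1 for u, v in account_edges if u in partners and v in partners)

# Flat view: the two accounts are indistinguishable.
# Graph view: the ring member's counterparties form a closed loop.
```

Here `flat_features` returns the same value for both accounts, while `counterparty_links` is 3 for the ring member and 0 for the normal user.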

Quantitative comparison

On RelBench (enterprise relational databases):

  • Flat-table LightGBM (expert features): 62.44 avg AUROC
  • GNN on relational graph: 75.83 avg AUROC (+13.4 points)

On Kaggle-style single-table competitions:

  • Gradient-boosted trees win most competitions
  • GNNs provide minimal improvement over tabular models

The pattern is clear: GNNs excel when relational structure exists. On flat single-table data, tabular models are sufficient and often preferable (faster training, simpler deployment, well-understood).

The hybrid path

In practice, the best approach combines both:

  • Use graph representation for the relational source data (preserve multi-table structure)
  • Use tabular-style feature handling for individual node features (numerical normalization, categorical embeddings)
  • Let GNN message passing discover cross-entity patterns while tabular features capture entity-level patterns

This is what relational deep learning does: treat the database as a graph (relational structure) while encoding each node's features with tabular best practices (normalization, embeddings).
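The database-to-graph step can be sketched in a few lines. This is a toy illustration, not the RelBench or PyG pipeline: the table names, columns, and rows are invented, and the min-max scaling stands in for proper tabular feature encoding:

```python
# Two related tables, as a database would store them.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 30.0},
    {"order_id": 11, "customer_id": 1, "amount": 55.0},
    {"order_id": 12, "customer_id": 2, "amount": 12.5},
]

# Relational view as a graph: each row becomes a node, each
# foreign key becomes an edge between the two node types.
customer_nodes = {c["customer_id"]: c for c in customers}
order_nodes = {o["order_id"]: o for o in orders}
fk_edges = [(o["customer_id"], o["order_id"]) for o in orders]

# Per-node features still get tabular-style treatment; here a
# min-max normalisation of order amounts stands in for that step.
amounts = [o["amount"] for o in orders]
lo, hi = min(amounts), max(amounts)
for o in order_nodes.values():
    o["amount_norm"] = (o["amount"] - lo) / (hi - lo)
```

No JOIN, no aggregation: the multi-table structure survives as typed nodes and foreign-key edges, and message passing can then traverse it.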

Frequently asked questions

When is tabular data sufficient?

Tabular data works well when: (1) the prediction depends only on features of the target entity (predict house price from square footage, bedrooms, location), (2) there is no meaningful relational structure between entities, (3) the data fits naturally in a single table, and (4) gradient-boosted trees (XGBoost, LightGBM) already achieve strong performance. Adding graph structure in these cases provides marginal improvement.

When is graph data essential?

Graph data is essential when: (1) the prediction depends on relationships (fraud rings, social influence, supply chain cascades), (2) data spans multiple related tables (enterprise relational databases), (3) structural patterns carry signal (network topology, community membership), and (4) multi-hop dependencies matter (a customer's risk depends on their counterparties' counterparties). These signals are invisible in flat tables.
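The "counterparties' counterparties" point is a statement about k-hop neighbourhoods. A minimal sketch with an invented four-node transaction chain, where the risk signal only appears at two hops:

```python
from collections import defaultdict

# Illustrative transaction edges: risk sits two hops away from "a".
edges = [("a", "b"), ("b", "c"), ("c", "d")]
risky = {"c"}  # nodes flagged as risky

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def k_hop_neighbours(node, k):
    """All nodes reachable within k hops, excluding the node itself."""
    frontier, seen = {node}, {node}
    for _ in range(k):
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
    return seen - {node}

# "a"'s direct counterparty ("b") is clean, but its counterparty's
# counterparty ("c") is risky -- only the 2-hop view sees this.
one_hop_risk = risky & k_hop_neighbours("a", 1)   # empty
two_hop_risk = risky & k_hop_neighbours("a", 2)   # contains "c"
```

A GNN with two message-passing layers aggregates exactly this 2-hop neighbourhood, which is why the signal is reachable for it and absent from a per-row flat table.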

Can tabular models use graph features?

Yes, you can engineer graph features (degree, PageRank, clustering coefficient) and add them as columns to a flat table. This captures some graph signal but is limited: (1) you must know which graph features matter in advance, (2) fixed features cannot capture complex multi-hop patterns, and (3) the number of possible graph features is infinite. GNNs learn the relevant graph features automatically.
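Splicing one such hand-picked feature into a flat table looks like this (accounts, edges, and the choice of degree are illustrative; real pipelines would use a graph library for PageRank or clustering coefficients):

```python
from collections import Counter

# Flat table of accounts plus a separate edge list.
rows = [{"account": "a", "balance": 500.0},
        {"account": "b", "balance": 120.0},
        {"account": "c", "balance": 980.0}]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "b")]

# Hand-engineered graph feature: node degree, added back as a column
# that a tabular model (XGBoost, LightGBM) can then consume.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

for row in rows:
    row["degree"] = degree[row["account"]]
```

The limitation is visible in the code: this bakes in exactly one fixed structural pattern chosen up front, whereas message passing composes multi-hop patterns during training.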

What about the XGBoost vs GNN debate?

On single-table data, XGBoost/LightGBM often matches or beats GNNs because tabular models are optimized for independent rows with heterogeneous features. On multi-table relational data, GNNs significantly outperform tabular models (13+ AUROC points on RelBench). The debate is about which data type you have, not which model is universally better.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.