The distinction between tabular and graph data is about where the prediction signal lives. If the signal is in each entity's own features (predict house price from square footage and bedrooms), a flat table is the right representation. If the signal is in the relationships between entities (predict fraud from transaction network topology), you need a graph.
Tabular data: when rows are independent
A tabular dataset treats each row as an independent observation:
- Each row has a fixed set of features (columns)
- Rows do not depend on each other
- The order of rows does not matter
- Prediction uses only the features of the target row
Classic tabular problems: predicting house prices (features: square footage, bedrooms, location), classifying iris species (features: petal length, width), credit scoring from application data (features: income, employment, credit history).
Gradient-boosted trees (XGBoost, LightGBM, CatBoost) are the state of the art for tabular data. They handle heterogeneous feature types, missing values, and non-linear relationships efficiently. For genuinely single-table problems, they remain extremely competitive.
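To make the mechanism concrete, here is a minimal pure-Python sketch of gradient boosting with decision stumps under squared loss: each round fits a one-split tree to the current residuals. This is illustrative only; real tabular work uses XGBoost, LightGBM, or CatBoost, and the toy house-price rows below are made up.

```python
# Minimal gradient boosting with decision stumps (squared loss).
# Illustrative sketch only -- real work uses XGBoost/LightGBM/CatBoost.

def fit_stump(X, residuals):
    """Find the single (feature, threshold) split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    return best[1:]  # (feature, threshold, left_value, right_value)

def predict_stump(stump, row):
    j, thr, lv, rv = stump
    return lv if row[j] <= thr else rv

def fit_gbm(X, y, n_rounds=20, lr=0.3):
    base = sum(y) / len(y)           # start from the mean prediction
    stumps, preds = [], [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        s = fit_stump(X, residuals)  # each stump corrects current errors
        stumps.append(s)
        preds = [p + lr * predict_stump(s, row) for p, row in zip(preds, X)]
    return base, stumps, lr

def predict_gbm(model, row):
    base, stumps, lr = model
    return base + sum(lr * predict_stump(s, row) for s in stumps)

# Toy house-price table: [square_footage, bedrooms] -> price (made-up values)
X = [[1400, 3], [1600, 3], [1700, 4], [2100, 4], [2500, 5], [900, 2]]
y = [240_000.0, 260_000.0, 280_000.0, 340_000.0, 420_000.0, 150_000.0]
model = fit_gbm(X, y)
```

Note that prediction uses only the target row's own features, which is exactly the tabular independence assumption from the list above.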
Graph data: when relationships carry signal
Graph data explicitly represents relationships between entities:
- Entities are nodes with features
- Relationships are edges connecting nodes
- The prediction depends on the entity's connections, not just its own features
- Structural patterns (clusters, paths, hubs) carry signal
Graph problems: fraud detection (transaction networks), recommendation (user-item interactions), drug discovery (molecular structure), social analysis (influence networks), supply chain (supplier-manufacturer relationships).
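A tiny sketch of how structure carries signal: one message-passing step combines a node's own features with an aggregate over its neighbors, so a node's representation reflects who it is connected to. The account names and amounts below are invented for illustration.

```python
# One message-passing step on a toy transaction graph.
# Node features and edges are illustrative, not from a real dataset.

# Node features: [typical_transaction_amount]
features = {"A": [10.0], "B": [12.0], "C": [500.0], "D": [480.0]}
# Undirected edges: who transacts with whom
edges = [("A", "B"), ("C", "D"), ("B", "C")]

adj = {n: [] for n in features}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def aggregate(node):
    """Concatenate a node's own features with the mean of its
    neighbors' features -- the simplest message-passing step."""
    own = features[node]
    neigh = adj[node]
    mean = [sum(features[m][i] for m in neigh) / len(neigh)
            for i in range(len(own))]
    return own + mean

rep = {n: aggregate(n) for n in features}
```

After one step, node B's representation already encodes that it transacts with a high-volume counterparty (C), signal that no flat per-node feature table would contain.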
The hidden graph in enterprise data
Most enterprise data looks tabular but is actually relational. A “customer churn” table with 200 features is typically derived from 10+ source tables through JOINs and aggregations. The flat table is the result of flattening a graph.
The question is not “is my data tabular or graph?” but “am I losing information by flattening my naturally relational data into a flat table?” When the source data has foreign key relationships between tables, the answer is usually yes.
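To see what flattening discards, here is a sketch of the aggregation step a feature pipeline would run: per-customer counts and sums derived from a transactions table. The tables and column names are hypothetical.

```python
# Flattening relational data: derive a single customer feature table
# from a transactions table. Tables and columns are hypothetical.

customers = ["alice", "bob", "carol"]
transactions = [
    {"from": "alice", "to": "bob",   "amount": 50.0},
    {"from": "alice", "to": "carol", "amount": 20.0},
    {"from": "bob",   "to": "carol", "amount": 75.0},
]

# The aggregations a feature pipeline would compute per customer.
flat = {}
for c in customers:
    sent = [t["amount"] for t in transactions if t["from"] == c]
    flat[c] = {"n_sent": len(sent), "total_sent": sum(sent)}

# What the flat table cannot express: counterparty structure.
# flat["alice"] records how much alice paid, but no longer *whom*.
```

The aggregates preserve volume but discard topology; any signal in who transacts with whom is lost before the model ever sees the data.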
Quantitative comparison
On RelBench (enterprise relational databases):
- Flat-table LightGBM (expert features): 62.44 avg AUROC
- GNN on relational graph: 75.83 avg AUROC (+13.4 points)
On Kaggle-style single-table competitions:
- Gradient-boosted trees win most competitions
- GNNs provide minimal improvement over tabular models
The pattern is clear: GNNs excel when relational structure exists. On flat single-table data, tabular models are sufficient and often preferable (faster training, simpler deployment, well-understood).
The hybrid path
In practice, the best approach combines both:
- Use graph representation for the relational source data (preserve multi-table structure)
- Use tabular-style feature handling for individual node features (numerical normalization, categorical embeddings)
- Let GNN message passing discover cross-entity patterns while tabular features capture entity-level patterns
This is what relational deep learning does: treat the database as a graph (relational structure) while encoding each node's features with tabular best practices (normalization, embeddings).
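The hybrid pattern above can be sketched end to end: encode each node with tabular best practices (z-score the numeric column, embed the categorical one), then run one mean-aggregation step over neighbors. Everything here is illustrative, the node names, feature values, and random embeddings are assumptions, and a real system would use learned embeddings and a trained GNN.

```python
import random

# Hedged sketch of relational deep learning: tabular-style per-node
# encoding, then one GNN-style aggregation. All values are illustrative.

random.seed(0)
EMB_DIM = 2
embedding = {}  # stand-in for learned categorical embeddings

def embed(category):
    """Random fixed vector per category (a real model would learn these)."""
    if category not in embedding:
        embedding[category] = [random.gauss(0, 1) for _ in range(EMB_DIM)]
    return embedding[category]

nodes = {
    "u1": {"income": 40_000.0, "segment": "retail"},
    "u2": {"income": 90_000.0, "segment": "retail"},
    "u3": {"income": 65_000.0, "segment": "business"},
}
edges = [("u1", "u2"), ("u2", "u3")]

# Tabular-style encoding: z-score the numeric, embed the categorical.
incomes = [n["income"] for n in nodes.values()]
mean = sum(incomes) / len(incomes)
std = (sum((x - mean) ** 2 for x in incomes) / len(incomes)) ** 0.5
enc = {k: [(v["income"] - mean) / std] + embed(v["segment"])
       for k, v in nodes.items()}

# GNN-style step: concatenate each node's encoding with the mean of
# its neighbors' encodings.
adj = {k: [] for k in nodes}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def message_pass(node):
    neigh = adj[node]
    agg = [sum(enc[m][i] for m in neigh) / len(neigh)
           for i in range(len(enc[node]))]
    return enc[node] + agg

out = {k: message_pass(k) for k in nodes}
```

The division of labor mirrors the list above: the encoding step captures entity-level patterns, while the aggregation step lets cross-entity structure flow into each node's representation.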