Feature engineering is the largest time investment in enterprise ML, and graph learning eliminates most of it. In a typical enterprise ML project, 80% of the time goes to understanding the database schema, writing SQL joins, computing aggregations, and building a flat feature table. The actual model training takes days. The feature engineering takes months. Graph learning replaces this manual process with automatic cross-table pattern discovery.
The manual feature engineering workflow
Consider building a customer churn model on a database with customers, orders, order_items, products, categories, support_tickets, and sessions tables:
- Schema study (1-2 weeks): understand table relationships, business meaning, data quality
- Feature brainstorming (1-2 weeks): decide what aggregations might predict churn (recency, frequency, monetary, support ticket count, session duration)
- SQL implementation (4-8 weeks): write complex queries with multiple JOINs, GROUP BYs, and time-window filters
- Feature validation (2-4 weeks): check for leakage, null handling, distribution shifts
- Iteration (4-8 weeks): add features, test, remove low-value features, add more
Total: 3-6 months. And the result is a flat table where every customer is one row with 100-500 engineered columns.
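To make the SQL-implementation step concrete, here is a minimal sketch of one such hand-written aggregation in pandas. The table and column names (`orders`, `customer_id`, `amount`, `order_date`) are illustrative, not from any real schema; a production pipeline repeats this pattern across dozens of tables and time windows.

```python
import pandas as pd

# Illustrative orders table; real pipelines read this from the warehouse.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [20.0, 500.0, 30.0, 25.0, 35.0],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-03-01", "2024-02-10", "2024-02-20", "2024-03-15"]),
})
as_of = pd.Timestamp("2024-04-01")  # prediction cutoff, to avoid leakage

# One hand-written aggregation per feature -- this is the step that
# consumes weeks when scaled to hundreds of columns.
features = orders.groupby("customer_id").agg(
    order_count=("amount", "size"),
    avg_order_amount=("amount", "mean"),
    total_spend=("amount", "sum"),
    days_since_last_order=("order_date", lambda d: (as_of - d.max()).days),
).reset_index()
print(features)
```

Every row of the output is one customer; every column is one manual decision about what to aggregate, over which window, with which null handling.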
What information is lost
Flattening is inherently lossy. When you aggregate a customer's 50 orders into “avg_order_amount = $68”, you lose:
- Distribution: the customer might have 45 orders at $20 and 5 orders at $500. The average hides the bimodal pattern.
- Temporal trajectory: order amounts might be increasing (good) or decreasing (churn signal). The average hides the trend.
- Item-level patterns: the customer might be shifting from electronics to groceries. Product-level information is aggregated away.
- Cross-customer patterns: other customers who bought similar products might have already churned. This multi-hop signal does not exist in a flat customer table.
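The distribution bullet above can be checked with a few lines: the bimodal order history averages to a single number that looks like a mid-range customer, while the two modes are nowhere near it.

```python
# The bimodal history from the bullet above: 45 orders at $20, 5 at $500.
amounts = [20.0] * 45 + [500.0] * 5

avg = sum(amounts) / len(amounts)        # the single engineered feature
print(f"avg_order_amount = ${avg:.2f}")  # one number, two hidden modes

# The structure the average erases: two well-separated clusters.
low = [a for a in amounts if a < 100]
high = [a for a in amounts if a >= 100]
print(len(low), "orders near $20;", len(high), "orders near $500")
```

No order in the history is anywhere close to the computed average, yet the flat table represents this customer by that average alone.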
The graph learning workflow
The same churn prediction with relational deep learning:
- Schema mapping (automated): read the database schema, create the heterogeneous graph. Tables become node types, FKs become edge types.
- Feature encoding (automated): encode column values as node features. Numerical columns become floats. Categorical columns become embeddings.
- GNN training (hours): train a GNN on the graph. Message passing discovers cross-table patterns automatically.
Total: hours to days, depending on dataset size. No SQL. No manual aggregation decisions. No time-window tuning.
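The schema-mapping step can be sketched in a few lines. The dictionary below is a hypothetical, hand-written stand-in for the database catalog that real tooling (e.g. the RelBench pipeline) would read automatically; the point is only the mapping rule: tables become node types, foreign keys become edge types.

```python
# Hypothetical schema: table -> {fk column -> referenced table}.
schema = {
    "customers": {},
    "orders": {"customer_id": "customers"},
    "order_items": {"order_id": "orders", "product_id": "products"},
    "products": {"category_id": "categories"},
    "categories": {},
    "support_tickets": {"customer_id": "customers"},
    "sessions": {"customer_id": "customers"},
}

# Tables become node types; each foreign key becomes a typed edge.
node_types = list(schema)
edge_types = [
    (src, f"fk_{col}", dst)
    for src, fks in schema.items()
    for col, dst in fks.items()
]
print(node_types)
print(edge_types)
```

Seven tables yield seven node types and six edge types, with no feature decisions made by a human: the graph simply mirrors the relational structure.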
What GNNs discover that humans do not engineer
On RelBench benchmarks, GNN models discover patterns that expert feature engineers miss:
- Multi-hop correlations: customers whose purchased products were also purchased by customers who subsequently churned are at higher risk. This is a 3-hop pattern (customer → product → other customer → churn). No feature engineer computes this.
- Structural patterns: customers with diverse purchase graphs (many categories, many brands) churn differently from customers with focused purchase graphs. The topology itself is the feature.
- Interaction effects across tables: for example, high support ticket frequency combined with declining order amounts and recent product returns. GNNs learn these combinations automatically; engineers would need to enumerate them manually.
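The 3-hop pattern in the first bullet can be written out as an explicit traversal, which is roughly what three layers of message passing aggregate implicitly. The toy purchase data and churn labels below are illustrative only.

```python
# Illustrative purchase graph: customer -> set of products bought.
purchased = {
    "alice": {"p1", "p2"},
    "bob": {"p2", "p3"},
    "carol": {"p3"},
}
churned = {"carol"}  # illustrative churn labels

def at_risk_via_cochurn(customer):
    """True if any co-purchaser of this customer's products has churned."""
    for product in purchased[customer]:          # hop 1: customer -> product
        for other, basket in purchased.items():  # hop 2: product -> other customer
            if other != customer and product in basket:
                if other in churned:             # hop 3: other customer -> churn label
                    return True
    return False

print(at_risk_via_cochurn("bob"))    # bob shares p3 with carol, who churned
print(at_risk_via_cochurn("alice"))  # alice's only co-purchaser (bob) has not churned
```

A GNN computes a soft, learned version of this signal for every customer at once; the manual version requires someone to think of the pattern first and then write the three-way join.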
When feature engineering still wins
Graph learning is not always superior:
- Single-table data: no relational structure means no graph advantage. XGBoost on engineered features remains strong.
- Small data: GNNs need enough edges to learn meaningful patterns. Very sparse graphs may not benefit.
- Domain-specific features: some features require domain knowledge that data cannot reveal (regulatory requirements, business rules). These should be engineered manually and used as initial node features.
The hybrid approach
The best production systems combine both: use domain-expert-engineered features as initial node features, then let GNNs discover additional cross-table patterns through message passing. This gives you human insight (what features to compute from domain knowledge) plus machine discovery (what cross-table patterns correlate with the target).
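A minimal sketch of the hybrid setup, assuming NumPy arrays and illustrative numbers: domain-engineered columns are concatenated with other per-node inputs to form the initial node features, which the GNN then refines through message passing.

```python
import numpy as np

# Illustrative hand-engineered features per customer node,
# e.g. a regulatory risk score and a support-ticket count.
engineered = np.array([[0.4, 12.0],
                       [0.9,  1.0]])

# Illustrative learned inputs per node, e.g. a category embedding.
embedded = np.array([[0.1, -0.2, 0.3],
                     [0.0,  0.5, 0.1]])

# Human insight and machine-learnable signal side by side as the
# GNN's starting representation; message passing does the rest.
initial_node_features = np.concatenate([engineered, embedded], axis=1)
print(initial_node_features.shape)
```

The engineered columns carry knowledge the data cannot reveal on its own; the graph structure lets the model discover everything else.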