
Feature Engineering vs Graph Learning: Manual Feature Tables vs Automatic Pattern Discovery

Feature engineering is the bottleneck of enterprise ML: 2-6 months of SQL, aggregations, and manual decisions about what patterns might matter. Graph learning replaces this with automatic cross-table pattern discovery through message passing.

PyTorch Geometric

TL;DR

  • Manual feature engineering: a data scientist writes SQL to join tables, compute aggregations (AVG, COUNT, MAX over time windows), and flatten everything into a single row per target. This takes 2-6 months for a typical enterprise dataset.
  • Graph learning: represent the database as a graph (rows=nodes, FKs=edges) and let GNNs discover cross-table patterns through message passing. No manual SQL, no aggregation decisions, no time-window tuning.
  • Feature engineering is lossy: aggregating orders into 'avg order amount' destroys the distribution. Graph learning preserves individual-level information and learns which aggregation patterns matter from data.
  • The GNN advantage grows with relational complexity. Single-table: marginal improvement. 5+ tables: significant improvement. 10+ tables with multi-hop dependencies: transformative improvement.
  • Hybrid approaches work: use engineered features as initial node features, then let GNNs discover additional patterns. Domain knowledge plus automatic discovery.

Feature engineering is the largest time investment in enterprise ML, and graph learning eliminates most of it. In a typical enterprise ML project, 80% of the time goes to understanding the database schema, writing SQL joins, computing aggregations, and building a flat feature table. The actual model training takes days. The feature engineering takes months. Graph learning replaces this manual process with automatic cross-table pattern discovery.

The manual feature engineering workflow

Consider building a customer churn model on a database with customers, orders, order_items, products, categories, support_tickets, and sessions tables:

  1. Schema study (1-2 weeks): understand table relationships, business meaning, data quality
  2. Feature brainstorming (1-2 weeks): decide what aggregations might predict churn (recency, frequency, monetary, support ticket count, session duration)
  3. SQL implementation (4-8 weeks): write complex queries with multiple JOINs, GROUP BYs, and time-window filters
  4. Feature validation (2-4 weeks): check for leakage, null handling, distribution shifts
  5. Iteration (4-8 weeks): add features, test, remove low-value features, add more

Total: 3-6 months. And the result is a flat table where every customer is one row with 100-500 engineered columns.
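The flattening step at the end of this pipeline can be sketched in a few lines. The table and column names below (`orders`, `order_count`, `avg_order_amount`) are illustrative, not from any particular schema:

```python
from collections import defaultdict
from statistics import mean

# Toy orders table as (customer_id, amount) rows -- values are made up.
orders = [
    (1, 20.0), (1, 500.0), (1, 25.0),
    (2, 60.0), (2, 70.0),
]

# Group order amounts by customer.
by_customer = defaultdict(list)
for customer_id, amount in orders:
    by_customer[customer_id].append(amount)

# Flatten: one row per customer with engineered aggregate columns.
feature_table = {
    cid: {
        "order_count": len(amts),
        "avg_order_amount": round(mean(amts), 2),
        "max_order_amount": max(amts),
    }
    for cid, amts in by_customer.items()
}
```

In production this is hundreds of SQL queries rather than one dictionary comprehension, but the shape of the output is the same: every customer collapses to a single row of aggregates.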

What information is lost

Flattening is inherently lossy. When you aggregate a customer's 50 orders into “avg_order_amount = $67”, you lose:

  • Distribution: the customer might have 45 orders at $20 and 5 orders at $500. The average hides the bimodal pattern.
  • Temporal trajectory: order amounts might be increasing (good) or decreasing (churn signal). The average hides the trend.
  • Item-level patterns: the customer might be shifting from electronics to groceries. Product-level information is aggregated away.
  • Cross-customer patterns: other customers who bought similar products might have already churned. This multi-hop signal does not exist in a flat customer table.
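The distribution loss is easy to demonstrate: two order histories with identical averages can be completely different. The amounts below are invented for illustration:

```python
from statistics import mean

# Two customers whose engineered feature avg_order_amount is identical (67.0),
# but whose underlying order histories differ sharply.
steady_orders  = [67.0] * 10           # ten nearly identical orders
bimodal_orders = [20.0] * 9 + [490.0]  # nine small orders, one large outlier

avg_steady  = mean(steady_orders)      # 67.0
avg_bimodal = mean(bimodal_orders)     # 67.0 -- the average hides the shape
```

A flat feature table assigns both customers the same value; a graph model that sees each order as its own node can still tell them apart.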

The graph learning workflow

The same churn prediction with relational deep learning:

  1. Schema mapping (automated): read the database schema, create the heterogeneous graph. Tables become node types, FKs become edge types.
  2. Feature encoding (automated): encode column values as node features. Numerical columns become floats. Categorical columns become embeddings.
  3. GNN training (hours): train a GNN on the graph. Message passing discovers cross-table patterns automatically.

Total: hours to days, depending on dataset size. No SQL. No manual aggregation decisions. No time-window tuning.
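Step 1, the schema-to-graph mapping, can be sketched with plain Python dictionaries; the table names, keys, and the `"places"` relation name are illustrative:

```python
# Each table becomes a node type; each foreign key becomes an edge type.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 1},
    {"id": 12, "customer_id": 2},
]

# Node types: one node per row, identified by its primary key.
nodes = {
    "customer": [c["id"] for c in customers],
    "order": [o["id"] for o in orders],
}

# Edge type ("customer", "places", "order"), derived from the
# foreign key orders.customer_id.
edges = {
    ("customer", "places", "order"):
        [(o["customer_id"], o["id"]) for o in orders],
}
```

In PyTorch Geometric the same structure lands in a `HeteroData` object, where each `(source, relation, target)` triple gets its own `edge_index` tensor.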

What GNNs discover that humans do not engineer

On RelBench benchmarks, GNN models discover patterns that expert feature engineers miss:

  • Multi-hop correlations: customers whose purchased products were also purchased by customers who subsequently churned are at higher risk. This is a 3-hop pattern (customer → product → other customer → churn). No feature engineer computes this.
  • Structural patterns: customers with diverse purchase graphs (many categories, many brands) churn differently from customers with focused purchase graphs. The topology itself is the feature.
  • Interaction effects across tables: the combination of high support ticket frequency AND declining order amounts AND recent product returns. GNNs learn these automatically; engineers would need to enumerate them manually.
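To make the 3-hop pattern concrete, here is a hand-rolled version of the signal a GNN picks up through message passing, over invented co-purchase data. A real model learns this implicitly rather than computing it explicitly:

```python
from collections import defaultdict

# Toy co-purchase data: (customer_id, product_id) pairs, ids made up.
purchases = [(1, "p1"), (2, "p1"), (2, "p2"), (3, "p2")]
churned = {2}  # customers with a known churn label

buyers_of = defaultdict(set)    # product -> customers who bought it
products_of = defaultdict(set)  # customer -> products they bought
for cust, prod in purchases:
    buyers_of[prod].add(cust)
    products_of[cust].add(prod)

def churned_co_purchasers(cust):
    """3-hop signal: customer -> product -> other customer -> churn label."""
    neighbors = {c for p in products_of[cust] for c in buyers_of[p]} - {cust}
    return neighbors & churned
```

Customer 1 never churned and shares no table row with customer 2, yet the co-purchase path connects them; that connection simply does not exist as a column in a flat customer table.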

When feature engineering still wins

Graph learning is not always superior:

  • Single-table data: no relational structure means no graph advantage. XGBoost on engineered features remains strong.
  • Small data: GNNs need enough edges to learn meaningful patterns. Very sparse graphs may not benefit.
  • Domain-specific features: some features require domain knowledge that data cannot reveal (regulatory requirements, business rules). These should be engineered manually and used as initial node features.

The hybrid approach

The best production systems combine both: use domain-expert-engineered features as initial node features, then let GNNs discover additional cross-table patterns through message passing. This gives you human insight (what features to compute from domain knowledge) plus machine discovery (what cross-table patterns correlate with the target).
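The mechanical part of the hybrid setup is simple: engineered aggregates are concatenated with the raw column encodings to form each node's initial feature vector, which the GNN then refines through message passing. Feature names and values below are illustrative:

```python
# Per-customer engineered features, e.g. [order_count, avg_order_amount].
engineered = {1: [3.0, 181.67], 2: [2.0, 65.0]}

# Per-customer raw column encodings, e.g. a one-hot region code.
raw = {1: [0.0, 1.0], 2: [1.0, 0.0]}

# Initial node features = engineered aggregates + raw encodings.
node_features = {cid: engineered[cid] + raw[cid] for cid in engineered}
```

The engineered columns carry domain knowledge the data cannot reveal (business rules, regulatory flags); the message-passing layers add the cross-table patterns on top.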

Frequently asked questions

What does manual feature engineering involve for relational data?

A data scientist studies the database schema, writes SQL joins across tables, computes aggregations (COUNT, AVG, MAX, SUM over various windows), creates ratios and interactions, and flattens everything into a single row per prediction target. For a customer churn model, this might mean computing 200+ features from 10+ tables: average order amount, days since last order, count of support tickets, most frequent product category, etc. This typically takes 2-6 months.

What patterns does feature engineering miss?

Feature engineering misses multi-hop patterns (correlations between a customer's behavior and the behavior of customers who bought the same products), structural patterns (the topology of a customer's transaction network), and long-tail combinatorial patterns (specific combinations of product categories, time windows, and transaction amounts that no human would think to engineer). These are exactly the patterns GNNs discover through message passing.

Is graph learning always better than feature engineering?

No. For single-table data with no relational structure (purely tabular), gradient-boosted trees with manual features remain competitive. Graph learning excels when: (1) data spans multiple related tables, (2) the prediction depends on multi-hop relationships, and (3) structural patterns matter. The more relational structure exists, the larger the graph learning advantage.

Can you combine feature engineering with graph learning?

Yes. Many production systems use engineered features as initial node features, then let GNNs discover additional patterns through message passing. This combines human domain knowledge (which features to compute) with automatic pattern discovery (which cross-table patterns matter). On RelBench, this hybrid approach often outperforms either method alone.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.