Graph Neural Networks FAQ: 15 Questions Answered

Everything you need to know about GNNs: how they work, where they are used in production, how they compare to CNNs and transformers, and what enterprise deployment looks like.

TL;DR

  • A GNN learns from graph-structured data through message passing: each node aggregates information from its neighbors over multiple hops. After 3 layers, a customer node encodes orders, products, similar customers, and their behavior.
  • Production-proven at scale: Pinterest (450M MAU, 18B pins), DoorDash (1.8% engagement lift, 30M users), Visa (billions of transactions), Google Maps (traffic prediction), and Snap (friend suggestions).
  • On RelBench (7 databases, 30 tasks, 103M+ rows), GNNs achieve 75.83 AUROC vs. 62.44 for LightGBM with manual features. KumoRFM zero-shot reaches 76.71, fine-tuned 81.14.
  • GNNs excel at cold-start problems: new users with zero history get predictions from graph connections (first product viewed, referral channel). Traditional models cannot predict for entities with no features.
  • Building a custom GNN requires 2-3 specialists and 3-6 months. Relational foundation models like KumoRFM encapsulate GNN architectures behind a PQL query interface, delivering graph ML accuracy without graph ML expertise.

Graph neural networks have moved from academic papers to production systems at Pinterest, DoorDash, Visa, and Google. But most explanations are either too theoretical (spectral graph convolutions) or too superficial ("like a neural network but for graphs"). These 15 questions cover what practitioners actually need to know.

1. What is a graph neural network?

A GNN is a neural network that operates on graph-structured data: nodes (entities) connected by edges (relationships). Unlike traditional neural networks that process flat vectors or regular grids, GNNs process arbitrary connection patterns.

A customer connected to 47 orders, each connected to products, each connected to categories and other customers. That is a graph. A GNN learns from that entire connected structure, not just the customer's own attributes.

2. How does a GNN differ from a CNN or RNN?

CNNs assume a regular grid structure. Every pixel has the same number of neighbors in the same positions. This works for images but fails for data where entities have varying numbers of connections in arbitrary configurations.

RNNs assume a sequential structure. Each element has one predecessor and one successor. This works for text and time series but fails for data where entities relate to many other entities simultaneously.

GNNs handle the general case: any entity can connect to any number of other entities in any pattern. This makes GNNs the natural architecture for relational databases, social networks, transaction graphs, molecular structures, and supply chains.

3. What is message passing?

Message passing is the core operation. In each GNN layer:

  • Every node collects representations from its neighbors
  • These representations are aggregated (summed, averaged, or attention-weighted)
  • The node updates its own representation based on the aggregation and its previous state

After one layer, each node knows about its direct neighbors. After two layers, it knows about neighbors of neighbors. After three layers, it encodes information from the entire 3-hop neighborhood. For a customer node in an e-commerce graph, 3 hops covers: orders, products, other customers who bought those products, and their behavior patterns.

Message passing example: customer C-201

| Hop           | Nodes Reached              | Information Gained           | Example Signal                       |
|---------------|----------------------------|------------------------------|--------------------------------------|
| 0 (self)      | C-201                      | Own attributes               | credit_limit=$15K, account_age=4yr   |
| 1 (neighbors) | 3 orders, 1 support ticket | Direct interactions          | avg_order=$142, open_ticket=yes      |
| 2 (2-hop)     | 7 products, 2 agents       | What they bought, who helped | high-return products, low-CSAT agent |
| 3 (3-hop)     | 45 other customers         | Similar customers' behavior  | 34 of 45 similar customers churned   |

By hop 3, the GNN knows that 75% of customers who bought the same products and had similar support experiences have churned. No flat feature table captures this 3-hop signal.
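The collect-aggregate-update loop above can be sketched in a few lines. This is a minimal mean-aggregation layer in the GraphSAGE style, written in plain numpy; the node features, edge list, and weight matrices are invented for illustration, not taken from any real system:

```python
import numpy as np

def message_passing_layer(h, edges, W_self, W_neigh):
    """One mean-aggregation layer: each node averages its neighbors'
    representations, then combines them with its own previous state."""
    n = h.shape[0]
    agg = np.zeros_like(h)
    deg = np.zeros(n)
    for src, dst in edges:                    # collect: dst receives a message from src
        agg[dst] += h[src]
        deg[dst] += 1
    agg /= np.maximum(deg, 1)[:, None]        # aggregate: mean over neighbors
    # update: combine own state and aggregated messages, then apply a nonlinearity
    return np.tanh(h @ W_self + agg @ W_neigh)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                   # 4 nodes, 8-dim features
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # directed edges (src, dst)
W_self = rng.normal(size=(8, 8)) * 0.1
W_neigh = rng.normal(size=(8, 8)) * 0.1

h1 = message_passing_layer(h, edges, W_self, W_neigh)    # 1 hop of context
h2 = message_passing_layer(h1, edges, W_self, W_neigh)   # 2 hops of context
```

Stacking the layer twice is exactly what "after two layers, it knows about neighbors of neighbors" means: `h2` for a node already depends on nodes two edges away.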

4. What types of GNNs exist?

The major architectures, in order of publication:

  • GCN (2017): Applies spectral convolutions on graphs. Simple and effective but assumes a fixed graph structure.
  • GraphSAGE (2017): Samples and aggregates neighbor features. Scales to large graphs because it does not require the full adjacency matrix.
  • GAT (2018): Uses attention mechanisms to learn which neighbors are most informative. Different neighbors get different weights.
  • GIN (2019): Provably as powerful as the Weisfeiler-Leman graph isomorphism test. Maximally expressive among message-passing GNNs.
  • Graph Transformers (2020+): Combine local message passing with global self-attention. Best results on most current benchmarks. KumoRFM uses a graph transformer architecture.

GNN architectures compared

| Architecture      | Year  | Aggregation Method         | Scalability           | Best For                          |
|-------------------|-------|----------------------------|-----------------------|-----------------------------------|
| GCN               | 2017  | Spectral convolution       | Moderate (full graph) | Small homogeneous graphs          |
| GraphSAGE         | 2017  | Sampled neighbor mean/pool | High (mini-batch)     | Large-scale production            |
| GAT               | 2018  | Attention-weighted         | Moderate              | Heterogeneous neighbor importance |
| GIN               | 2019  | Sum (injective)            | Moderate              | Maximum expressiveness            |
| Graph Transformer | 2020+ | Local + global attention   | High (with sampling)  | Multi-table relational data       |

Graph transformers combine the best of GNNs (local structure) and transformers (global attention). KumoRFM uses this architecture for relational databases.

5. Where are GNNs used in production?

Published production deployments at scale:

  • Pinterest: PinSage serves recommendations to 450 million monthly active users over a graph of 18 billion pins
  • DoorDash: Heterogeneous graph of customers, restaurants, and items. 1.8% engagement lift across 30 million users.
  • Visa: Fraud detection across billions of transactions, identifying fraud rings that transaction-level models miss
  • Google Maps: Traffic prediction using road network graphs with real-time sensor data as node features
  • Snap: Friend suggestions based on the social graph structure and interaction patterns

6. How do GNNs compare to transformers?

Standard transformers compute self-attention between all pairs of tokens. For a sequence of length n, this is O(n^2). For a graph of 1 million nodes, full attention is computationally infeasible.

GNNs compute attention only between connected nodes, making it O(E) where E is the number of edges. This is far more efficient for sparse graphs (most real-world graphs are sparse).

Graph transformers combine both: local message passing for efficiency, plus global attention mechanisms for long-range dependencies. On RelBench, graph transformers outperform both pure GNNs and pure transformers on relational prediction tasks.
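The O(n^2) vs. O(E) gap is easy to put numbers on. A back-of-the-envelope sketch for a hypothetical graph of 1 million nodes with an average degree of 50 (both figures illustrative):

```python
n_nodes = 1_000_000
avg_degree = 50
n_edges = n_nodes * avg_degree          # 50 million edges

full_attention_pairs = n_nodes ** 2     # O(n^2): every node attends to every node
sparse_attention_pairs = n_edges        # O(E): attend only along existing edges

ratio = full_attention_pairs / sparse_attention_pairs
print(f"Full attention computes {ratio:,.0f}x more pairs")  # 20,000x more work
```

The sparser the graph, the wider this gap; most real-world graphs have average degree in the tens, not the millions.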

7. Can GNNs handle temporal data?

Yes. Temporal GNNs add timestamps to edges and nodes, allowing the model to distinguish between recent and historical interactions. This enables learning patterns like: recency effects (recent orders predict churn better than old orders), frequency changes (accelerating vs. decelerating activity), and seasonal patterns (holiday purchasing behavior).

On RelBench, temporal encoding improves AUROC by 2 to 5 points on tasks where recency matters (churn, next-purchase prediction).
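One simple way to encode recency is an exponential time decay on edge weights, so a recent order counts more than an old one. The sketch below is illustrative only (the 30-day half-life is an arbitrary choice, not a RelBench or KumoRFM parameter):

```python
import numpy as np

def time_decayed_weights(edge_ages_days, half_life_days=30.0):
    """Weight each edge by 2**(-age / half_life): an interaction
    half_life days old counts half as much as one from today."""
    ages = np.asarray(edge_ages_days, dtype=float)
    return np.exp(-np.log(2.0) * ages / half_life_days)

# edges for four orders: today, 1 month, 2 months, and 1 year ago
w = time_decayed_weights([0, 30, 60, 365])
# today's order gets weight 1.0; the year-old order contributes almost nothing
```

These weights can then scale each neighbor's message during aggregation, giving the "recent orders predict churn better than old orders" effect described above.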

8. What are the scaling challenges?

Three challenges, all with proven solutions:

  • Neighborhood explosion: With 3 layers and an average of 50 neighbors per node, each prediction touches 50^3 = 125,000 nodes. Solution: neighbor sampling, where each layer samples a fixed number of neighbors (typically 10-25).
  • Memory: A graph with 100 million nodes and 1 billion edges does not fit in GPU memory. Solution: mini-batch training with graph partitioning (Cluster-GCN, GraphSAINT) or distributed training across multiple GPUs.
  • Inference latency: Real-time predictions require fast neighbor lookups. Solution: pre-computed neighbor indices, embedding caching, and hardware-optimized graph databases.
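The neighborhood-explosion arithmetic, and how sampling tames it, fits in a few lines (the fan-out of 15 per layer is a typical but arbitrary choice within the 10-25 range mentioned above):

```python
layers = 3
avg_neighbors = 50
fanout = 15                              # neighbors sampled per node per layer

full_nodes = avg_neighbors ** layers     # 125,000 nodes touched per prediction
sampled_nodes = fanout ** layers         # 3,375 nodes: roughly 37x fewer
print(full_nodes, sampled_nodes)
```

Sampling trades a small amount of variance in the aggregated messages for a fixed, predictable compute budget per prediction, which is what makes mini-batch GNN training practical.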

9. How much data does a GNN need?

A GNN needs a graph structure (which you derive from your relational database) and labeled examples for the prediction task. The graph structure itself acts as a regularizer, reducing the number of labels needed compared to flat models.

In practice, 10,000 to 100,000 labeled nodes are sufficient for supervised GNN training. For foundation model approaches, zero labels are needed: the model uses pre-trained knowledge to make zero-shot predictions.

10. What is a heterogeneous graph?

A graph with multiple types of nodes and edges. An e-commerce database produces a heterogeneous graph with customer nodes, product nodes, order nodes, and category nodes, connected by "purchased," "contains," "belongs_to," and "viewed" edge types.

Heterogeneous GNNs learn separate transformation functions for each node and edge type, then combine them during message passing. This is critical for relational databases where each table has different attributes and semantics.
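A minimal sketch of the per-type transformation idea: each node type gets its own learned projection into a shared hidden space, so messages from different tables can be mixed. The node types, feature dimensions, and random weights below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dims = {"customer": 6, "product": 4, "order": 3}   # each table has its own schema
hidden = 8

# one projection matrix per node type (learned in a real model, random here)
W = {t: rng.normal(size=(d, hidden)) * 0.1 for t, d in dims.items()}

def project(node_type, features):
    """Map a node's type-specific attributes into the shared hidden space,
    so customers, products, and orders can exchange messages."""
    return np.asarray(features) @ W[node_type]

h_customer = project("customer", rng.normal(size=6))
h_product = project("product", rng.normal(size=4))
# both now live in the same 8-dim space despite different raw schemas
```

Edge types get the same treatment: a separate message function per relation ("purchased", "contains", ...), combined during aggregation.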

11. How do GNNs solve cold-start problems?

A new user with zero purchase history has no features for a traditional model. But they have connections: the product they first browsed, the marketing channel they came from, the referrer who sent them. Through message passing, a GNN propagates signal from these connected entities to the new user, generating a meaningful embedding even with no direct history.

This is one of the highest-value applications of GNNs. Cold-start users are often the highest-value segment for recommendations and the hardest to serve with traditional models.
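The mechanism can be sketched directly: a brand-new user starts with an empty representation, and one round of message passing fills it in from the entities they are connected to. The embeddings and connections below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# pre-trained embeddings for the entities the new user is connected to
first_product_viewed = rng.normal(size=16)
referral_channel = rng.normal(size=16)
referring_user = rng.normal(size=16)

# the new user has no history, so their own state starts at zero...
new_user = np.zeros(16)

# ...but one round of mean aggregation pulls in signal from the neighbors
new_user = np.mean(
    [first_product_viewed, referral_channel, referring_user], axis=0
)
```

A flat feature table sees this user as an all-null row; the graph sees three informative connections.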

12. What is over-smoothing?

When a GNN has too many layers, all node representations converge toward a global average, losing the distinguishing information that makes individual predictions useful. With 10+ layers on most graphs, node embeddings become nearly identical.

Practical solutions: limit depth to 2-4 message passing layers (which covers 2-4 hops of relational signal), add skip connections (residual GNNs), or use jumping knowledge networks that combine representations from all layers. Graph transformers partially avoid this by adding global attention that does not depend on local message passing depth.
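Over-smoothing can be demonstrated in a few lines: repeatedly averaging over neighbors collapses node embeddings toward a common value. This toy numpy sketch uses a 10-node ring graph with pure mean aggregation (no learned weights or nonlinearities, so the collapse is faster than in a trained GNN):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
h = rng.normal(size=(n, 4))            # random initial node embeddings

# mean aggregation on a ring: each node averages itself and its two neighbors
A = np.eye(n) + np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
A /= A.sum(axis=1, keepdims=True)      # row-normalize into an averaging operator

spread = [h.std(axis=0).mean()]        # how different the nodes are from each other
for _ in range(30):                    # 30 "layers" of pure mean aggregation
    h = A @ h
    spread.append(h.std(axis=0).mean())

# spread collapses toward 0: every node's embedding approaches the global mean
```

Skip connections and jumping knowledge counter exactly this: they keep a path back to the earlier, still-distinguishable representations.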

13. How do GNNs perform on benchmarks?

Current results on major benchmarks:

  • RelBench (relational data, 30 tasks): supervised GNN 75.83 AUROC, KumoRFM zero-shot 76.71, KumoRFM fine-tuned 81.14
  • OGB-MolHIV (molecular property prediction): 0.80+ AUROC with graph transformer variants
  • OGB-Citation2 (academic citation link prediction): 0.87+ MRR with GraphSAGE and SEAL
  • QM9 (quantum chemistry): GNNs achieve chemical accuracy on 11 of 12 molecular properties

14. Do I need to build my own GNN?

For most enterprise applications, no. Building a custom GNN requires specialized expertise (message passing schemes, neighborhood sampling, temporal encoding, graph construction) and 3 to 6 months of engineering. Relational foundation models encapsulate GNN architectures behind a simple query interface.

Build your own GNN only if: you have a unique graph structure that foundation models have not seen, you need full architectural control for competitive advantage, or your team has existing GNN expertise and a single high-stakes use case.

15. What is the future of GNNs?

Three trends are converging. First, foundation models pre-train GNN architectures on massive graph corpora, enabling zero-shot and few-shot predictions on new graphs. KumoRFM already demonstrates this for relational data.

Second, GNN + LLM integration combines textual reasoning (what does this product description mean?) with graph reasoning (what do purchase patterns around this product look like?).

Third, hardware-optimized GNN inference is pushing toward sub-millisecond latency for real-time applications. The technology is transitioning from research tool to commodity infrastructure, much like CNNs did between 2015 and 2020.

Frequently asked questions

What is a graph neural network (GNN)?

A graph neural network is a type of neural network that operates directly on graph-structured data. Instead of processing flat rows or grids of pixels, a GNN processes nodes (entities) and edges (relationships). It learns by passing messages between connected nodes, allowing each node to build a representation that incorporates information from its neighborhood.

How does a GNN differ from a CNN or RNN?

CNNs operate on regular grids (images) where every pixel has the same small set of neighbors in fixed positions. RNNs operate on sequences (text, time series) with a strict linear order. GNNs operate on arbitrary graph structures where nodes can have any number of neighbors in any configuration. This makes GNNs the natural neural network architecture for relational data.

What is message passing in a GNN?

Message passing is the core computation in a GNN. In each layer, every node collects 'messages' from its neighbors (their current representations), aggregates them (sum, mean, or attention-weighted combination), and updates its own representation. After k layers, each node's embedding encodes information from all nodes within k hops, enabling the model to capture multi-hop relational patterns.

What types of GNNs exist?

Major variants: GCN (Graph Convolutional Network) uses spectral graph convolutions. GraphSAGE uses sampling-based message passing for scalability. GAT (Graph Attention Network) uses attention mechanisms to weight neighbor contributions. GIN (Graph Isomorphism Network) is provably as powerful as the Weisfeiler-Leman graph isomorphism test. Graph Transformers apply self-attention across the graph.

Where are GNNs used in production?

Major production deployments: Pinterest (content recommendations for 450M users), DoorDash (merchant and item recommendations, 1.8% engagement lift), Visa (fraud detection on billions of transactions), Google (traffic prediction in Google Maps), Twitter/X (content ranking and recommendation), Amazon (product recommendations), and Snap (friend suggestions). These systems serve millions of predictions per second.

How do GNNs compare to transformers?

Standard transformers apply self-attention across all tokens, which is quadratic in sequence length. GNNs apply attention only between connected nodes, which is linear in the number of edges. For graph-structured data, GNNs are more efficient and naturally respect the topology. Graph transformers combine both: local message passing with global attention, achieving the best results on relational benchmarks.

Can GNNs handle temporal data?

Yes, with temporal encoding. Temporal GNNs add time information to edges and nodes, allowing the model to distinguish recent events from older ones and learn temporal patterns (acceleration, seasonality, decay). On RelBench, temporal GNNs outperform static GNNs by 2-5 AUROC points on tasks where recency and sequence patterns matter.

What are the scaling challenges for GNNs?

Three main challenges: (1) Neighborhood explosion: with k layers, each node's computation depends on all nodes within k hops, which can be millions. Mini-batch training with neighbor sampling solves this. (2) Memory: full-graph training on large graphs exceeds GPU memory. Graph partitioning and distributed training address this. (3) Inference latency: real-time predictions require fast graph lookups and neighbor sampling.

How much data does a GNN need?

GNNs need a graph (nodes and edges) plus labels for the prediction task. The graph itself can be built from any relational database by mapping rows to nodes and foreign keys to edges. For supervised training, 10,000 to 100,000 labeled nodes are typical. The graph structure provides an inductive bias that reduces the label requirement compared to flat models.

What is a heterogeneous graph?

A heterogeneous graph has multiple types of nodes and edges. In an e-commerce database: customer nodes, product nodes, order nodes, and category nodes, connected by 'purchased', 'belongs_to', 'contains', and 'viewed' edge types. Enterprise relational databases naturally produce heterogeneous graphs because each table is a different entity type with different attributes.

How do GNNs handle cold-start problems?

GNNs excel at cold-start problems because they can predict based on graph structure alone. A new user with zero transaction history but connections to known entities (the product they first viewed, the channel they came from, the referrer who invited them) inherits signal from those connections through message passing. Traditional models cannot make any prediction for entities with no historical features.

What is over-smoothing in GNNs?

Over-smoothing occurs when too many message passing layers cause all node representations to converge to similar values, losing distinguishing information. With 10+ layers, every node's embedding becomes an average of the entire graph. Solutions: skip connections (residual GNNs), jumping knowledge networks, or limiting depth to 2-4 layers and using global attention for long-range dependencies.

How do GNNs perform on standard benchmarks?

On RelBench (7 databases, 30 tasks, 103M+ rows), supervised GNNs achieve 75.83 average AUROC on classification tasks, compared to 62.44 for LightGBM with manual features. That is a 13.4-point improvement. On molecular property prediction (OGB-MolHIV), GNNs achieve 0.80+ AUROC. On social network link prediction (OGB-Citation2), GNNs achieve 0.87+ MRR.

Do I need to build my own GNN?

Not anymore. Building a custom GNN from PyTorch Geometric or DGL requires 2-3 ML engineers with GNN expertise and 3-6 months. Relational foundation models like KumoRFM use GNN architectures internally but expose a simple query interface. You write a prediction query in PQL and get results. The graph construction, training, and serving are handled automatically.

What is the future of GNNs?

Three trends: (1) Foundation models that pre-train GNN architectures on diverse graphs and transfer to new domains (KumoRFM already does this). (2) Integration with LLMs, combining textual reasoning with graph-based relational reasoning. (3) Real-time GNN inference at millisecond latency for applications like fraud detection and real-time recommendations. The technology is moving from research to commodity infrastructure.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.