Graph neural networks have moved from academic papers to production systems at Pinterest, DoorDash, Visa, and Google. But most explanations are either too theoretical (spectral graph convolutions) or too superficial ("like a neural network but for graphs"). These 15 questions cover what practitioners actually need to know.
1. What is a graph neural network?
A GNN is a neural network that operates on graph-structured data: nodes (entities) connected by edges (relationships). Unlike traditional neural networks that process flat vectors or regular grids, GNNs process arbitrary connection patterns.
Consider a customer connected to 47 orders; each order connects to products, and each product connects to categories and to other customers. That is a graph. A GNN learns from that entire connected structure, not just the customer's own attributes.
2. How does a GNN differ from a CNN or RNN?
CNNs assume a regular grid structure. Every pixel has the same number of neighbors in the same positions. This works for images but fails for data where entities have varying numbers of connections in arbitrary configurations.
RNNs assume a sequential structure. Each element has one predecessor and one successor. This works for text and time series but fails for data where entities relate to many other entities simultaneously.
GNNs handle the general case: any entity can connect to any number of other entities in any pattern. This makes GNNs the natural architecture for relational databases, social networks, transaction graphs, molecular structures, and supply chains.
3. What is message passing?
Message passing is the core operation. In each GNN layer:
- Every node collects representations from its neighbors
- These representations are aggregated (summed, averaged, or attention-weighted)
- The node updates its own representation based on the aggregation and its previous state
After one layer, each node knows about its direct neighbors. After two layers, it knows about neighbors of neighbors. After three layers, it encodes information from the entire 3-hop neighborhood. For a customer node in an e-commerce graph, 3 hops covers: orders, products, other customers who bought those products, and their behavior patterns.
Message passing example: customer C-201
| Hop | Nodes Reached | Information Gained | Example Signal |
|---|---|---|---|
| 0 (self) | C-201 | Own attributes | credit_limit=$15K, account_age=4yr |
| 1 (neighbors) | 3 orders, 1 support ticket | Direct interactions | avg_order=$142, open_ticket=yes |
| 2 (2-hop) | 7 products, 2 agents | What they bought, who helped | high-return products, low-CSAT agent |
| 3 (3-hop) | 45 other customers | Similar customers' behavior | 34 of 45 similar customers churned |
By hop 3, the GNN knows that 34 of 45 similar customers (roughly 75%) who bought the same products and had similar support experiences have churned. No flat feature table captures this 3-hop signal.
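The hop-by-hop propagation above can be sketched in a few lines. This is a minimal illustration, not a real GNN layer: node features are single floats, the graph is a tiny made-up fragment of the C-201 example, and the update rule is a fixed 50/50 mix rather than a learned transformation.

```python
# Minimal sketch of message passing on a toy graph.
# Aggregation = mean over neighbors; update = mix of self and aggregate.
# Graph structure and feature values are illustrative only.

graph = {                     # adjacency list: node -> neighbors
    "C-201": ["O-1", "O-2"],
    "O-1": ["C-201", "P-9"],
    "O-2": ["C-201", "P-9"],
    "P-9": ["O-1", "O-2"],
}
features = {"C-201": 1.0, "O-1": 0.5, "O-2": 0.3, "P-9": -0.2}

def message_passing_step(graph, features):
    """One layer: each node averages its neighbors' features,
    then mixes the aggregate with its own previous state."""
    updated = {}
    for node, neighbors in graph.items():
        agg = sum(features[n] for n in neighbors) / len(neighbors)
        updated[node] = 0.5 * features[node] + 0.5 * agg
    return updated

h1 = message_passing_step(graph, features)   # each node now sees 1 hop
h2 = message_passing_step(graph, h1)         # each node now sees 2 hops
```

After one step, C-201's representation already mixes in its orders' features; after two, it indirectly reflects the product node P-9 it never touches directly.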
4. What types of GNNs exist?
The major architectures, in order of publication:
- GCN (2017): Applies spectral convolutions on graphs. Simple and effective but assumes a fixed graph structure.
- GraphSAGE (2017): Samples and aggregates neighbor features. Scales to large graphs because it does not require the full adjacency matrix.
- GAT (2018): Uses attention mechanisms to learn which neighbors are most informative. Different neighbors get different weights.
- GIN (2019): Provably as powerful as the Weisfeiler-Leman graph isomorphism test. Maximally expressive among message-passing GNNs.
- Graph Transformers (2020+): Combine local message passing with global self-attention. Best results on most current benchmarks. KumoRFM uses a graph transformer architecture.
GNN architectures compared
| Architecture | Year | Aggregation Method | Scalability | Best For |
|---|---|---|---|---|
| GCN | 2017 | Spectral convolution | Moderate (full graph) | Small homogeneous graphs |
| GraphSAGE | 2017 | Sampled neighbor mean/pool | High (mini-batch) | Large-scale production |
| GAT | 2018 | Attention-weighted | Moderate | Heterogeneous neighbor importance |
| GIN | 2019 | Sum (injective) | Moderate | Maximum expressiveness |
| Graph Transformer | 2020+ | Local + global attention | High (with sampling) | Multi-table relational data |
Graph transformers combine the best of GNNs (local structure) and transformers (global attention). KumoRFM uses this architecture for relational databases.
5. Where are GNNs used in production?
Published production deployments at scale:
- Pinterest: PinSage serves recommendations to 450 million monthly active users over a graph of roughly 3 billion nodes (pins and boards) and 18 billion edges
- DoorDash: Heterogeneous graph of customers, restaurants, and items. 1.8% engagement lift across 30 million users.
- Visa: Fraud detection across billions of transactions, identifying fraud rings that transaction-level models miss
- Google Maps: Traffic prediction using road network graphs with real-time sensor data as node features
- Snap: Friend suggestions based on the social graph structure and interaction patterns
6. GNNs vs. transformers
Standard transformers compute self-attention between all pairs of tokens. For a sequence of length n, this is O(n^2). For a graph of 1 million nodes, full attention is computationally infeasible.
GNNs compute attention only between connected nodes, making it O(E) where E is the number of edges. This is far more efficient for sparse graphs (most real-world graphs are sparse).
Graph transformers combine both: local message passing for efficiency, plus global attention mechanisms for long-range dependencies. On RelBench, graph transformers outperform both pure GNNs and pure transformers on relational prediction tasks.
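The O(n^2) vs. O(E) gap is worth making concrete. A back-of-envelope comparison, with an assumed average degree of 20 (typical for sparse real-world graphs; the exact figure varies by domain):

```python
# Back-of-envelope: full self-attention vs. edge-restricted attention.
# avg_degree = 20 is an illustrative assumption for a sparse graph.

n_nodes = 1_000_000
avg_degree = 20
n_edges = n_nodes * avg_degree

full_attention_pairs = n_nodes ** 2   # O(n^2): score every pair of nodes
sparse_attention_pairs = n_edges      # O(E): score only connected pairs

reduction = full_attention_pairs / sparse_attention_pairs
# 10^12 pairs vs. 2 * 10^7 pairs: a 50,000x reduction for this graph
```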
7. Can GNNs handle temporal data?
Yes. Temporal GNNs add timestamps to edges and nodes, allowing the model to distinguish between recent and historical interactions. This enables learning patterns like: recency effects (recent orders predict churn better than old orders), frequency changes (accelerating vs. decelerating activity), and seasonal patterns (holiday purchasing behavior).
On RelBench, temporal encoding improves AUROC by 2 to 5 points on tasks where recency matters (churn, next-purchase prediction).
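One common way to encode recency, sketched below with an exponential decay on edge age. The half-life and order data are illustrative assumptions, not a specific production scheme; real temporal GNNs typically learn time encodings rather than fixing them.

```python
import math

# Sketch of recency weighting on temporal edges: older interactions
# contribute exponentially less to the aggregation.
# HALF_LIFE_DAYS and the order list are made-up illustrative values.

HALF_LIFE_DAYS = 30.0

def recency_weight(age_days):
    """Weight halves every HALF_LIFE_DAYS."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

orders = [(120.0, 3), (90.0, 45), (200.0, 400)]  # (order value, days ago)

weighted_sum = sum(v * recency_weight(age) for v, age in orders)
total_weight = sum(recency_weight(age) for _, age in orders)
recency_weighted_avg = weighted_sum / total_weight
plain_avg = sum(v for v, _ in orders) / len(orders)
```

The year-old $200 order barely moves the recency-weighted average, while it dominates the plain average; that difference is exactly the recency signal described above.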
8. What are the scaling challenges?
Three challenges, all with proven solutions:
- Neighborhood explosion: With 3 layers and an average of 50 neighbors per node, each prediction touches 50^3 = 125,000 nodes. Solution: neighbor sampling, where each layer samples a fixed number of neighbors (typically 10-25).
- Memory: A graph with 100 million nodes and 1 billion edges does not fit in GPU memory. Solution: mini-batch training with graph partitioning (Cluster-GCN, GraphSAINT) or distributed training across multiple GPUs.
- Inference latency: Real-time predictions require fast neighbor lookups. Solution: pre-computed neighbor indices, embedding caching, and hardware-optimized graph databases.
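The neighbor-sampling fix for neighborhood explosion can be sketched directly. This is a GraphSAGE-style fixed-fanout sampler over a toy adjacency list; the graph and fanouts are illustrative.

```python
import random

# Sketch of fixed-fanout neighbor sampling to contain neighborhood
# explosion. With 3 layers at 50 neighbors each, a prediction touches
# 50**3 = 125,000 nodes; with fanouts (15, 10, 10) the ceiling is
# 15 * 10 * 10 = 1,500. Graph below is a toy hub-and-spoke example.

def sample_neighbors(adj, node, fanout, rng):
    """Keep at most `fanout` randomly chosen neighbors of `node`."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= fanout:
        return list(neighbors)
    return rng.sample(neighbors, fanout)

adj = {"a": ["b", "c", "d", "e", "f"], "b": ["a"], "c": ["a"],
       "d": ["a"], "e": ["a"], "f": ["a"]}

rng = random.Random(0)
sampled = sample_neighbors(adj, "a", 2, rng)  # 2 of a's 5 neighbors kept
```

Production systems apply this per layer, so the touched-node count is bounded by the product of the fanouts rather than the product of the true degrees.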
9. How much data does a GNN need?
A GNN needs a graph structure (which you derive from your relational database) and labeled examples for the prediction task. The graph structure itself acts as a regularizer, reducing the number of labels needed compared to flat models.
In practice, 10,000 to 100,000 labeled nodes are sufficient for supervised GNN training. For foundation model approaches, zero labels are needed: the model uses pre-trained knowledge to make zero-shot predictions.
10. What is a heterogeneous graph?
A graph with multiple types of nodes and edges. An e-commerce database produces a heterogeneous graph with customer nodes, product nodes, order nodes, and category nodes, connected by "purchased," "contains," "belongs_to," and "viewed" edge types.
Heterogeneous GNNs learn separate transformation functions for each node and edge type, then combine them during message passing. This is critical for relational databases where each table has different attributes and semantics.
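The per-edge-type transformation can be illustrated in miniature. Here each relation gets its own scalar "projection" before aggregation; the relation names and weights are made up, and a real heterogeneous GNN would learn a weight matrix per type rather than a scalar.

```python
# Sketch of type-specific message transformation in a heterogeneous GNN.
# Relation weights are illustrative stand-ins for learned per-type
# weight matrices; features are scalars for readability.

relation_weights = {
    "purchased": 0.8,
    "viewed": 0.2,
    "belongs_to": 0.5,
}

# (neighbor feature, edge type connecting it to the target node)
neighbors = [(1.0, "purchased"), (3.0, "viewed"), (2.0, "belongs_to")]

messages = [relation_weights[rel] * feat for feat, rel in neighbors]
agg = sum(messages) / len(messages)
# a "purchased" edge contributes far more per unit of feature
# than a "viewed" edge, because its transformation differs
```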
11. How do GNNs solve cold-start problems?
A new user with zero purchase history has no features for a traditional model. But they have connections: the product they first browsed, the marketing channel they came from, the referrer who sent them. Through message passing, a GNN propagates signal from these connected entities to the new user, generating a meaningful embedding even with no direct history.
This is one of the highest-value applications of GNNs. Cold-start users are often the highest-value segment for recommendations and the hardest to serve with traditional models.
12. What is over-smoothing?
When a GNN has too many layers, all node representations converge toward a global average, losing the distinguishing information that makes individual predictions useful. With 10+ layers on most graphs, node embeddings become nearly identical.
Practical solutions: limit depth to 2-4 message passing layers (which covers 2-4 hops of relational signal), add skip connections (residual GNNs), or use jumping knowledge networks that combine representations from all layers. Graph transformers partially avoid this by adding global attention that does not depend on local message passing depth.
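Over-smoothing is easy to demonstrate numerically: pure neighbor averaging on a connected graph contracts all embeddings toward a common value. A toy 4-cycle with scalar features, using averaging with a self-loop and no learned transform:

```python
# Demonstration of over-smoothing: repeated neighbor averaging on a
# connected graph (a 4-cycle here) shrinks the spread of node
# embeddings every layer, making nodes indistinguishable.

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
feats = {0: 1.0, 1: 0.0, 2: -1.0, 3: 0.0}

def smooth(adj, feats):
    """Average each node with its neighbors (self-loop included)."""
    return {
        n: (feats[n] + sum(feats[m] for m in adj[n])) / (len(adj[n]) + 1)
        for n in adj
    }

def spread(feats):
    return max(feats.values()) - min(feats.values())

h = feats
spreads = []
for _ in range(10):
    h = smooth(adj, h)
    spreads.append(spread(h))
# spreads decreases monotonically; after 10 "layers" the nodes
# are nearly identical
```

Skip connections and jumping knowledge counteract exactly this contraction by re-injecting each node's earlier, still-distinct representations.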
13. Benchmark performance
Current results on major benchmarks:
- RelBench (relational data, 30 tasks): supervised GNN 75.83 AUROC, KumoRFM zero-shot 76.71, KumoRFM fine-tuned 81.14
- OGB-MolHIV (molecular property prediction): 0.80+ AUROC with graph transformer variants
- OGB-Citation2 (academic citation link prediction): 0.87+ MRR with GraphSAGE and SEAL
- QM9 (quantum chemistry): GNNs achieve chemical accuracy on 11 of 12 molecular properties
14. Do I need to build my own GNN?
For most enterprise applications, no. Building a custom GNN requires specialized expertise (message passing schemes, neighborhood sampling, temporal encoding, graph construction) and 3 to 6 months of engineering. Relational foundation models encapsulate GNN architectures behind a simple query interface.
Build your own GNN only if: you have a unique graph structure that foundation models have not seen, you need full architectural control for competitive advantage, or your team has existing GNN expertise and a single high-stakes use case.
15. What is the future of GNNs?
Three trends are converging. First, foundation models pre-train GNN architectures on massive graph corpora, enabling zero-shot and few-shot predictions on new graphs. KumoRFM already demonstrates this for relational data.
Second, GNN + LLM integration combines textual reasoning (what does this product description mean?) with graph reasoning (what do purchase patterns around this product look like?).
Third, hardware-optimized GNN inference is pushing toward sub-millisecond latency for real-time applications. The technology is transitioning from research tool to commodity infrastructure, much like CNNs did between 2015 and 2020.