Neighborhood aggregation is the operation by which each node in a graph neural network collects feature vectors from its direct neighbors and combines them into a single fixed-size representation. It is the mechanism that distinguishes GNNs from standard neural networks. Without aggregation, nodes would never learn from graph structure. Every GNN layer, from GCN to GAT to GIN, implements aggregation differently, and that choice determines what the model can and cannot learn.
Why it matters for enterprise data
Enterprise relational databases are collections of tables linked by foreign keys. Customers link to orders. Orders link to products. Products link to categories. When you represent this as a graph, each foreign key becomes an edge, and neighborhood aggregation becomes the mechanism that propagates information across tables.
Consider a churn prediction task. A flat-table approach requires a data scientist to manually write SQL aggregations: average order value per customer, count of returns in the last 30 days, most frequent product category. Neighborhood aggregation computes these patterns automatically. The customer node aggregates its order neighbors, and the learned aggregation discovers which patterns matter for predicting churn.
On the RelBench benchmark, GNNs using learned neighborhood aggregation on relational graphs achieve 75.83 AUROC compared to 62.44 for flat-table LightGBM across 30 enterprise tasks.
How it works: three aggregation functions
The aggregation function must be permutation-invariant: the result must be the same regardless of the order in which neighbors are processed. Three functions dominate:
Sum aggregation
Adds all neighbor feature vectors element-wise. A node with 10 neighbors therefore produces a larger-magnitude aggregated vector than a node with 2. This preserves information about neighborhood size, which is critical for tasks like fraud detection, where high-degree nodes (accounts with many transactions) behave differently from low-degree nodes. GINConv uses sum aggregation because it matches the discriminative power of the Weisfeiler-Leman graph isomorphism test, the maximum expressiveness achievable by message-passing GNNs.
Mean aggregation
Averages all neighbor vectors. This normalizes by node degree, so a customer with 100 orders and a customer with 5 orders produce comparable representations. GCNConv uses a degree-normalized variant of mean aggregation. Mean works well when the distribution of neighbor features matters more than the count.
Max aggregation
Takes the element-wise maximum across all neighbor vectors. This acts like a filter that captures the most extreme signal in each feature dimension. GraphSAGE supports max aggregation as an option. It is useful when a single outlier neighbor (e.g., one very large transaction) is more informative than the average.
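The three functions, and the permutation-invariance requirement, can be sketched with plain PyTorch tensors (no PyG needed); the toy neighbor matrix here is illustrative:

```python
import torch

# Toy neighborhood: 3 neighbor feature vectors (rows), 2 feature dims (cols).
neighbors = torch.tensor([[1.0, 4.0],
                          [2.0, 5.0],
                          [3.0, 6.0]])

agg_sum  = neighbors.sum(dim=0)          # tensor([ 6., 15.])
agg_mean = neighbors.mean(dim=0)         # tensor([2., 5.])
agg_max  = neighbors.max(dim=0).values   # tensor([3., 6.])

# Permutation invariance: shuffling neighbor order leaves each result unchanged.
shuffled = neighbors[torch.tensor([2, 0, 1])]
assert torch.equal(shuffled.sum(dim=0), agg_sum)
assert torch.equal(shuffled.max(dim=0).values, agg_max)
```

Note how sum scales with neighborhood size while mean does not: doubling every neighbor (same values, twice as many rows) doubles `agg_sum` but leaves `agg_mean` unchanged.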
Concrete example: customer spend aggregation
Consider a database with a customers table and an orders table linked by customer_id. Represented as a graph:
- Customer nodes: features = [age, tenure_months, region]
- Order nodes: features = [amount, item_count, days_since]
- Edges: customer → order (placed_by)
Customer “Alice” has 3 orders with amounts [$50, $120, $30]. After one layer of aggregation on the amount dimension:
- Sum: $200 (total spend)
- Mean: $66.67 (average order value)
- Max: $120 (largest single order)
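The three numbers above reduce to one-liners in plain Python; this is just the arithmetic from the example, not a GNN layer:

```python
# Alice's three order amounts, on the "amount" feature dimension
amounts = [50.0, 120.0, 30.0]

total_spend = sum(amounts)                 # 200.0  -> sum aggregation
avg_order   = sum(amounts) / len(amounts)  # ~66.67 -> mean aggregation
largest     = max(amounts)                 # 120.0  -> max aggregation
```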
PyG implementation
In PyTorch Geometric, aggregation is controlled by the aggr parameter in the MessagePassing constructor:
import torch
from torch_geometric.nn import MessagePassing

class SumAggLayer(MessagePassing):
    """Sum aggregation - preserves neighborhood size."""
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr='add')  # sum aggregation
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        return self.lin(self.propagate(edge_index, x=x))

    def message(self, x_j):
        return x_j  # send neighbor features as-is

class MeanAggLayer(MessagePassing):
    """Mean aggregation - normalizes by degree."""
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr='mean')  # mean aggregation
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        return self.lin(self.propagate(edge_index, x=x))

    def message(self, x_j):
        return x_j

# PyG also supports: aggr='max', aggr='softmax', aggr='powermean'
# Or use MultiAggregation to combine multiple:
from torch_geometric.nn import MultiAggregation
multi_aggr = MultiAggregation(['sum', 'mean', 'max'])

The only difference between these layers is the aggr parameter. PyG handles the rest.
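What `propagate` does with `aggr='add'` can be sketched in plain PyTorch with a scatter-add, using PyG's COO edge-index convention; the tiny two-order graph below is a made-up illustration, not PyG's actual internals:

```python
import torch

# edge_index[0] = source nodes, edge_index[1] = target nodes (PyG COO layout).
# Two order nodes (1 and 2) point to customer node 0.
edge_index = torch.tensor([[1, 2],    # sources (orders)
                           [0, 0]])   # targets (customer)

x = torch.tensor([[0.0],     # node 0: customer (amount feature unused)
                  [50.0],    # node 1: order amount
                  [120.0]])  # node 2: order amount

# Sum aggregation by hand: scatter-add each source's features into its target row.
out = torch.zeros_like(x)
out.index_add_(0, edge_index[1], x[edge_index[0]])
# out[0] now holds 170.0, the summed order amounts for the customer
```

Swapping the scatter-add for a degree-normalized division or an element-wise running maximum would give the mean and max variants; `aggr='mean'` and `aggr='max'` do exactly that for you.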
Limitations and what comes next
Neighborhood aggregation has inherent constraints:
- Information loss: Compressing an arbitrary number of neighbor vectors into one fixed-size vector necessarily discards information. A customer with 1,000 orders loses detail that a customer with 5 orders preserves.
- Over-smoothing: Repeated aggregation across layers causes all node representations to converge. After 5-6 layers, nodes become indistinguishable. This limits practical depth to 2-3 layers.
- Expressiveness bounds: Mean and max aggregation cannot distinguish certain multisets of neighbor features. Only sum aggregation (as in GINConv) achieves maximum expressiveness under the Weisfeiler-Leman test.
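The expressiveness bound is easy to see with a concrete pair of multisets; the two toy neighborhoods here are chosen for illustration:

```python
# Two different neighborhoods on a single feature dimension:
# A = {2.0} and B = {2.0, 2.0}
a = [2.0]
b = [2.0, 2.0]

mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)  # both 2.0
max_a,  max_b  = max(a), max(b)                    # both 2.0
sum_a,  sum_b  = sum(a), sum(b)                    # 2.0 vs 4.0

# Mean and max collapse A and B to the same value; only sum tells them apart.
```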
Graph rewiring and skip connections mitigate over-smoothing. Graph transformers bypass local aggregation entirely by allowing every node to attend to every other node, removing the neighborhood bottleneck.