The degree distribution is one of the most fundamental properties of any graph, and it directly shapes how GNNs process that graph. A node's degree (its number of connections) determines how many messages it receives during message passing, how much memory its neighborhood requires, and what structural role it plays. The distribution of degrees across all nodes determines whether the graph is easy or hard to partition, whether neighbor sampling is necessary, and how quickly over-smoothing occurs.
Common degree distributions
Power-law (scale-free)
P(k) is proportional to k^(-gamma), with gamma typically between 2 and 3. Most nodes have degree 1-10; a few hub nodes have degree 10,000-1,000,000. Found in: social networks, the web, citation networks, financial transaction networks.
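The heavy tail is easy to see by sampling degrees directly from the distribution. A minimal sketch, assuming an illustrative gamma of 2.5 and using inverse-transform sampling on the continuous approximation of P(k):

```python
import math
import random

def sample_power_law_degree(gamma=2.5, k_min=1, k_max=10**6):
    """Sample a degree from P(k) ~ k^(-gamma) on [k_min, k_max]
    via inverse transform on the continuous density, rounded down."""
    u = random.random()
    a = k_min ** (1 - gamma)
    b = k_max ** (1 - gamma)
    return int((a + u * (b - a)) ** (1 / (1 - gamma)))

random.seed(0)
degrees = [sample_power_law_degree() for _ in range(100_000)]

# Most samples are tiny, but the largest is orders of magnitude bigger.
print(sum(d <= 10 for d in degrees) / len(degrees))  # fraction with degree <= 10
print(max(degrees))
```

With gamma = 2.5, over 95% of sampled degrees fall at or below 10, while the maximum routinely reaches into the thousands: the "most nodes small, a few huge" shape described above.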
The “scale-free” name comes from the distribution's self-similarity: zooming into any degree range reveals the same shape. There is no characteristic scale.
Exponential
P(k) decreases exponentially. Degrees are more uniform than power-law but still skewed. Found in: road networks, power grids, some biological networks.
Poisson (Erdős–Rényi random graph)
Degrees are concentrated around the mean. Few very high or very low degree nodes. Found in: random graphs (synthetic). Rare in real-world data.
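The contrast with the power-law case shows up in how tightly degrees concentrate. A small sketch sampling a Poisson degree sequence (Knuth's multiplication algorithm; the mean degree of 10 is illustrative):

```python
import math
import random

def sample_poisson(lam):
    """Knuth's algorithm: multiply uniforms until the product
    drops below e^(-lam); the count of steps is Poisson(lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

random.seed(0)
degrees = [sample_poisson(10) for _ in range(10_000)]
print(min(degrees), max(degrees))  # both stay close to the mean of 10
```

Unlike the power-law sample, the maximum here stays within a small factor of the mean: no hubs.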
Regular
All nodes have the same degree. Found in: crystal structures, some lattice networks. Uncommon in real-world data.
Impact on GNN design
Degree normalization
Without normalization, sum aggregation gives a hub node with 100,000 neighbors an aggregated message roughly 100,000 times larger than that of a leaf node with 1 neighbor. GCN addresses this with symmetric degree normalization: the message from node j to node i is weighted by 1/sqrt(deg_i x deg_j). This prevents hub nodes from dominating the representation.
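A minimal sketch of this normalization on a toy edge list (pure Python; the graph and the constant scalar features are illustrative, and the self-loops that GCN also adds are omitted for brevity):

```python
import math
from collections import defaultdict

# Toy undirected graph stored as a directed edge list (both directions).
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (3, 0), (1, 2), (2, 1)]
features = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}  # one scalar feature per node

deg = defaultdict(int)
for src, _ in edges:
    deg[src] += 1

# GCN-style aggregation: each message is scaled by 1/sqrt(deg_src * deg_dst),
# so messages touching high-degree nodes are shrunk on both ends.
agg = defaultdict(float)
for src, dst in edges:
    agg[dst] += features[src] / math.sqrt(deg[src] * deg[dst])

print(dict(agg))
```

Note that node 0 (degree 3) ends up with an aggregate well below 3: its incoming messages are damped by its own degree, which is exactly the hub-suppression effect described above.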
Neighbor sampling
Power-law graphs make full-neighborhood aggregation impractical: a hub node with 1M neighbors cannot be fully processed at every layer. GraphSAGE's neighbor sampling addresses this: sample a fixed number of neighbors (e.g., 25) for each node at each layer, so a hub node contributes only 25 of its 1M neighbors. This bounds computation and memory per node and is effectively necessary for any graph with a power-law degree distribution.
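The core of the technique is just fixed-size sampling without replacement. A sketch, assuming an adjacency-list representation and a fan-out of 25:

```python
import random

def sample_neighbors(adj, node, k=25, rng=random):
    """Return at most k neighbors of `node`, sampled without replacement.
    Caps per-node aggregation cost regardless of the node's true degree."""
    neighbors = adj[node]
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)

# A hub with 1,000,000 neighbors still contributes only k messages per layer.
adj = {0: list(range(1, 1_000_001)), 1: [0]}
random.seed(0)
sampled = sample_neighbors(adj, 0, k=25)
print(len(sampled))
```

Low-degree nodes like node 1 are returned in full, so only hubs are actually subsampled.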
Over-smoothing and the small-world property
Many real-world graphs have the small-world property: despite millions of nodes, the average shortest path is only 4-7 hops. In the Facebook social graph, any two users are connected by an average of 4.7 hops.
For GNNs, this means 2-3 layers of message passing already reach a large portion of the graph. By 5-6 layers, information from the entire graph has mixed into every node's embedding, making node representations nearly indistinguishable: over-smoothing. The small-world property, which the hubs of a heavy-tailed degree distribution help create, explains why shallow GNNs work best in practice.
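A back-of-the-envelope calculation makes the point: with branching factor d, an L-layer GNN can reach up to d^L nodes. This overcounts, since paths overlap heavily in real graphs, but the exponential trend is what matters. A sketch with an illustrative average degree of 20:

```python
# Upper bound on receptive-field size: d**L nodes after L layers.
# Real graphs saturate sooner because neighborhoods overlap.
avg_degree = 20  # illustrative branching factor
reach = {layers: avg_degree ** layers for layers in range(1, 7)}
for layers, nodes in reach.items():
    print(layers, nodes)
```

Even this crude bound exceeds a million nodes by layer 5, which is why a handful of layers suffices to cover graphs of realistic size.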
Degree as a feature
Node degree is itself an informative feature:
- Hub identification: high-degree nodes are often structurally important (popular users, key accounts, major suppliers)
- Periphery identification: low-degree nodes are leaves or peripheral entities
- Positional encoding: degree provides coarse structural information that helps GNNs distinguish nodes with different topological roles
- Anomaly detection: nodes with degree significantly different from their expected degree (given node type) may be anomalous
Many GNN architectures concatenate degree as an input feature or use it as a positional encoding. The Graph Isomorphism Network (GIN) encodes degree implicitly: summing messages over a node's neighborhood makes the result grow with the neighbor count, so degree information enters the embedding even when it is not an explicit feature.
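The GIN point can be seen in one line: with a constant input feature, sum aggregation reproduces each node's degree exactly. A sketch on a toy adjacency list:

```python
# Toy graph: node 0 is a small hub, node 3 is a leaf.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

# Constant feature of 1.0 per node; summing over neighbors yields the degree,
# which is why GIN's sum aggregator encodes degree without it as an input.
h = {node: 1.0 for node in adj}
agg = {node: sum(h[nbr] for nbr in adj[node]) for node in adj}
print(agg)  # {0: 3.0, 1: 2.0, 2: 2.0, 3: 1.0}
```

Mean or max aggregation would map every node to 1.0 here, discarding degree entirely; the choice of aggregator determines whether this structural signal survives.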