
Degree Distribution: How the Number of Connections Varies Across Nodes

In a social network, most users have a few hundred connections. A handful of celebrities have millions. This power-law degree distribution shapes every aspect of GNN design: normalization, sampling, batching, and depth.


TL;DR

  • Degree is the number of edges connected to a node. The degree distribution P(k) describes how degrees vary across the graph. Real-world graphs almost always have skewed distributions.
  • Most real-world graphs follow power-law distributions: P(k) ~ k^(-gamma). Most nodes have few connections; a few hub nodes have orders of magnitude more. Social networks, the web, and citation networks all exhibit this pattern.
  • Power-law distributions create GNN challenges: hub nodes over-aggregate (too many messages), low-degree nodes under-aggregate (too few messages). Degree normalization and neighbor sampling address both extremes.
  • The small-world property (short average path length) means 2-3 GNN layers reach a significant fraction of nodes. This is why over-smoothing occurs at shallow depths.
  • Degree itself is an important feature: high-degree nodes play different structural roles (hubs, authorities) than low-degree nodes (leaves, periphery). Many GNNs use degree as a positional encoding.

The degree distribution is one of the most fundamental properties of any graph, and it directly shapes how GNNs process that graph. A node's degree (its number of connections) determines how many messages it receives during message passing, how much memory its neighborhood requires, and what structural role it plays. The distribution of degrees across all nodes determines whether the graph is easy or hard to partition, whether neighbor sampling is necessary, and how quickly over-smoothing occurs.
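As a concrete illustration, here is a minimal pure-Python sketch of computing the empirical degree distribution P(k) from an undirected edge list (the helper name `degree_distribution` is illustrative; in practice, PyG's `torch_geometric.utils.degree` computes per-node degrees on tensors):

```python
from collections import Counter

def degree_distribution(edges, num_nodes):
    """Empirical P(k): fraction of nodes with degree k,
    computed from an undirected edge list."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg)
    return {k: c / num_nodes for k, c in sorted(counts.items())}

# A small star graph: node 0 is a hub with four leaf neighbors.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_distribution(edges, 5))  # {1: 0.8, 4: 0.2}
```

Even this five-node example shows the skew that dominates real graphs: most nodes sit at the low end, while one hub carries most of the edges.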

Common degree distributions

Power-law (scale-free)

P(k) is proportional to k^(-gamma). Most nodes have degree 1-10. A few hub nodes have degree 10,000-1,000,000. Found in: social networks, the web, citation networks, financial transaction networks.

The “scale-free” name comes from the distribution's self-similarity: zooming into any degree range reveals the same shape. There is no characteristic scale.
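To see where power laws come from, here is a sketch of preferential attachment, the Barabási-Albert growth mechanism (the `barabasi_albert` helper is illustrative, assuming an undirected graph and a fixed attachment count m): each new node links to existing nodes with probability proportional to their current degree, so the rich get richer and hubs emerge.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment: each new node adds m
    edges to existing nodes chosen proportionally to their degree."""
    rng = random.Random(seed)
    edges = []
    repeated = []                  # each node listed once per incident edge
    targets = list(range(m))       # the first new node links to all seeds
    for v in range(m, n):
        edges.extend((v, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([v] * m)
        # degree-weighted sample of m distinct targets for the next node
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(repeated))
        targets = list(targets)
    return edges

edges = barabasi_albert(2000, 2)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
# most nodes sit near the minimum degree m; a few hubs are far above it
print(min(deg.values()), max(deg.values()))
```

The trick is the `repeated` list: because a node appears in it once per incident edge, uniform sampling from it is exactly degree-proportional sampling.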

Exponential

P(k) decreases exponentially. Degrees are more uniform than power-law but still skewed. Found in: road networks, power grids, some biological networks.

Poisson (Erdos-Renyi random graph)

Degrees are concentrated around the mean. Few very high or very low degree nodes. Found in: random graphs (synthetic). Rare in real-world data.

Regular

All nodes have the same degree. Found in: crystal structures, some lattice networks. Uncommon in real-world data.

Impact on GNN design

Degree normalization

Without normalization, a hub node with 100,000 neighbors produces a sum-aggregated message roughly 100,000 times larger than a leaf node with a single neighbor. GCN addresses this with symmetric degree normalization: the message from node j to node i is weighted by 1/sqrt(deg(i) · deg(j)). This prevents hub nodes from dominating the representation.
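This normalization can be sketched in plain Python (the `gcn_edge_weights` helper is hypothetical; PyG's `GCNConv` applies the same formula internally), assuming `edge_index` already lists both directions of each undirected edge:

```python
import math

def gcn_edge_weights(edge_index, num_nodes):
    """Symmetric GCN normalization: weight each message j -> i by
    1 / sqrt(deg(j) * deg(i)), after adding self-loops.
    Assumes edge_index contains both directions of every edge."""
    edge_index = edge_index + [(i, i) for i in range(num_nodes)]
    deg = [0] * num_nodes
    for src, _ in edge_index:
        deg[src] += 1
    weights = [1.0 / math.sqrt(deg[s] * deg[d]) for s, d in edge_index]
    return edge_index, weights

# A tiny star: node 0 is the hub, nodes 1 and 2 are leaves.
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
ei, w = gcn_edge_weights(edges, 3)
```

Messages into the hub are scaled down by its own degree, so a node with 100,000 neighbors averages rather than explodes; messages into leaves keep nearly full weight.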

Neighbor sampling

Power-law graphs make full-neighborhood aggregation impractical: a hub node with 1M neighbors cannot be fully processed. GraphSAGE's neighbor sampling addresses this: sample a fixed number of neighbors (e.g., 25) for each node at each layer. A hub then contributes only 25 of its 1M neighbors per batch. This bounds computation per node and is effectively mandatory for any graph with a power-law degree distribution.
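In PyG this is handled by `NeighborLoader`; the core idea is a fixed fanout per node, which can be sketched in plain Python (the `sample_neighbors` helper is hypothetical):

```python
import random

def sample_neighbors(adj, seeds, fanout, seed=0):
    """GraphSAGE-style fanout sampling: keep at most `fanout`
    neighbors per node, so hubs no longer dominate the batch."""
    rng = random.Random(seed)
    out = {}
    for node in seeds:
        nbrs = adj.get(node, [])
        out[node] = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    return out

# Node 0 is a hub with 1,000 neighbors; nodes 1 and 2 are leaves.
adj = {0: list(range(1, 1001)), 1: [0], 2: [0]}
batch = sample_neighbors(adj, seeds=[0, 1, 2], fanout=25)
print(len(batch[0]), batch[1], batch[2])  # 25 [0] [0]
```

Note the asymmetry: the hub is truncated to the fanout, while low-degree nodes keep their full (tiny) neighborhoods, which is exactly the behavior the two extremes of a power-law graph require.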

Over-smoothing and the small-world property

Many real-world graphs have the small-world property: despite millions of nodes, the average shortest path is only 4-7 hops. In the Facebook social graph, any two users are connected by an average of 4.7 hops.

For GNNs, this means 2-3 layers of message passing already reach a large portion of the graph. By 5-6 layers, information from the entire graph has mixed into every node's embedding, causing over-smoothing. The small-world property (closely tied to the degree distribution, since hubs act as shortcuts between otherwise distant regions) explains why shallow GNNs work best in practice.
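The receptive-field explosion can be measured directly. Here is a sketch (the `k_hop_reach` helper is hypothetical, and a random graph with roughly 10 edges per node stands in for real data) that counts how many nodes a k-layer GNN sees from one root:

```python
import random

def k_hop_reach(adj, start, k):
    """Number of nodes within k hops of `start` -- the receptive
    field of a k-layer message-passing GNN."""
    seen = {start}
    frontier = [start]
    for _ in range(k):
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(seen)

# Random graph, 2,000 nodes, ~10 edges per node on average.
rng = random.Random(0)
n = 2000
adj = {i: [] for i in range(n)}
for _ in range(5 * n):
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        adj[u].append(v)
        adj[v].append(u)

for k in (1, 2, 3, 4):
    print(k, k_hop_reach(adj, 0, k) / n)
```

With an average degree of 10, the reachable fraction grows roughly tenfold per hop until it saturates; a handful of layers is enough to touch nearly the whole graph, which is exactly the over-smoothing regime.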

Degree as a feature

Node degree is itself an informative feature:

  • Hub identification: high-degree nodes are often structurally important (popular users, key accounts, major suppliers)
  • Periphery identification: low-degree nodes are leaves or peripheral entities
  • Positional encoding: degree provides coarse structural information that helps GNNs distinguish nodes with different topological roles
  • Anomaly detection: nodes with degree significantly different from their expected degree (given node type) may be anomalous

Many GNN architectures concatenate degree as an input feature or use it as a positional encoding. The Graph Isomorphism Network (GIN) effectively encodes degree through its sum aggregation.
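Concatenating degree onto node features is a one-liner once degrees are computed; here is a minimal sketch (the `append_degree_feature` helper is illustrative; PyG offers transforms such as `OneHotDegree` for a categorical variant):

```python
def append_degree_feature(x, edges, num_nodes):
    """Concatenate each node's degree onto its feature vector,
    giving the GNN a coarse positional/structural signal."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [row + [float(deg[i])] for i, row in enumerate(x)]

x = [[1.0], [0.5], [0.2]]     # one input feature per node
edges = [(0, 1), (0, 2)]      # node 0 connects to nodes 1 and 2
print(append_degree_feature(x, edges, 3))
# [[1.0, 2.0], [0.5, 1.0], [0.2, 1.0]]
```

On heavy-tailed graphs it is common to log-transform or bucket the degree first, since raw values spanning several orders of magnitude can destabilize training.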

Frequently asked questions

What is degree distribution?

The degree of a node is the number of edges connected to it. The degree distribution P(k) gives the probability that a randomly selected node has degree k. In a social network, degree is the number of friends. In a citation network, degree is the number of citations. The distribution shape (Poisson, power-law, uniform) determines many graph properties and affects GNN design.

What is a power-law (scale-free) degree distribution?

A power-law degree distribution has P(k) proportional to k^(-gamma), typically with gamma between 2 and 3. This means most nodes have few connections, but a small number of 'hub' nodes have enormous degree. Social networks, the web, and citation networks follow power laws. The tail of the distribution is 'heavy': hub nodes can have millions of connections.

How does degree distribution affect GNN performance?

High-degree hub nodes aggregate messages from many neighbors, producing over-averaged (smooth) embeddings. Low-degree nodes aggregate from few neighbors, potentially missing important signals. Degree normalization (GCN) and neighbor sampling (GraphSAGE) address these extremes. Without these techniques, hub nodes dominate the learned representations.

What is the small-world property?

Many real-world graphs have the small-world property: despite being large (millions of nodes), the average shortest path between any two nodes is surprisingly small (typically 4-7 hops). This means 2-3 GNN layers can propagate information across a significant fraction of the graph. It also means over-smoothing is a real risk: with small diameter, even 3-4 layers can reach most of the graph.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.