
Degree Distribution: How the Number of Connections Varies Across Nodes

In a social network, most users have a few hundred connections. A handful of celebrities have millions. This power-law degree distribution shapes every aspect of GNN design: normalization, sampling, batching, and depth.


TL;DR

  • Degree is the number of edges connected to a node. The degree distribution P(k) describes how degrees vary across the graph. Real-world graphs almost always have skewed distributions.
  • Most real-world graphs follow power-law distributions: P(k) ~ k^(-gamma). Most nodes have few connections; a few hub nodes have orders of magnitude more. Social networks, the web, and citation networks all exhibit this pattern.
  • Power-law distributions create GNN challenges: hub nodes over-aggregate (too many messages), low-degree nodes under-aggregate (too few messages). Degree normalization and neighbor sampling address both extremes.
  • The small-world property (short average path length) means 2-3 GNN layers reach a significant fraction of nodes. This is why over-smoothing occurs at shallow depths.
  • Degree itself is an important feature: high-degree nodes play different structural roles (hubs, authorities) than low-degree nodes (leaves, periphery). Many GNNs use degree as a positional encoding.

The degree distribution is one of the most fundamental properties of any graph, and it directly shapes how GNNs process that graph. A node's degree (its number of connections) determines how many messages it receives during message passing, how much memory its neighborhood requires, and what structural role it plays. The distribution of degrees across all nodes determines whether the graph is easy or hard to partition, whether neighbor sampling is necessary, and how quickly over-smoothing occurs.
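As a concrete illustration, here is a minimal pure-Python sketch of computing the empirical degree distribution P(k) from an undirected edge list (the helper name `degree_distribution` is illustrative; in practice, PyG's `torch_geometric.utils.degree` computes per-node degrees on tensors):

```python
from collections import Counter

def degree_distribution(edges, num_nodes):
    """Empirical P(k): fraction of nodes with degree k,
    computed from an undirected edge list."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg)
    return {k: c / num_nodes for k, c in sorted(counts.items())}

# A small star graph: node 0 is a hub with four leaf neighbors.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_distribution(edges, 5))  # {1: 0.8, 4: 0.2}
```

Even this five-node example shows the skew that dominates real graphs: most nodes sit at the low end, while one hub carries most of the edges.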

Common degree distributions

Power-law (scale-free)

P(k) is proportional to k^(-gamma). Most nodes have degree 1-10. A few hub nodes have degree 10,000-1,000,000. Found in: social networks, the web, citation networks, financial transaction networks.

The “scale-free” name comes from the distribution's self-similarity: zooming into any degree range reveals the same shape. There is no characteristic scale.
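To see where power laws come from, here is a sketch of preferential attachment, the Barabási-Albert growth mechanism (the `barabasi_albert` helper is illustrative, assuming an undirected graph and a fixed attachment count m): each new node links to existing nodes with probability proportional to their current degree, so the rich get richer and hubs emerge.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment: each new node adds m
    edges to existing nodes chosen proportionally to their degree."""
    rng = random.Random(seed)
    edges = []
    repeated = []                  # each node listed once per incident edge
    targets = list(range(m))       # the first new node links to all seeds
    for v in range(m, n):
        edges.extend((v, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([v] * m)
        # degree-weighted sample of m distinct targets for the next node
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(repeated))
        targets = list(targets)
    return edges

edges = barabasi_albert(2000, 2)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
# most nodes sit near the minimum degree m; a few hubs are far above it
print(min(deg.values()), max(deg.values()))
```

The trick is the `repeated` list: because a node appears in it once per incident edge, uniform sampling from it is exactly degree-proportional sampling.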

Exponential

P(k) decreases exponentially. Degrees are more uniform than power-law but still skewed. Found in: road networks, power grids, some biological networks.

Poisson (Erdos-Renyi random graph)

Degrees are concentrated around the mean. Few very high or very low degree nodes. Found in: random graphs (synthetic). Rare in real-world data.

Regular

All nodes have the same degree. Found in: crystal structures, some lattice networks. Uncommon in real-world data.

Impact on GNN design

Degree normalization

Without normalization, a hub node with 100,000 neighbors produces a sum-aggregated message roughly 100,000 times larger than a leaf node with a single neighbor. GCN addresses this with symmetric degree normalization: the message from node j to node i is weighted by 1/sqrt(deg(i) · deg(j)). This prevents hub nodes from dominating the representation.
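This normalization can be sketched in plain Python (the `gcn_edge_weights` helper is hypothetical; PyG's `GCNConv` applies the same formula internally), assuming `edge_index` already lists both directions of each undirected edge:

```python
import math

def gcn_edge_weights(edge_index, num_nodes):
    """Symmetric GCN normalization: weight each message j -> i by
    1 / sqrt(deg(j) * deg(i)), after adding self-loops.
    Assumes edge_index contains both directions of every edge."""
    edge_index = edge_index + [(i, i) for i in range(num_nodes)]
    deg = [0] * num_nodes
    for src, _ in edge_index:
        deg[src] += 1
    weights = [1.0 / math.sqrt(deg[s] * deg[d]) for s, d in edge_index]
    return edge_index, weights

# A tiny star: node 0 is the hub, nodes 1 and 2 are leaves.
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
ei, w = gcn_edge_weights(edges, 3)
```

Messages into the hub are scaled down by its own degree, so a node with 100,000 neighbors averages rather than explodes; messages into leaves keep nearly full weight.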

Neighbor sampling

Power-law graphs make full-neighborhood aggregation impractical: a hub node with 1M neighbors cannot be fully processed. GraphSAGE's neighbor sampling addresses this: sample a fixed number of neighbors (e.g., 25) for each node at each layer. A hub then contributes only 25 of its 1M neighbors per batch. This bounds computation per node and is effectively mandatory for any graph with a power-law degree distribution.
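In PyG this is handled by `NeighborLoader`; the core idea is a fixed fanout per node, which can be sketched in plain Python (the `sample_neighbors` helper is hypothetical):

```python
import random

def sample_neighbors(adj, seeds, fanout, seed=0):
    """GraphSAGE-style fanout sampling: keep at most `fanout`
    neighbors per node, so hubs no longer dominate the batch."""
    rng = random.Random(seed)
    out = {}
    for node in seeds:
        nbrs = adj.get(node, [])
        out[node] = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    return out

# Node 0 is a hub with 1,000 neighbors; nodes 1 and 2 are leaves.
adj = {0: list(range(1, 1001)), 1: [0], 2: [0]}
batch = sample_neighbors(adj, seeds=[0, 1, 2], fanout=25)
print(len(batch[0]), batch[1], batch[2])  # 25 [0] [0]
```

Note the asymmetry: the hub is truncated to the fanout, while low-degree nodes keep their full (tiny) neighborhoods, which is exactly the behavior the two extremes of a power-law graph require.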

Over-smoothing and the small-world property

Many real-world graphs have the small-world property: despite millions of nodes, the average shortest path is only 4-7 hops. In the Facebook social graph, any two users are connected by an average of 4.7 hops.

For GNNs, this means 2-3 layers of message passing already reach a large portion of the graph. By 5-6 layers, information from the entire graph has mixed into every node's embedding, causing over-smoothing. The small-world property (closely tied to the degree distribution, since hubs act as shortcuts between otherwise distant regions) explains why shallow GNNs work best in practice.
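The receptive-field explosion can be measured directly. Here is a sketch (the `k_hop_reach` helper is hypothetical, and a random graph with roughly 10 edges per node stands in for real data) that counts how many nodes a k-layer GNN sees from one root:

```python
import random

def k_hop_reach(adj, start, k):
    """Number of nodes within k hops of `start` -- the receptive
    field of a k-layer message-passing GNN."""
    seen = {start}
    frontier = [start]
    for _ in range(k):
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(seen)

# Random graph, 2,000 nodes, ~10 edges per node on average.
rng = random.Random(0)
n = 2000
adj = {i: [] for i in range(n)}
for _ in range(5 * n):
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        adj[u].append(v)
        adj[v].append(u)

for k in (1, 2, 3, 4):
    print(k, k_hop_reach(adj, 0, k) / n)
```

With an average degree of 10, the reachable fraction grows roughly tenfold per hop until it saturates; a handful of layers is enough to touch nearly the whole graph, which is exactly the over-smoothing regime.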

Degree as a feature

Node degree is itself an informative feature:

  • Hub identification: high-degree nodes are often structurally important (popular users, key accounts, major suppliers)
  • Periphery identification: low-degree nodes are leaves or peripheral entities
  • Positional encoding: degree provides coarse structural information that helps GNNs distinguish nodes with different topological roles
  • Anomaly detection: nodes with degree significantly different from their expected degree (given node type) may be anomalous

Many GNN architectures concatenate degree as an input feature or use it as a positional encoding. The Graph Isomorphism Network (GIN) effectively encodes degree through its sum aggregation.
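Concatenating degree onto node features is a one-liner once degrees are computed; here is a minimal sketch (the `append_degree_feature` helper is illustrative; PyG offers transforms such as `OneHotDegree` for a categorical variant):

```python
def append_degree_feature(x, edges, num_nodes):
    """Concatenate each node's degree onto its feature vector,
    giving the GNN a coarse positional/structural signal."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [row + [float(deg[i])] for i, row in enumerate(x)]

x = [[1.0], [0.5], [0.2]]     # one input feature per node
edges = [(0, 1), (0, 2)]      # node 0 connects to nodes 1 and 2
print(append_degree_feature(x, edges, 3))
# [[1.0, 2.0], [0.5, 1.0], [0.2, 1.0]]
```

On heavy-tailed graphs it is common to log-transform or bucket the degree first, since raw values spanning several orders of magnitude can destabilize training.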

Frequently asked questions

What is degree distribution?

The degree of a node is the number of edges connected to it. The degree distribution P(k) gives the probability that a randomly selected node has degree k. In a social network, degree is the number of friends. In a citation network, degree is the number of citations. The distribution shape (Poisson, power-law, uniform) determines many graph properties and affects GNN design.

What is a power-law (scale-free) degree distribution?

A power-law degree distribution has P(k) proportional to k^(-gamma), typically with gamma between 2 and 3. This means most nodes have few connections, but a small number of 'hub' nodes have enormous degree. Social networks, the web, and citation networks follow power laws. The tail of the distribution is 'heavy': hub nodes can have millions of connections.

How does degree distribution affect GNN performance?

High-degree hub nodes aggregate messages from many neighbors, producing over-averaged (smooth) embeddings. Low-degree nodes aggregate from few neighbors, potentially missing important signals. Degree normalization (GCN) and neighbor sampling (GraphSAGE) address these extremes. Without these techniques, hub nodes dominate the learned representations.

What is the small-world property?

Many real-world graphs have the small-world property: despite being large (millions of nodes), the average shortest path between any two nodes is surprisingly small (typically 4-7 hops). This means 2-3 GNN layers can propagate information across a significant fraction of the graph. It also means over-smoothing is a real risk: with small diameter, even 3-4 layers can reach most of the graph.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.