
Curriculum Learning: Training GNNs on Progressively Harder Examples

Humans learn arithmetic before calculus. Curriculum learning applies the same principle to GNNs: start with easy graph patterns and build toward complex ones. The result is faster convergence and better final accuracy.

TL;DR

  • Curriculum learning orders training examples from easy to hard. For GNNs, difficulty is measured by neighborhood structure, label confidence, or subgraph complexity.
  • Starting with easy examples lets the model build a stable feature space before encountering ambiguous or noisy patterns. This prevents early training instability.
  • Three pacing strategies: linear (increase difficulty steadily), step (discrete jumps), and self-paced (the model itself decides what is easy based on its current loss).
  • Most effective on heterogeneous graphs with diverse node types and varying neighborhood complexity, and on tasks with heavy class imbalance like fraud detection.
  • Typical improvements: 1-3% accuracy gain and 20-40% faster convergence compared to random-order training, with the gap widening on noisier datasets.

Curriculum learning trains graph neural networks by presenting examples in order of increasing difficulty. Instead of shuffling all training nodes randomly in each epoch, the model first learns from nodes with clear, unambiguous patterns and progressively tackles harder cases. This produces a more stable optimization trajectory and typically results in both faster convergence and higher final accuracy.

Defining difficulty in graphs

The first challenge is measuring how “hard” each training example is. Several graph-specific difficulty metrics work well:

  • Neighborhood homophily: Nodes whose neighbors share the same label are easy. Nodes surrounded by different-label neighbors are hard. In a fraud graph, a fraudulent node surrounded by other flagged accounts is easy; one embedded in mostly legitimate accounts is hard.
  • Pre-trained confidence: Train a quick baseline model, then rank nodes by prediction confidence. Low-confidence nodes are harder.
  • Node degree: Very low-degree nodes have insufficient context (hard), and very high-degree nodes have noisy aggregation (also hard). Medium-degree nodes are easiest.
  • Subgraph complexity: Nodes in dense, diverse neighborhoods with multiple edge types are harder than nodes in simple star patterns.
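The homophily metric above can be computed directly from an edge list. The sketch below is illustrative, not a PyG API; `homophily_difficulty` is a hypothetical helper name, and it assumes a plain numpy edge-index array rather than a PyG `Data` object:

```python
import numpy as np

def homophily_difficulty(edge_index, labels):
    """Score each node by the fraction of neighbors with a different label.

    edge_index: (2, num_edges) array of directed edges (source, target).
    labels: (num_nodes,) integer class labels.
    Returns a difficulty in [0, 1]: 0 means every neighbor agrees (easy),
    1 means no neighbor agrees, or the node is isolated (hard).
    """
    src, dst = edge_index
    num_nodes = len(labels)
    agree = np.zeros(num_nodes)
    degree = np.zeros(num_nodes)
    # Count, for each target node, how many incoming neighbors share its label
    np.add.at(agree, dst, (labels[src] == labels[dst]).astype(float))
    np.add.at(degree, dst, 1.0)
    # Isolated nodes (degree 0) get maximum difficulty: no context at all
    homophily = np.divide(agree, degree, out=np.zeros(num_nodes), where=degree > 0)
    return 1.0 - homophily
```

Sorting nodes by this score gives the easy-first ordering the curriculum consumes.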

Pacing functions

The pacing function controls how quickly the model transitions from easy to hard examples:

curriculum_pacing.py
import numpy as np

def linear_pacing(epoch, max_epochs, num_nodes):
    """Linearly increase the fraction of training data."""
    fraction = min(1.0, 0.2 + 0.8 * epoch / max_epochs)
    return int(fraction * num_nodes)

def step_pacing(epoch, max_epochs, num_nodes, steps=4):
    """Discrete jumps: 25%, 50%, 75%, 100% of data."""
    step = min(steps, 1 + epoch * steps // max_epochs)
    return int(step / steps * num_nodes)

def self_paced(losses, threshold):
    """Include only examples whose current loss is below a threshold."""
    return np.flatnonzero(losses < threshold)

# Usage in training loop
sorted_indices = difficulty_scores.argsort()  # easy first
for epoch in range(max_epochs):
    n = linear_pacing(epoch, max_epochs, len(sorted_indices))
    train_nodes = sorted_indices[:n]
    train_one_epoch(model, data, train_nodes)

Three pacing strategies. Self-paced learning lets the model itself define difficulty based on current loss.

Why it works for graph neural networks

GNNs are particularly sensitive to training order because of neighborhood aggregation. When a node computes its representation, it depends on its neighbors' representations, which depend on their neighbors. A noisy or misclassified node early in training can corrupt representations multiple hops away.

Curriculum learning mitigates this by ensuring the model builds reliable representations in easy neighborhoods first. When hard nodes are finally introduced, their neighbors already have stable representations, providing a better signal for learning the difficult patterns.
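The multi-hop corruption effect is easy to see with a stripped-down, unweighted mean-aggregation layer (a deliberate simplification of a real GNN layer, with no learned weights):

```python
import numpy as np

def mean_aggregate(features, adj):
    """One round of mean-neighbor aggregation (simplified GNN layer, no weights).

    features: (num_nodes, dim) node features.
    adj: (num_nodes, num_nodes) dense 0/1 adjacency matrix, no self-loops.
    """
    deg = adj.sum(axis=1, keepdims=True)
    # Average each node's neighbor features; guard isolated nodes against /0
    return adj @ features / np.maximum(deg, 1)

# Path graph 0-1-2-3; node 0 carries a corrupted feature value
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = np.ones((4, 1))
features[0] = 100.0                 # the noisy node

h1 = mean_aggregate(features, adj)  # after one layer, only node 1 is affected
h2 = mean_aggregate(h1, adj)        # after two layers, node 2 is affected too
```

After one layer only the immediate neighbor absorbs the noise; after two layers it has spread two hops, which is exactly why stabilizing easy neighborhoods first pays off.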

Curriculum learning for class-imbalanced graphs

Class imbalance is endemic in enterprise graph tasks. Fraud rates below 1%, churn rates at 5-10%, and rare product categories all create imbalanced distributions. Curriculum learning offers a natural solution:

  1. Start with a balanced subset of easy examples from both classes
  2. Gradually introduce harder examples, maintaining class balance
  3. In the final phase, expose the model to the full imbalanced distribution

This approach builds a robust decision boundary on easy cases before challenging it with ambiguous, imbalanced data. It often outperforms static oversampling or loss weighting alone.
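The three-phase recipe above can be sketched as a sampler that grows harder and less balanced over time. Everything here is illustrative (the function name, the 20% starting fraction, and the 80% cutoff for the final phase are assumptions, not a PyG API):

```python
import numpy as np

def curriculum_subset(difficulty, labels, epoch, max_epochs, rng=None):
    """Select training indices: easy and class-balanced early, full data late.

    difficulty: (num_nodes,) per-example difficulty scores (lower = easier).
    labels: (num_nodes,) integer class labels.
    Final phase (last 20% of epochs): the full, imbalanced training set.
    """
    rng = rng or np.random.default_rng(0)
    progress = epoch / max_epochs
    if progress >= 0.8:                      # phase 3: full distribution
        return np.arange(len(labels))
    fraction = 0.2 + progress                # grow the easy pool over time
    per_class = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        idx = idx[np.argsort(difficulty[idx])]       # easiest first, per class
        k = max(1, int(fraction * len(idx)))
        per_class.append(idx[:k])
    # Downsample every class to the smallest class count to stay balanced
    m = min(len(s) for s in per_class)
    return np.concatenate([rng.choice(s, size=m, replace=False) for s in per_class])
```

Early epochs thus see a small, balanced, easy subset; only the final phase confronts the model with the true class skew.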

Practical guidelines

  • Pre-compute difficulty scores: Run a fast baseline (2-layer GCN, 10 epochs) and use its per-node loss or confidence as the difficulty metric. This adds minutes to pipeline setup but saves hours of training.
  • Start with 20% of data: Exposing too little data initially causes underfitting. 20% is a good starting fraction for most graph datasets.
  • Linear pacing is a safe default: It requires no additional hyperparameters beyond the starting fraction and works consistently across datasets.
  • Combine with temporal ordering: For temporal graphs, use time as a natural curriculum. Older, well-established patterns are easier; recent, evolving patterns are harder.
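The first guideline, turning a quick baseline's output into difficulty scores, can be sketched as follows. Given the baseline's logits, difficulty is one minus the probability it assigns to the true class; the helper name is hypothetical:

```python
import numpy as np

def confidence_difficulty(logits, train_labels):
    """Per-node difficulty from a baseline model's predictions.

    logits: (num_nodes, num_classes) raw scores from a quick baseline
    (e.g. a 2-layer GCN trained for ~10 epochs).
    Returns 1 - p(true class): confidently correct nodes score near 0,
    confused nodes near 1.
    """
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    true_prob = probs[np.arange(len(train_labels)), train_labels]
    return 1.0 - true_prob
```

These scores are computed once before curriculum training starts and then sorted to produce the easy-first ordering.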

Frequently asked questions

What is curriculum learning for graph neural networks?

Curriculum learning trains a GNN by presenting examples in order of difficulty, starting with easy samples and progressively introducing harder ones. For graphs, difficulty can be defined by node degree, neighborhood complexity, label noise, or prediction confidence from a pre-trained model.

How do you define difficulty for graph data?

Several metrics work: (1) Node degree: low-degree nodes have less context and are harder. (2) Class confidence: examples the model is uncertain about are harder. (3) Neighborhood homophily: nodes whose neighbors have different labels are harder. (4) Subgraph complexity: nodes in dense, heterogeneous subgraphs are harder than those in simple star patterns.

Does curriculum learning always help GNNs?

Not always. It helps most when there is high variance in example difficulty, such as graphs with diverse neighborhood structures or heavy class imbalance. On homogeneous graphs with similar node neighborhoods, the benefit is marginal. It also adds a hyperparameter (pacing function) that requires tuning.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.