
Negative Sampling: Generating Fake Edges for Contrastive Training

Negative sampling creates non-existent edges as training examples. The model learns to distinguish real connections from fake ones. The quality of negatives directly determines the quality of link prediction, recommendations, and knowledge graph completion.


TL;DR

  • Negative sampling generates fake edges (non-connected node pairs) as negative examples for link prediction training. Without negatives, the model would predict every pair as connected.
  • Three strategies: uniform random (simplest), degree-proportional (harder examples), hard negative mining (most confusing fakes). Hard negatives improve model quality significantly.
  • Typical negative-to-positive ratio: 1:1 for large graphs, 5:1 for small graphs. In-batch negatives (reusing other positives as negatives) are free and effective.
  • Negative sampling is essential for: link prediction, recommendation training, knowledge graph completion, graph autoencoder training, and contrastive self-supervised learning.
  • PyG provides a negative_sampling() utility. For production quality, combine uniform sampling with hard negative mining using the model's own predictions.

Negative sampling generates fake edges for contrastive link prediction training. In link prediction, the model learns to score real edges higher than non-existent ones. But the graph only tells you what IS connected, not what IS NOT. You need to explicitly create examples of non-connected pairs (negatives) for the model to learn from. The quality of these negatives directly determines model quality.

This technique is fundamental to link prediction, recommendations, knowledge graph completion, and contrastive learning. Every system that learns to distinguish “should be connected” from “should not be connected” relies on negative sampling.

Basic negative sampling

negative_sampling.py
from torch_geometric.utils import negative_sampling
import torch

# Real edges (positive examples)
pos_edge_index = data.edge_index  # [2, num_edges]

# Generate negative edges: random pairs that are NOT connected
neg_edge_index = negative_sampling(
    edge_index=pos_edge_index,
    num_nodes=data.num_nodes,
    num_neg_samples=pos_edge_index.size(1),  # 1:1 ratio
)
# neg_edge_index: [2, num_neg_edges]

# Train: score positives higher than negatives
pos_scores = model.score(z[pos_edge_index[0]], z[pos_edge_index[1]])
neg_scores = model.score(z[neg_edge_index[0]], z[neg_edge_index[1]])

# Binary cross-entropy loss
pos_loss = -torch.log(torch.sigmoid(pos_scores)).mean()
neg_loss = -torch.log(1 - torch.sigmoid(neg_scores)).mean()
loss = pos_loss + neg_loss

PyG's negative_sampling() ensures generated pairs are not actual edges. The model learns to score real edges higher.
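The two log-sigmoid terms in the loss above can also be fused into a single, numerically stable call to binary cross-entropy with logits. A minimal sketch, using random tensors as stand-ins for the model scores from the snippet above:

```python
import torch
import torch.nn.functional as F

# Stand-ins for the model outputs pos_scores / neg_scores above
pos_scores = torch.randn(10)
neg_scores = torch.randn(10)

# Label real edges 1, fake edges 0, and let the fused op handle
# the sigmoid + log in a numerically stable way
scores = torch.cat([pos_scores, neg_scores])
labels = torch.cat([torch.ones(10), torch.zeros(10)])
loss = F.binary_cross_entropy_with_logits(scores, labels)
```

With equal numbers of positives and negatives this averages over all samples at once, which equals the mean of the two separate terms.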

Sampling strategies

Uniform random

Pick any two unconnected nodes with equal probability. Simple, fast, but most negatives are “easy” (obviously disconnected nodes like a U.S. customer and a product only sold in Japan). The model learns little from easy negatives.
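Under the hood, uniform sampling amounts to rejection sampling: draw random pairs, keep only the ones that are not real edges. A minimal sketch on a toy graph (not PyG's actual implementation):

```python
import torch

def uniform_negatives(edge_index, num_nodes, num_samples):
    """Rejection sampling: draw random pairs, keep only non-edges."""
    # Encode existing edges as single integers for fast membership tests
    existing = set((edge_index[0] * num_nodes + edge_index[1]).tolist())
    neg_src, neg_dst = [], []
    while len(neg_src) < num_samples:
        s = torch.randint(0, num_nodes, (1,)).item()
        d = torch.randint(0, num_nodes, (1,)).item()
        if s != d and s * num_nodes + d not in existing:
            neg_src.append(s)
            neg_dst.append(d)
    return torch.tensor([neg_src, neg_dst])

# Toy graph: 4 nodes, 3 directed edges
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
neg = uniform_negatives(edge_index, num_nodes=4, num_samples=3)
# neg has shape [2, 3]; no column is an existing edge or a self-loop
```

Every unconnected pair is equally likely, which is exactly why most draws are easy negatives on large graphs.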

Degree-proportional

Sample negative nodes proportional to their degree. High-degree nodes appear more often as negative targets. This creates harder negatives because popular nodes are more plausible connections.
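A sketch of the idea: weight each candidate destination by its in-degree and draw with torch.multinomial. (For brevity this sketch does not reject pairs that happen to be real edges; production code would filter those as in uniform rejection sampling.)

```python
import torch

def degree_proportional_negatives(edge_index, num_nodes, num_samples):
    """Sample negative destinations proportional to node in-degree."""
    # Count how often each node appears as an edge destination
    deg = torch.bincount(edge_index[1], minlength=num_nodes).float()
    # Sources drawn uniformly; destinations drawn by degree
    src = torch.randint(0, num_nodes, (num_samples,))
    dst = torch.multinomial(deg, num_samples, replacement=True)
    return torch.stack([src, dst])

# Toy graph: node 3 has in-degree 3, so it dominates negative targets
edge_index = torch.tensor([[0, 1, 2, 0], [3, 3, 3, 1]])
neg = degree_proportional_negatives(edge_index, num_nodes=4, num_samples=5)
```

Popular nodes show up as fake destinations far more often, which is what makes these negatives plausible and therefore harder.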

Hard negative mining

Use the model's own predictions to find hard negatives:

hard_negative_mining.py
def mine_hard_negatives(model, z, pos_edge_index, num_negatives=5):
    """Sample negatives that the model currently scores highly."""
    hard_negatives = []
    for i in range(pos_edge_index.size(1)):
        src = pos_edge_index[0, i]

        # Score many random candidates
        candidates = torch.randint(0, z.size(0), (100,))

        # Drop candidates that are actual neighbors of src,
        # so every kept pair really is a non-edge
        neighbors = pos_edge_index[1, pos_edge_index[0] == src]
        candidates = candidates[~torch.isin(candidates, neighbors)]

        scores = model.score(z[src].unsqueeze(0), z[candidates])

        # Keep the highest-scored non-edges (hardest negatives)
        k = min(num_negatives, candidates.numel())
        top_k = scores.reshape(-1).topk(k).indices
        hard_negatives.append(candidates[top_k])

    return hard_negatives

# Hard negatives force the model to learn fine-grained distinctions
# Easy negatives waste gradient on trivially distinguishable pairs

Hard negative mining: find non-edges the model thinks are real. These are the most informative training examples.

Enterprise example: product recommendation training

Training a recommendation model on a user-product bipartite graph:

  • Positives: (user, product) pairs where the user purchased the product
  • Easy negatives: random products (user who buys electronics paired with baby products)
  • Hard negatives: products the user browsed but did not buy, or products bought by similar users that this user did not buy

Training only on easy negatives produces a model that distinguishes electronics buyers from baby product buyers (trivial) but cannot distinguish which specific laptop a user prefers (the actual recommendation problem). Hard negatives force the model to learn the fine-grained preferences that drive real purchase decisions.
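The "browsed but did not buy" rule above reduces to a set difference over a user's interaction logs. A toy sketch with hypothetical item names:

```python
# Hypothetical interaction logs for one user
browsed = {"laptop_a", "laptop_b", "laptop_c", "mouse_x"}
bought = {"laptop_a", "mouse_x"}

# Hard negatives: plausible items the user considered but rejected
hard_negatives = browsed - bought
# -> {"laptop_b", "laptop_c"}
```

These pairs are hard precisely because the user's behavior shows real interest, so a naive model scores them almost as highly as actual purchases.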

In-batch negatives

A free and effective technique: in a batch of 256 positive (user, item) pairs, use every other item in the batch as a negative for each user. This gives 255 negatives per positive at zero additional sampling cost. In-batch negatives are naturally “semi-hard” because they are items that other users actually purchased (plausible but incorrect for this specific user).
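In code, in-batch negatives fall out of a single score matrix plus cross-entropy: the diagonal holds the true pairs, and every off-diagonal entry is a free negative. A minimal sketch with random embeddings standing in for model output:

```python
import torch
import torch.nn.functional as F

def in_batch_loss(user_emb, item_emb):
    """Contrastive loss: row i's positive item is item_emb[i];
    the other B-1 items in the batch act as its negatives."""
    # [B, B] score matrix: entry (i, j) = score of user i with item j
    logits = user_emb @ item_emb.t()
    # The diagonal holds the true (user, item) pairs
    labels = torch.arange(user_emb.size(0))
    return F.cross_entropy(logits, labels)

# Batch of 256 (user, item) positive pairs -> 255 free negatives each
user_emb = torch.randn(256, 64)
item_emb = torch.randn(256, 64)
loss = in_batch_loss(user_emb, item_emb)
```

No extra sampling happens at all; the batch itself supplies the negatives.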

Frequently asked questions

What is negative sampling in GNNs?

Negative sampling generates fake (non-existent) edges to serve as negative examples during link prediction training. The model learns to distinguish real edges (positive) from fake edges (negative). Without negatives, the model would learn to predict that every pair of nodes is connected, which is useless.

How does negative sampling work?

For each real edge (A, B), sample one or more node pairs that are NOT connected: (A, C) where C is a random node not connected to A. The model trains to score real edges higher than fake edges. The ratio of negatives to positives is typically 1:1 to 5:1.

What negative sampling strategies exist?

Uniform random (simplest: pick any disconnected pair), degree-proportional (prefer high-degree nodes as negatives, harder examples), hard negative mining (use the model's own predictions to find the most confusing negatives), and in-batch (use other positive edges in the batch as negatives). Hard negatives improve training quality significantly.

What are hard negatives and why do they matter?

Hard negatives are fake edges the model currently thinks are real. If the model gives (A, C) a high score even though A and C are not connected, (A, C) is a hard negative. Training on hard negatives forces the model to learn finer distinctions. Easy negatives (obviously disconnected nodes) waste training signal.

How many negatives should I sample per positive?

Typical ratios: 1:1 for large graphs (enough hard negatives naturally), 5:1 for small graphs (need more negatives for stable gradients), up to 50:1 for knowledge graph link prediction (huge entity space). More negatives increase training cost but improve discrimination. In-batch negatives (reusing positives from other samples) are free and effective.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.