Negative sampling generates fake edges for contrastive link prediction training. In link prediction, the model learns to score real edges higher than non-existent ones. But the graph only tells you what IS connected, not what IS NOT. You need to explicitly create examples of non-connected pairs (negatives) for the model to learn from. The quality of these negatives largely determines the quality of the trained model.
This technique is fundamental to link prediction, recommendations, knowledge graph completion, and contrastive learning. Every system that learns to distinguish “should be connected” from “should not be connected” relies on negative sampling.
Basic negative sampling
from torch_geometric.utils import negative_sampling
import torch

# Real edges (positive examples)
pos_edge_index = data.edge_index  # [2, num_edges]

# Generate negative edges: random pairs that are NOT connected
neg_edge_index = negative_sampling(
    edge_index=pos_edge_index,
    num_nodes=data.num_nodes,
    num_neg_samples=pos_edge_index.size(1),  # 1:1 ratio
)
# neg_edge_index: [2, num_neg_edges]

# Train: score positives higher than negatives
# (z holds the node embeddings produced by the encoder)
pos_scores = model.score(z[pos_edge_index[0]], z[pos_edge_index[1]])
neg_scores = model.score(z[neg_edge_index[0]], z[neg_edge_index[1]])

# Binary cross-entropy loss
pos_loss = -torch.log(torch.sigmoid(pos_scores)).mean()
neg_loss = -torch.log(1 - torch.sigmoid(neg_scores)).mean()
loss = pos_loss + neg_loss

PyG's negative_sampling() ensures the generated pairs are not actual edges, so the model learns to score real edges higher than sampled non-edges.
Sampling strategies
Uniform random
Pick any two unconnected nodes with equal probability. Simple, fast, but most negatives are “easy” (obviously disconnected nodes like a U.S. customer and a product only sold in Japan). The model learns little from easy negatives.
Degree-proportional
Sample negative nodes proportional to their degree. High-degree nodes appear more often as negative targets. This creates harder negatives because popular nodes are more plausible connections.
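Degree-proportional sampling can be sketched with torch.bincount for degrees and torch.multinomial for weighted draws; the toy edge list and sizes below are illustrative:

```python
import torch

torch.manual_seed(0)
num_nodes = 6
# Toy edge list: node 0 and node 4 are high-degree, node 5 is isolated
edge_index = torch.tensor([[0, 0, 0, 1, 2, 3],
                           [1, 2, 3, 4, 4, 4]])

# Degree of each node (counting both endpoints)
deg = torch.bincount(edge_index.flatten(), minlength=num_nodes).float()

# Sample negative targets proportional to degree:
# high-degree nodes are drawn more often, yielding more plausible
# (harder) negatives than uniform sampling
num_neg = 1000
neg_dst = torch.multinomial(deg, num_neg, replacement=True)

counts = torch.bincount(neg_dst, minlength=num_nodes)
# Node 5 has degree 0, so it is never selected as a negative target
```

In practice the sampled (src, dst) pairs would still be checked against the true edge set, as negative_sampling() does internally.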
Hard negative mining
Use the model's own predictions to find hard negatives:
def mine_hard_negatives(model, z, pos_edge_index, num_negatives=5):
    """Sample negatives that the model currently scores highly."""
    hard_negatives = []
    for i in range(pos_edge_index.size(1)):
        src = pos_edge_index[0, i]
        # Score many random candidates
        candidates = torch.randint(0, z.size(0), (100,))
        scores = model.score(z[src].unsqueeze(0), z[candidates])
        # Keep the highest-scored candidates (hardest negatives);
        # in practice, filter out candidates that are real neighbors of src
        top_k = scores.topk(num_negatives).indices
        hard_negatives.append(candidates[top_k])
    return hard_negatives

# Hard negatives force the model to learn fine-grained distinctions
# Easy negatives waste gradient on trivially distinguishable pairs

Hard negative mining finds non-edges the model thinks are real. These are the most informative training examples.
Enterprise example: product recommendation training
Training a recommendation model on a user-product bipartite graph:
- Positives: (user, product) pairs where the user purchased the product
- Easy negatives: random products (user who buys electronics paired with baby products)
- Hard negatives: products the user browsed but did not buy, or products bought by similar users that this user did not buy
Training only on easy negatives produces a model that distinguishes electronics buyers from baby product buyers (trivial) but cannot distinguish which specific laptop a user prefers (the actual recommendation problem). Hard negatives force the model to learn the fine-grained preferences that drive real purchase decisions.
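The browsed-but-not-bought pool described above is a set difference over interaction logs. A minimal sketch; the user IDs, product names, and hard_negative_pool helper are hypothetical:

```python
# Hypothetical per-user interaction logs
browsed = {"u1": {"laptop_a", "laptop_b", "mouse"},
           "u2": {"stroller", "laptop_a"}}
purchased = {"u1": {"laptop_a"},
             "u2": {"stroller"}}

def hard_negative_pool(user):
    # Products the user considered but rejected:
    # plausible connections, yet confirmed non-purchases
    return browsed[user] - purchased[user]
```

Pairs drawn from this pool sit much closer to the decision boundary than random products, which is exactly what makes them informative.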
In-batch negatives
A free and effective technique: in a batch of 256 positive (user, item) pairs, use every other item in the batch as a negative for each user. This gives 255 negatives per positive at zero additional sampling cost. In-batch negatives are naturally “semi-hard” because they are items that other users actually purchased (plausible but incorrect for this specific user).
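A common way to implement in-batch negatives is a batch-by-batch score matrix with a softmax cross-entropy loss whose targets are the diagonal. A minimal sketch with random embeddings standing in for a trained encoder:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, dim = 4, 8
user_emb = torch.randn(batch, dim)  # stand-in user embeddings
item_emb = torch.randn(batch, dim)  # stand-in item embeddings (aligned pairs)

# Score every user against every item in the batch: [batch, batch].
# Diagonal entries are the true (user, item) pairs; the batch-1
# off-diagonal entries in each row are that user's in-batch negatives.
logits = user_emb @ item_emb.T
labels = torch.arange(batch)  # each user's positive sits at its own index

loss = F.cross_entropy(logits, labels)
```

Each row's softmax pits one positive against batch-1 negatives at no extra sampling cost, which is why this loss scales so well with batch size.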