
Negative Sampling: Generating Fake Edges for Contrastive Training

Negative sampling creates non-existent edges as training examples. The model learns to distinguish real connections from fake ones. The quality of negatives directly determines the quality of link prediction, recommendations, and knowledge graph completion.


TL;DR

  • Negative sampling generates fake edges (non-connected node pairs) as negative examples for link prediction training. Without negatives, the model would predict every pair as connected.
  • Three strategies: uniform random (simplest), degree-proportional (harder examples), hard negative mining (most confusing fakes). Hard negatives improve model quality significantly.
  • Typical negative-to-positive ratio: 1:1 for large graphs, 5:1 for small graphs. In-batch negatives (reusing other positives as negatives) are free and effective.
  • Negative sampling is essential for: link prediction, recommendation training, knowledge graph completion, graph autoencoder training, and contrastive self-supervised learning.
  • PyG provides a negative_sampling() utility. For production quality, combine uniform sampling with hard negative mining using the model's own predictions.

Negative sampling generates fake edges for contrastive link prediction training. In link prediction, the model learns to score real edges higher than non-existent ones. But the graph only tells you what IS connected, not what IS NOT. You need to explicitly create examples of non-connected pairs (negatives) for the model to learn from. The quality of these negatives directly determines model quality.

This technique is fundamental to link prediction, recommendations, knowledge graph completion, and contrastive learning. Every system that learns to distinguish “should be connected” from “should not be connected” relies on negative sampling.

Basic negative sampling

negative_sampling.py
from torch_geometric.utils import negative_sampling
import torch

# Real edges (positive examples)
pos_edge_index = data.edge_index  # [2, num_edges]

# Generate negative edges: random pairs that are NOT connected
neg_edge_index = negative_sampling(
    edge_index=pos_edge_index,
    num_nodes=data.num_nodes,
    num_neg_samples=pos_edge_index.size(1),  # 1:1 ratio
)
# neg_edge_index: [2, num_neg_edges]

# Train: score positives higher than negatives
pos_scores = model.score(z[pos_edge_index[0]], z[pos_edge_index[1]])
neg_scores = model.score(z[neg_edge_index[0]], z[neg_edge_index[1]])

# Binary cross-entropy loss
pos_loss = -torch.log(torch.sigmoid(pos_scores)).mean()
neg_loss = -torch.log(1 - torch.sigmoid(neg_scores)).mean()
loss = pos_loss + neg_loss

PyG's negative_sampling() ensures generated pairs are not actual edges. The model learns to score real edges higher.
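The two log-sigmoid terms in the loss above can also be fused into a single, numerically stable call to binary cross-entropy with logits. A minimal sketch, using random tensors as stand-ins for the model scores from the snippet above:

```python
import torch
import torch.nn.functional as F

# Stand-ins for the model outputs pos_scores / neg_scores above
pos_scores = torch.randn(10)
neg_scores = torch.randn(10)

# Label real edges 1, fake edges 0, and let the fused op handle
# the sigmoid + log in a numerically stable way
scores = torch.cat([pos_scores, neg_scores])
labels = torch.cat([torch.ones(10), torch.zeros(10)])
loss = F.binary_cross_entropy_with_logits(scores, labels)
```

With equal numbers of positives and negatives this averages over all samples at once, which equals the mean of the two separate terms.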

Sampling strategies

Uniform random

Pick any two unconnected nodes with equal probability. Simple, fast, but most negatives are “easy” (obviously disconnected nodes like a U.S. customer and a product only sold in Japan). The model learns little from easy negatives.
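Under the hood, uniform sampling amounts to rejection sampling: draw random pairs, keep only the ones that are not real edges. A minimal sketch on a toy graph (not PyG's actual implementation):

```python
import torch

def uniform_negatives(edge_index, num_nodes, num_samples):
    """Rejection sampling: draw random pairs, keep only non-edges."""
    # Encode existing edges as single integers for fast membership tests
    existing = set((edge_index[0] * num_nodes + edge_index[1]).tolist())
    neg_src, neg_dst = [], []
    while len(neg_src) < num_samples:
        s = torch.randint(0, num_nodes, (1,)).item()
        d = torch.randint(0, num_nodes, (1,)).item()
        if s != d and s * num_nodes + d not in existing:
            neg_src.append(s)
            neg_dst.append(d)
    return torch.tensor([neg_src, neg_dst])

# Toy graph: 4 nodes, 3 directed edges
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
neg = uniform_negatives(edge_index, num_nodes=4, num_samples=3)
# neg has shape [2, 3]; no column is an existing edge or a self-loop
```

Every unconnected pair is equally likely, which is exactly why most draws are easy negatives on large graphs.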

Degree-proportional

Sample negative nodes proportional to their degree. High-degree nodes appear more often as negative targets. This creates harder negatives because popular nodes are more plausible connections.
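A sketch of the idea: weight each candidate destination by its in-degree and draw with torch.multinomial. (For brevity this sketch does not reject pairs that happen to be real edges; production code would filter those as in uniform rejection sampling.)

```python
import torch

def degree_proportional_negatives(edge_index, num_nodes, num_samples):
    """Sample negative destinations proportional to node in-degree."""
    # Count how often each node appears as an edge destination
    deg = torch.bincount(edge_index[1], minlength=num_nodes).float()
    # Sources drawn uniformly; destinations drawn by degree
    src = torch.randint(0, num_nodes, (num_samples,))
    dst = torch.multinomial(deg, num_samples, replacement=True)
    return torch.stack([src, dst])

# Toy graph: node 3 has in-degree 3, so it dominates negative targets
edge_index = torch.tensor([[0, 1, 2, 0], [3, 3, 3, 1]])
neg = degree_proportional_negatives(edge_index, num_nodes=4, num_samples=5)
```

Popular nodes show up as fake destinations far more often, which is what makes these negatives plausible and therefore harder.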

Hard negative mining

Use the model's own predictions to find hard negatives:

hard_negative_mining.py
def mine_hard_negatives(model, z, pos_edge_index, num_negatives=5):
    """Sample negatives that the model currently scores highly."""
    hard_negatives = []
    for i in range(pos_edge_index.size(1)):
        src = pos_edge_index[0, i]

        # Score many random candidates
        candidates = torch.randint(0, z.size(0), (100,))

        # Drop candidates that are actual neighbors of src,
        # so every kept pair really is a non-edge
        neighbors = pos_edge_index[1, pos_edge_index[0] == src]
        candidates = candidates[~torch.isin(candidates, neighbors)]

        scores = model.score(z[src].unsqueeze(0), z[candidates])

        # Keep the highest-scored non-edges (hardest negatives)
        k = min(num_negatives, candidates.numel())
        top_k = scores.reshape(-1).topk(k).indices
        hard_negatives.append(candidates[top_k])

    return hard_negatives

# Hard negatives force the model to learn fine-grained distinctions
# Easy negatives waste gradient on trivially distinguishable pairs

Hard negative mining: find non-edges the model thinks are real. These are the most informative training examples.

Enterprise example: product recommendation training

Training a recommendation model on a user-product bipartite graph:

  • Positives: (user, product) pairs where the user purchased the product
  • Easy negatives: random products (user who buys electronics paired with baby products)
  • Hard negatives: products the user browsed but did not buy, or products bought by similar users that this user did not buy

Training only on easy negatives produces a model that distinguishes electronics buyers from baby product buyers (trivial) but cannot distinguish which specific laptop a user prefers (the actual recommendation problem). Hard negatives force the model to learn the fine-grained preferences that drive real purchase decisions.
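The "browsed but did not buy" rule above reduces to a set difference over a user's interaction logs. A toy sketch with hypothetical item names:

```python
# Hypothetical interaction logs for one user
browsed = {"laptop_a", "laptop_b", "laptop_c", "mouse_x"}
bought = {"laptop_a", "mouse_x"}

# Hard negatives: plausible items the user considered but rejected
hard_negatives = browsed - bought
# -> {"laptop_b", "laptop_c"}
```

These pairs are hard precisely because the user's behavior shows real interest, so a naive model scores them almost as highly as actual purchases.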

In-batch negatives

A free and effective technique: in a batch of 256 positive (user, item) pairs, use every other item in the batch as a negative for each user. This gives 255 negatives per positive at zero additional sampling cost. In-batch negatives are naturally “semi-hard” because they are items that other users actually purchased (plausible but incorrect for this specific user).
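In code, in-batch negatives fall out of a single score matrix plus cross-entropy: the diagonal holds the true pairs, and every off-diagonal entry is a free negative. A minimal sketch with random embeddings standing in for model output:

```python
import torch
import torch.nn.functional as F

def in_batch_loss(user_emb, item_emb):
    """Contrastive loss: row i's positive item is item_emb[i];
    the other B-1 items in the batch act as its negatives."""
    # [B, B] score matrix: entry (i, j) = score of user i with item j
    logits = user_emb @ item_emb.t()
    # The diagonal holds the true (user, item) pairs
    labels = torch.arange(user_emb.size(0))
    return F.cross_entropy(logits, labels)

# Batch of 256 (user, item) positive pairs -> 255 free negatives each
user_emb = torch.randn(256, 64)
item_emb = torch.randn(256, 64)
loss = in_batch_loss(user_emb, item_emb)
```

No extra sampling happens at all; the batch itself supplies the negatives.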

Frequently asked questions

What is negative sampling in GNNs?

Negative sampling generates fake (non-existent) edges to serve as negative examples during link prediction training. The model learns to distinguish real edges (positive) from fake edges (negative). Without negatives, the model would learn to predict that every pair of nodes is connected, which is useless.

How does negative sampling work?

For each real edge (A, B), sample one or more node pairs that are NOT connected: (A, C) where C is a random node not connected to A. The model trains to score real edges higher than fake edges. The ratio of negatives to positives is typically 1:1 to 5:1.

What negative sampling strategies exist?

Uniform random (simplest: pick any disconnected pair), degree-proportional (prefer high-degree nodes as negatives, harder examples), hard negative mining (use the model's own predictions to find the most confusing negatives), and in-batch (use other positive edges in the batch as negatives). Hard negatives improve training quality significantly.

What are hard negatives and why do they matter?

Hard negatives are fake edges the model currently thinks are real. If the model gives (A, C) a high score even though A and C are not connected, (A, C) is a hard negative. Training on hard negatives forces the model to learn finer distinctions. Easy negatives (obviously disconnected nodes) waste training signal.

How many negatives should I sample per positive?

Typical ratios: 1:1 for large graphs (enough hard negatives naturally), 5:1 for small graphs (need more negatives for stable gradients), up to 50:1 for knowledge graph link prediction (huge entity space). More negatives increase training cost but improve discrimination. In-batch negatives (reusing positives from other samples) are free and effective.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.