Graph augmentation creates modified graph views by applying random transformations to the structure, features, or nodes. In image augmentation, you rotate, crop, or color-jitter an image, and the content stays the same. In graph augmentation, you drop edges, mask features, or remove nodes, creating a perturbed view that preserves essential structural properties while introducing variation.
These augmented views serve two purposes: they provide the positive pairs for contrastive self-supervised learning (where the model learns to match two views of the same graph), and they provide regularization during supervised training (similar to how dropout prevents overfitting by randomly perturbing the network).
Augmentation strategies
```python
import torch
from torch_geometric.utils import dropout_edge, subgraph

def edge_dropping(edge_index, drop_rate=0.2):
    """Remove random edges. The most common augmentation."""
    edge_index, _ = dropout_edge(edge_index, p=drop_rate)
    return edge_index

def feature_masking(x, mask_rate=0.2):
    """Zero out random feature dimensions (the same dimensions for every node)."""
    mask = torch.rand(x.size(1)) > mask_rate
    return x * mask.float().unsqueeze(0)

def node_dropping(x, edge_index, drop_rate=0.1):
    """Remove random nodes along with their incident edges."""
    keep_mask = torch.rand(x.size(0)) > drop_rate
    # subgraph() keeps only edges whose endpoints survive and
    # relabels them to the new, compacted node indices
    new_edge_index, _ = subgraph(keep_mask, edge_index,
                                 relabel_nodes=True, num_nodes=x.size(0))
    return x[keep_mask], new_edge_index

def subgraph_sampling(edge_index, num_nodes, ratio=0.8):
    """Sample a connected subgraph containing ~ratio of the nodes."""
    # Typical approach: random walk from a random seed node,
    # then keep the visited nodes and their induced edges
    pass  # implementation varies

# Compose multiple augmentations
def augment(x, edge_index):
    edge_index = edge_dropping(edge_index, 0.2)
    x = feature_masking(x, 0.15)
    return x, edge_index
```

Four augmentation strategies. Composing multiple augmentations (edge dropping + feature masking) generally works better than any single one.
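To make the contrastive use concrete, here is a minimal plain-PyTorch sketch (helper names are illustrative, and the PyG dependency is dropped so the snippet stands alone) that produces the two augmented views forming a positive pair:

```python
import torch

def drop_edges(edge_index, p=0.2):
    # Keep each edge independently with probability 1 - p
    keep = torch.rand(edge_index.size(1)) > p
    return edge_index[:, keep]

def mask_features(x, p=0.15):
    # Zero out a random subset of feature dimensions for all nodes
    mask = (torch.rand(x.size(1)) > p).float()
    return x * mask

def two_views(x, edge_index):
    """Augment the same graph twice, independently -> one positive pair."""
    view1 = (mask_features(x), drop_edges(edge_index))
    view2 = (mask_features(x), drop_edges(edge_index))
    return view1, view2

x = torch.randn(6, 8)                      # 6 nodes, 8 features
edge_index = torch.randint(0, 6, (2, 20))  # 20 random edges
(v1_x, v1_ei), (v2_x, v2_ei) = two_views(x, edge_index)
```

A contrastive objective would then push the embeddings of `v1` and `v2` together while pushing embeddings of views from other graphs apart.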
Which augmentation for which domain
- Social networks: edge dropping works well because friendships are redundant (removing one connection among many has low information loss).
- Molecular graphs: be careful with edge dropping since every bond is structurally important. Feature masking is safer. Subgraph sampling preserves local chemistry.
- Knowledge graphs: edge dropping is acceptable because knowledge graphs are inherently incomplete. Feature masking is less applicable since entities often have sparse features.
- Enterprise graphs: moderate edge dropping (10-20%) with feature masking (15-25%) is a good starting point. The redundancy in large graphs makes augmentation generally safe.
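The starting points above can be collected into a small lookup; the dict and function names here are illustrative, with rates taken from the guidance in the list:

```python
# Starting-point augmentation rates per domain (values from the
# guidance above; the structure itself is just an illustration)
AUG_DEFAULTS = {
    "social":     {"edge_drop": 0.20, "feat_mask": 0.15},
    "molecular":  {"edge_drop": 0.00, "feat_mask": 0.15},  # every bond matters
    "knowledge":  {"edge_drop": 0.15, "feat_mask": 0.00},  # sparse entity features
    "enterprise": {"edge_drop": 0.15, "feat_mask": 0.20},
}

def rates_for(domain):
    # Fall back to the enterprise defaults for unknown domains
    return AUG_DEFAULTS.get(domain, AUG_DEFAULTS["enterprise"])
```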
Enterprise example: robust transaction embeddings
A bank wants transaction graph embeddings that are robust to data quality issues: missing transactions (dropped edges), incomplete account attributes (masked features), and accounts closed after training (dropped nodes).
Each data quality issue maps directly to an augmentation that simulates it during training:
- Edge dropping simulates missing transactions
- Feature masking simulates incomplete records
- Node dropping simulates closed or removed accounts
The resulting model produces embeddings that are stable under production data quality conditions, not just clean training conditions.
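A plain-PyTorch sketch of this training-time simulation (function names and rates are illustrative, not the bank's actual pipeline); each step corresponds to one of the data quality issues above:

```python
import torch

def drop_edges(edge_index, p):
    keep = torch.rand(edge_index.size(1)) > p
    return edge_index[:, keep]

def mask_features(x, p):
    return x * (torch.rand(x.size(1)) > p).float()

def drop_nodes(x, edge_index, p):
    keep = torch.rand(x.size(0)) > p
    idx = keep.nonzero().squeeze(1)
    # Map old node index -> new compacted index (-1 = dropped)
    remap = torch.full((x.size(0),), -1, dtype=torch.long)
    remap[idx] = torch.arange(idx.numel())
    src, dst = edge_index
    edge_keep = keep[src] & keep[dst]
    return x[idx], remap[edge_index[:, edge_keep]]

def simulate_data_issues(x, edge_index):
    """One augmented view that mimics production data quality problems."""
    edge_index = drop_edges(edge_index, 0.15)        # missing transactions
    x = mask_features(x, 0.20)                       # incomplete records
    x, edge_index = drop_nodes(x, edge_index, 0.05)  # closed accounts
    return x, edge_index

x = torch.randn(100, 16)                      # 100 accounts, 16 attributes
edge_index = torch.randint(0, 100, (2, 300))  # 300 transactions
aug_x, aug_ei = simulate_data_issues(x, edge_index)
```

Calling `simulate_data_issues` on each training batch exposes the model to the same kinds of corruption it will see in production.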
Adaptive augmentation (GCA)
Uniform random augmentation treats all edges and features equally. But some edges are critical (a bridge connecting two communities) and some are redundant (one of 100 connections in a dense cluster). GCA (Graph Contrastive learning with Adaptive augmentation) addresses this by deriving per-edge and per-feature probabilities from importance measures such as centrality and feature variance, so that it tends to:
- Preserve high-centrality edges (bridges, connectors)
- Drop low-centrality edges (redundant within clusters)
- Preserve high-variance features (informative)
- Mask low-variance features (redundant)
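The edge side of this idea can be sketched with degree centrality as the importance score. This is a simplified take on GCA's scheme (which normalizes log-degree centralities into per-edge drop probabilities); all names here are illustrative:

```python
import torch

def adaptive_edge_drop(edge_index, num_nodes, p_mean=0.2, p_max=0.7):
    """Drop edges with probability inversely related to endpoint centrality."""
    # Degree of each node (counting both endpoints of every edge)
    deg = torch.zeros(num_nodes)
    deg.scatter_add_(0, edge_index[0], torch.ones(edge_index.size(1)))
    deg.scatter_add_(0, edge_index[1], torch.ones(edge_index.size(1)))
    # Edge centrality = mean log-degree of its two endpoints (GCA-style)
    src, dst = edge_index
    cent = (deg[src].log1p() + deg[dst].log1p()) / 2
    # Normalize so the most central edges get the lowest drop probability
    weight = (cent.max() - cent) / (cent.max() - cent.mean() + 1e-9)
    p = torch.clamp(weight * p_mean, max=p_max)
    keep = torch.rand(edge_index.size(1)) > p
    return edge_index[:, keep]

# Toy graph: node 0 is a hub, so its edges are more central
edge_index = torch.tensor([[0, 0, 0, 0, 1, 2],
                           [1, 2, 3, 4, 2, 3]])
kept = adaptive_edge_drop(edge_index, num_nodes=5)
```

The same pattern applies on the feature side: compute per-dimension variance, then mask low-variance dimensions with higher probability.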