Dropout on graphs randomly drops edges, nodes, or features during training to prevent overfitting. Standard feature dropout, familiar from any neural network, zeros out random feature dimensions. Graph-specific dropout goes further: DropEdge removes random connections, and DropNode removes entire nodes from the computation. These structural modifications force the GNN to learn patterns that do not depend on any single edge, node, or feature.
This is particularly important for GNNs because overfitting to graph structure is a distinct failure mode. A model might memorize that node A is always connected to node B and rely on that specific edge rather than learning the general pattern. DropEdge breaks this dependency.
Feature dropout (standard)
Apply between GNN layers, exactly like in MLPs or CNNs:
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = F.dropout(x, p=0.5, training=self.training)  # feature dropout
        x = self.conv2(x, edge_index)
        return x

# During training: 50% of feature dimensions are randomly zeroed and the
# survivors are scaled by 1 / (1 - p) (inverted dropout)
# During inference: dropout is a no-op; all features pass through unchanged
```

Standard feature dropout between GNN layers. Passing training=self.training ensures dropout is disabled at inference.
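To make the train/inference asymmetry concrete, here is a minimal pure-Python sketch of inverted dropout, the scheme PyTorch uses. The feature_dropout helper is illustrative only, not part of any library:

```python
import random

def feature_dropout(x, p=0.5, training=True, seed=None):
    """Inverted dropout on a feature vector (plain-Python sketch).

    During training each value is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p); at inference it is the
    identity, matching torch.nn.functional.dropout's behavior.
    """
    if not training or p == 0.0:
        return list(x)
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v / (1 - p) for v in x]
```

The 1 / (1 - p) scaling keeps the expected activation magnitude the same in both modes, which is why no rescaling is needed at inference.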
DropEdge
DropEdge randomly removes a fraction of edges before each message passing step:
```python
from torch_geometric.utils import dropout_edge

class GCNWithDropEdge(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        if self.training:
            # Drop 20% of edges, independently for each layer
            edge_index_1, _ = dropout_edge(edge_index, p=0.2)
            edge_index_2, _ = dropout_edge(edge_index, p=0.2)
        else:
            edge_index_1 = edge_index_2 = edge_index
        x = self.conv1(x, edge_index_1).relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index_2)
        return x
```

Each layer gets a different random edge mask. The graph structure changes every training step, preventing memorization.
DropNode
DropNode removes entire nodes and all their edges. This is more aggressive than DropEdge because removing a high-degree node can significantly change the local graph structure:
- Feature dropout: subtle, affects individual dimensions
- DropEdge: moderate, removes specific connections
- DropNode: aggressive, removes entire entities
Use DropNode at low rates (5-10%) when you want the model to be robust to missing entities, which is common in production where data can be incomplete.
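DropNode has the same shape as DropEdge but masks nodes first and then discards every incident edge. A minimal pure-Python sketch of the mechanic; drop_node is an illustrative helper, though recent PyG versions ship a similar dropout_node utility in torch_geometric.utils:

```python
import random

def drop_node(edges, num_nodes, p=0.1, seed=None):
    """Drop each node with probability p, plus every edge touching it.

    `edges` is a list of (src, dst) pairs. Pure-Python illustration
    of the DropNode mechanic, not a library API.
    """
    rng = random.Random(seed)
    dropped = {n for n in range(num_nodes) if rng.random() < p}
    kept_edges = [(s, d) for s, d in edges
                  if s not in dropped and d not in dropped]
    return kept_edges, dropped
```

Note that dropping one high-degree node can remove many edges at once, which is exactly why DropNode is the most aggressive of the three and is best kept at low rates.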
Enterprise example: robust fraud detection
A fraud detection model trained without dropout might learn to rely on a specific highly-connected merchant node that happens to be associated with fraud in the training data. If that merchant changes behavior or is removed, the model breaks.
With DropEdge and DropNode:
- DropEdge forces the model to detect fraud patterns even when some transaction edges are missing
- DropNode forces the model to identify fraud rings even when some accounts are absent
- Feature dropout prevents over-reliance on any single account attribute
The resulting model generalizes to new fraud patterns that emerge after training, because it learned structural patterns rather than memorizing specific nodes and edges.
Recommended dropout recipe
- Feature dropout: 0.5 between layers (standard)
- DropEdge: 0.2 per layer (moderate regularization)
- Attention dropout: 0.1 in graph transformers (prevents attention collapse)
- DropNode: 0.05-0.1 only if robustness to missing data is important