Original Paper
Strategies for Pre-training Graph Neural Networks
Hu et al., ICLR 2020
What GINEConv does
GINEConv modifies GINConv's aggregation to incorporate edge features before summing:
- For each neighbor j, combine its features with the edge features: ReLU(h_j + e_ij)
- Sum these edge-enhanced messages across all neighbors
- Add the node's own features, scaled by (1 + ε), where ε is an optionally learnable scalar
- Pass through a multi-layer perceptron
This seemingly small change is critical for graphs where edges carry distinct information. In a molecular graph, the difference between a single bond and a double bond completely changes the molecule's properties. GINConv treats them identically; GINEConv does not.
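To make this concrete, here is a toy sketch in plain Python (made-up two-dimensional feature vectors, no MLP; not the PyG implementation): the same neighbor atom connected by a single bond versus a double bond produces identical GINConv messages but distinct GINEConv messages.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

# Same neighbor atom, two different bond types (toy one-hot edge features).
h_j = [1.0, 0.5]
e_single = [1.0, 0.0]   # single bond
e_double = [0.0, 1.0]   # double bond

# GINConv message: just the neighbor features -- the bond type is invisible.
gin_msg_single = h_j
gin_msg_double = h_j
print(gin_msg_single == gin_msg_double)   # True

# GINEConv message: ReLU(h_j + e_ij) -- the bond type changes the message.
gine_msg_single = relu(add(h_j, e_single))
gine_msg_double = relu(add(h_j, e_double))
print(gine_msg_single)   # [2.0, 0.5]
print(gine_msg_double)   # [1.0, 1.5]
```

After summation over neighbors, these distinct messages propagate the bond-type information into the node embeddings.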
The math (simplified)
# GINConv (ignores edges)
h_i' = MLP( (1 + eps) · h_i + Σ_j h_j )
# GINEConv (uses edge features)
h_i' = MLP( (1 + eps) · h_i + Σ_j ReLU(h_j + e_ij) )
Where:
- e_ij = edge feature vector for edge (i, j)
- ReLU = nonlinearity applied after combining node + edge features
- The edge features must have the same dimension as the node features (use a linear projection if they differ)

Edge features are added to neighbor features before aggregation. The ReLU ensures the combination is nonlinear, preserving expressiveness.
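As a worked instance of the GINEConv formula (toy numbers, ε = 0, and an identity MLP standing in for the real one):

```python
eps = 0.0
h_i = [1.0, -1.0]                       # center node features
neighbors = [([0.5, 0.5], [1.0, 0.0]),  # (h_j, e_ij) pairs
             ([2.0, -2.0], [0.0, 1.0])]

# Sum of edge-enhanced messages: Σ_j ReLU(h_j + e_ij)
agg = [0.0, 0.0]
for h_j, e_ij in neighbors:
    msg = [max(0.0, a + b) for a, b in zip(h_j, e_ij)]   # ReLU(h_j + e_ij)
    agg = [a + m for a, m in zip(agg, msg)]

# h_i' = MLP((1 + eps) * h_i + agg); identity MLP in this sketch
h_i_new = [(1 + eps) * a + b for a, b in zip(h_i, agg)]
print(h_i_new)   # [4.5, -0.5]
```

The first neighbor contributes ReLU([1.5, 0.5]) = [1.5, 0.5] and the second ReLU([2.0, -1.0]) = [2.0, 0.0]; note how the ReLU clips the second message's negative component before summation.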
PyG implementation
import torch
import torch.nn.functional as F
from torch_geometric.nn import GINEConv, global_add_pool

class GINE(torch.nn.Module):
    def __init__(self, node_dim, edge_dim, hidden, out_channels, num_layers=5):
        super().__init__()
        # Project raw node and edge features to a shared hidden dimension
        self.edge_proj = torch.nn.Linear(edge_dim, hidden)
        self.node_proj = torch.nn.Linear(node_dim, hidden)
        self.convs = torch.nn.ModuleList()
        for _ in range(num_layers):
            mlp = torch.nn.Sequential(
                torch.nn.Linear(hidden, hidden),
                torch.nn.ReLU(),
                torch.nn.Linear(hidden, hidden),
            )
            self.convs.append(GINEConv(mlp))
        self.classifier = torch.nn.Linear(hidden, out_channels)

    def forward(self, x, edge_index, edge_attr, batch):
        x = self.node_proj(x)
        edge_attr = self.edge_proj(edge_attr)
        for conv in self.convs:
            x = F.relu(conv(x, edge_index, edge_attr))
        x = global_add_pool(x, batch)  # graph-level readout
        return self.classifier(x)

# Usage on a molecular dataset
model = GINE(node_dim=9, edge_dim=3, hidden=64, out_channels=1)
# node features: atom type, degree, etc.
# edge features: bond type, stereochemistry, etc.

Project both node and edge features to the same hidden dimension before passing them to GINEConv: the edge_attr dimension must match the node feature dimension. (Recent PyG versions can also perform this projection internally via GINEConv's edge_dim argument.)
When to use GINEConv
- Molecular property prediction. Bond types (single, double, triple, aromatic), bond stereochemistry, and ring membership are critical features encoded on edges.
- Graph pre-training. GINEConv is the standard backbone for pre-training strategies that mask and predict both node and edge attributes.
- Knowledge graphs with typed relations. Relation types (e.g., “is-a”, “part-of”, “authored-by”) are naturally edge features.
- Transaction networks. Transaction amounts, currencies, and timestamps are edge features that distinguish otherwise identical connections.
When not to use GINEConv
- Graphs without edge features. If edges carry no attributes, use GINConv. Adding zero-valued edge features adds computation without benefit.
- When you need attention. GINEConv treats all neighbors equally (sum aggregation). If neighbor importance varies, consider TransformerConv with edge_attr or GATConv.