What SGConv does
SGConv makes a provocative simplification:
- Remove all ReLU activations between GCN layers
- Collapse the resulting product of weight matrices into one: W = W_1 * W_2 * ... * W_K
- Pre-compute the K-hop smoothed features: X_smooth = A_hat^K * X, where A_hat is the adjacency matrix with self-loops, symmetrically normalized (the same propagation matrix GCN uses)
- Apply a single linear transformation: output = X_smooth * W
The resulting model is logistic regression on K-hop averaged features. The graph structure is entirely captured in the pre-processing step.
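Spelled out, the whole pipeline is a few lines of linear algebra. A minimal NumPy sketch, where the toy 4-node graph and random features are made up for illustration and A_hat denotes the self-looped, symmetrically normalized adjacency:

```python
import numpy as np

# Toy graph: 4 nodes on a path (hypothetical example)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.rand(4, 3)  # node features: 4 nodes, 3 features each

# Add self-loops and symmetrically normalize: A_hat = D^{-1/2} (A + I) D^{-1/2}
A_tilde = A + np.eye(4)
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# K-hop smoothing is a fixed pre-processing step (K = 2 here)
K = 2
X_smooth = np.linalg.matrix_power(A_hat, K) @ X

# The "model" is then a single weight matrix applied to the smoothed features
W = np.random.rand(3, 2)  # (num_features, num_classes)
logits = X_smooth @ W
```

In training, only `W` is learned; `X_smooth` never changes, which is why it can be computed once up front.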
PyG implementation
```python
import torch
from torch_geometric.nn import SGConv

class SGC(torch.nn.Module):
    def __init__(self, in_channels, out_channels, K=2):
        super().__init__()
        self.conv = SGConv(in_channels, out_channels, K=K)

    def forward(self, x, edge_index):
        return self.conv(x, edge_index)

# That's it. One layer. Linear model.
model = SGC(dataset.num_features, dataset.num_classes, K=2)

# Training is just logistic regression
optimizer = torch.optim.Adam(model.parameters(), lr=0.2)
for epoch in range(100):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = torch.nn.functional.cross_entropy(
        out[data.train_mask], data.y[data.train_mask]
    )
    loss.backward()
    optimizer.step()
```

The entire model is one SGConv layer. Training converges fast because it is a linear model. On Cora, this achieves ~81% accuracy, matching a 2-layer GCN.
When to use SGConv
- Fast baselines. SGConv establishes a strong baseline in seconds. If it already achieves your accuracy target, you may not need a complex GNN at all.
- Very large graphs. Pre-compute A^K * X once, then train a linear model. This scales to millions of nodes where multi-layer GCN is too slow.
- Homophilic graphs. When neighbors tend to share labels (citation networks, co-purchase graphs), smoothing features across neighbors is exactly the right inductive bias.
- Production latency requirements. Inference is a single matrix multiplication after pre-computing smoothed features.
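The pre-compute-then-fit recipe from the bullets above can be sketched without any GNN library at all, using SciPy sparse matrices for the propagation and scikit-learn for the linear model. The random graph and labels below are placeholders for illustration:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

def sgc_precompute(adj: sp.spmatrix, X: np.ndarray, K: int = 2) -> np.ndarray:
    """Return A_hat^K @ X, with A_hat the self-looped, symmetrically
    normalized adjacency. Done once, offline."""
    n = adj.shape[0]
    a = adj + sp.eye(n)
    d = np.asarray(a.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(d ** -0.5)
    a_hat = d_inv_sqrt @ a @ d_inv_sqrt
    for _ in range(K):          # K sparse mat-vec passes, never densify A_hat^K
        X = a_hat @ X
    return X

# Hypothetical data: a random symmetric graph with 100 nodes, 3 classes
rng = np.random.default_rng(0)
adj = sp.random(100, 100, density=0.05, random_state=0)
adj = sp.csr_matrix(((adj + adj.T) > 0).astype(float))  # symmetrize
X = rng.standard_normal((100, 16))
y = rng.integers(0, 3, size=100)

X_smooth = sgc_precompute(adj, X, K=2)
clf = LogisticRegression(max_iter=1000).fit(X_smooth, y)
```

Note the propagation is applied as K successive sparse products rather than materializing `A_hat^K`, which is what keeps this tractable at millions of nodes.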
When not to use SGConv
- Heterophilic graphs. When neighbors have different labels, smoothing destroys the signal you need.
- Complex tasks requiring nonlinear features. Graph classification, link prediction, and tasks needing complex neighborhood patterns benefit from nonlinear layers.