SGConv: When a Linear Model Is All You Need

SGConv strips GCN down to its essentials: remove the nonlinearities, collapse the weight matrices, and what remains is a linear classifier on smoothed features. It is 10-100x faster than GCN and surprisingly competitive. Here is when simplicity wins.

PyTorch Geometric

TL;DR

  • SGConv removes all nonlinearities from multi-layer GCN and collapses it into: output = softmax(A^K * X * W). One weight matrix, no activation functions.
  • 10-100x faster than multi-layer GCN because graph propagation (A^K * X) can be pre-computed once. The model is logistic regression on smoothed features.
  • Surprisingly competitive on homophilic graphs (Cora, CiteSeer, PubMed). The smoothing alone captures most of the signal for node classification.
  • Fails on heterophilic graphs where neighbors have different labels. The smoothing mixes conflicting signals, destroying discriminative information.

Original Paper

Simplifying Graph Convolutional Networks

Wu et al., ICML 2019


What SGConv does

SGConv makes a provocative simplification:

  1. Remove all ReLU activations between GCN layers
  2. Collapse the resulting product of weight matrices into one: W = W_1 * W_2 * ... * W_K
  3. Pre-compute the K-hop smoothed features: X_smooth = A^K * X
  4. Apply a single linear transformation: output = X_smooth * W

The resulting model is logistic regression on K-hop averaged features. The graph structure is entirely captured in the pre-processing step.
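The collapse in step 2 works only because the ReLUs are gone: without nonlinearities, matrix multiplication is associative, so the per-layer weights fold into a single matrix. A minimal numeric check in plain torch (the random Â, X, and weights are stand-ins for a real graph, not PyG objects):

```python
import torch

torch.manual_seed(0)

# Stand-ins for a normalized adjacency, node features, and two layer weights
A = torch.rand(5, 5)
X = torch.rand(5, 8)
W1 = torch.rand(8, 16)
W2 = torch.rand(16, 3)

# 2-layer "GCN" with the ReLUs removed: A (A X W1) W2
layered = A @ (A @ X @ W1) @ W2

# SGC form: pre-propagate A^2 X, then one collapsed weight W = W1 W2
collapsed = (A @ A @ X) @ (W1 @ W2)

print(torch.allclose(layered, collapsed, atol=1e-5))  # True
```

The two computations are identical up to floating-point error, which is exactly why A^K * X can be moved into pre-processing.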

PyG implementation

sgconv_model.py
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import SGConv

class SGC(torch.nn.Module):
    def __init__(self, in_channels, out_channels, K=2):
        super().__init__()
        # cached=True stores the propagated features A^K * X after the
        # first forward pass, so propagation runs only once
        self.conv = SGConv(in_channels, out_channels, K=K, cached=True)

    def forward(self, x, edge_index):
        return self.conv(x, edge_index)

dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

# That's it. One layer. Linear model.
model = SGC(dataset.num_features, dataset.num_classes, K=2)

# Training is just logistic regression
optimizer = torch.optim.Adam(model.parameters(), lr=0.2)
for epoch in range(100):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = torch.nn.functional.cross_entropy(
        out[data.train_mask], data.y[data.train_mask]
    )
    loss.backward()
    optimizer.step()

The entire model is one SGConv layer. Training converges fast because it is a linear model. On Cora, this achieves ~81% accuracy, matching 2-layer GCN.

When to use SGConv

  • Fast baselines. SGConv establishes a strong baseline in seconds. If it already achieves your accuracy target, you may not need a complex GNN at all.
  • Very large graphs. Pre-compute A^K * X once, then train a linear model. This scales to millions of nodes where multi-layer GCN is too slow.
  • Homophilic graphs. When neighbors tend to share labels (citation networks, co-purchase graphs), smoothing features across neighbors is exactly the right inductive bias.
  • Production latency requirements. Inference is a single matrix multiplication after pre-computing smoothed features.
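For the large-graph case, the propagation can be done once up front with K sparse matmuls, after which training touches only the dense smoothed features. A sketch in plain torch, assuming the symmetric normalization Â = D^{-1/2}(A + I)D^{-1/2} from the GCN paper and a toy edge list for illustration:

```python
import torch

# Toy undirected graph as a COO edge list (illustrative)
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])
num_nodes, K = 4, 2
X = torch.rand(num_nodes, 8)

# Build A + I as a sparse tensor (both edge directions plus self-loops)
row = torch.cat([edges[:, 0], edges[:, 1], torch.arange(num_nodes)])
col = torch.cat([edges[:, 1], edges[:, 0], torch.arange(num_nodes)])
vals = torch.ones(row.numel())
A = torch.sparse_coo_tensor(torch.stack([row, col]), vals,
                            (num_nodes, num_nodes)).coalesce()

# Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
deg = torch.sparse.sum(A, dim=1).to_dense()
d_inv_sqrt = deg.pow(-0.5)
norm_vals = d_inv_sqrt[A.indices()[0]] * A.values() * d_inv_sqrt[A.indices()[1]]
A_hat = torch.sparse_coo_tensor(A.indices(), norm_vals, A.shape).coalesce()

# K sparse matmuls, each O(|E|): the one-time pre-computation
X_smooth = X
for _ in range(K):
    X_smooth = torch.sparse.mm(A_hat, X_smooth)

# Training afterwards is ordinary logistic regression on X_smooth
linear = torch.nn.Linear(8, 3)
logits = linear(X_smooth)
```

Once `X_smooth` is stored, the graph is never touched again: training and inference see only a feature matrix, which is what makes the approach viable at millions of nodes.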

When not to use SGConv

  • Heterophilic graphs. When neighbors have different labels, smoothing destroys the signal you need.
  • Complex tasks requiring nonlinear features. Graph classification, link prediction, and tasks needing complex neighborhood patterns benefit from nonlinear layers.

Frequently asked questions

What is SGConv in PyTorch Geometric?

SGConv implements the Simplified Graph Convolution from Wu et al. (2019). It removes all nonlinearities between GCN layers and collapses the entire model into a single linear transformation followed by K-hop propagation. The result is a linear model that is orders of magnitude faster than multi-layer GCN while achieving comparable accuracy.

How does SGConv simplify GCN?

Multi-layer GCN applies a weight matrix and ReLU at each layer: H^(k) = ReLU(A*H^(k-1)*W_k). SGC removes the ReLU and collapses all weight matrices into one: H = A^K * X * W. This means the entire model is a single linear transformation applied to K-hop smoothed features.

Is SGConv just a linear model?

Yes. SGConv is a linear classifier on K-hop averaged features. The graph structure is used only for feature smoothing (pre-processing), and the model itself is logistic regression. This simplicity makes it extremely fast while maintaining competitive accuracy on many benchmarks.
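The claim can be made concrete: with the propagation baked into the features, what remains to train is multinomial logistic regression (one linear map plus softmax cross-entropy). A hedged sketch where `X_smooth` and `y` are made-up stand-ins for pre-computed A^K * X features and node labels:

```python
import torch

torch.manual_seed(0)

# Stand-ins for pre-computed K-hop smoothed features and node labels
X_smooth = torch.rand(100, 8)
y = torch.randint(0, 3, (100,))

# One linear map + softmax cross-entropy: multinomial logistic regression
clf = torch.nn.Linear(8, 3)
opt = torch.optim.Adam(clf.parameters(), lr=0.05)

losses = []
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(clf(X_smooth), y)
    losses.append(loss.item())
    loss.backward()
    opt.step()

print(losses[0], losses[-1])  # a smooth convex problem; loss decreases
```

Because the objective is convex in the single weight matrix, there are no bad local minima, which is part of why SGC training converges so quickly.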

When does SGConv fail compared to GCN?

SGConv fails when the task requires nonlinear interactions between features at different hops. On heterophilic graphs (where neighbors have different labels), the smoothing hurts because it mixes conflicting signals. On graphs needing complex neighborhood patterns, nonlinear layers provide necessary expressiveness.

How fast is SGConv compared to GCN?

SGConv training is typically 10-100x faster than multi-layer GCN because the graph propagation (A^K * X) can be pre-computed once before training. After pre-computation, the model is logistic regression on the smoothed features. The pre-computation itself takes O(K * |E|) time.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.