SGConv: When a Linear Model Is All You Need

SGConv strips GCN down to its essentials: remove the nonlinearities, collapse the weight matrices, and what remains is a linear classifier on smoothed features. It is 10-100x faster than GCN and surprisingly competitive. Here is when simplicity wins.

PyTorch Geometric

TL;DR

  • SGConv removes all nonlinearities from multi-layer GCN and collapses it into: output = softmax(A^K * X * W). One weight matrix, no activation functions.
  • 10-100x faster than multi-layer GCN because graph propagation (A^K * X) can be pre-computed once. The model is logistic regression on smoothed features.
  • Surprisingly competitive on homophilic graphs (Cora, CiteSeer, PubMed). The smoothing alone captures most of the signal for node classification.
  • Fails on heterophilic graphs where neighbors have different labels. The smoothing mixes conflicting signals, destroying discriminative information.

Original Paper

Simplifying Graph Convolutional Networks

Wu et al., ICML 2019


What SGConv does

SGConv makes a provocative simplification:

  1. Remove all ReLU activations between GCN layers
  2. Collapse the resulting product of weight matrices into one: W = W_1 * W_2 * ... * W_K
  3. Pre-compute the K-hop smoothed features: X_smooth = A^K * X
  4. Apply a single linear transformation: output = X_smooth * W

The resulting model is logistic regression on K-hop averaged features. The graph structure is entirely captured in the pre-processing step.
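The collapse in step 2 works only because the ReLUs are gone: without nonlinearities, matrix multiplication is associative, so the per-layer weights fold into a single matrix. A minimal numeric check in plain torch (the random Â, X, and weights are stand-ins for a real graph, not PyG objects):

```python
import torch

torch.manual_seed(0)

# Stand-ins for a normalized adjacency, node features, and two layer weights
A = torch.rand(5, 5)
X = torch.rand(5, 8)
W1 = torch.rand(8, 16)
W2 = torch.rand(16, 3)

# 2-layer "GCN" with the ReLUs removed: A (A X W1) W2
layered = A @ (A @ X @ W1) @ W2

# SGC form: pre-propagate A^2 X, then one collapsed weight W = W1 W2
collapsed = (A @ A @ X) @ (W1 @ W2)

print(torch.allclose(layered, collapsed, atol=1e-5))  # True
```

The two computations are identical up to floating-point error, which is exactly why A^K * X can be moved into pre-processing.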

PyG implementation

sgconv_model.py
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import SGConv

class SGC(torch.nn.Module):
    def __init__(self, in_channels, out_channels, K=2):
        super().__init__()
        # cached=True stores the propagated features A^K * X after the
        # first forward pass, so propagation runs only once
        self.conv = SGConv(in_channels, out_channels, K=K, cached=True)

    def forward(self, x, edge_index):
        return self.conv(x, edge_index)

dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

# That's it. One layer. Linear model.
model = SGC(dataset.num_features, dataset.num_classes, K=2)

# Training is just logistic regression
optimizer = torch.optim.Adam(model.parameters(), lr=0.2)
for epoch in range(100):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = torch.nn.functional.cross_entropy(
        out[data.train_mask], data.y[data.train_mask]
    )
    loss.backward()
    optimizer.step()

The entire model is one SGConv layer. Training converges fast because it is a linear model. On Cora, this achieves ~81% accuracy, matching 2-layer GCN.

When to use SGConv

  • Fast baselines. SGConv establishes a strong baseline in seconds. If it already achieves your accuracy target, you may not need a complex GNN at all.
  • Very large graphs. Pre-compute A^K * X once, then train a linear model. This scales to millions of nodes where multi-layer GCN is too slow.
  • Homophilic graphs. When neighbors tend to share labels (citation networks, co-purchase graphs), smoothing features across neighbors is exactly the right inductive bias.
  • Production latency requirements. Inference is a single matrix multiplication after pre-computing smoothed features.
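For the large-graph case, the propagation can be done once up front with K sparse matmuls, after which training touches only the dense smoothed features. A sketch in plain torch, assuming the symmetric normalization Â = D^{-1/2}(A + I)D^{-1/2} from the GCN paper and a toy edge list for illustration:

```python
import torch

# Toy undirected graph as a COO edge list (illustrative)
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])
num_nodes, K = 4, 2
X = torch.rand(num_nodes, 8)

# Build A + I as a sparse tensor (both edge directions plus self-loops)
row = torch.cat([edges[:, 0], edges[:, 1], torch.arange(num_nodes)])
col = torch.cat([edges[:, 1], edges[:, 0], torch.arange(num_nodes)])
vals = torch.ones(row.numel())
A = torch.sparse_coo_tensor(torch.stack([row, col]), vals,
                            (num_nodes, num_nodes)).coalesce()

# Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
deg = torch.sparse.sum(A, dim=1).to_dense()
d_inv_sqrt = deg.pow(-0.5)
norm_vals = d_inv_sqrt[A.indices()[0]] * A.values() * d_inv_sqrt[A.indices()[1]]
A_hat = torch.sparse_coo_tensor(A.indices(), norm_vals, A.shape).coalesce()

# K sparse matmuls, each O(|E|): the one-time pre-computation
X_smooth = X
for _ in range(K):
    X_smooth = torch.sparse.mm(A_hat, X_smooth)

# Training afterwards is ordinary logistic regression on X_smooth
linear = torch.nn.Linear(8, 3)
logits = linear(X_smooth)
```

Once `X_smooth` is stored, the graph is never touched again: training and inference see only a feature matrix, which is what makes the approach viable at millions of nodes.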

When not to use SGConv

  • Heterophilic graphs. When neighbors have different labels, smoothing destroys the signal you need.
  • Complex tasks requiring nonlinear features. Graph classification, link prediction, and tasks needing complex neighborhood patterns benefit from nonlinear layers.

Frequently asked questions

What is SGConv in PyTorch Geometric?

SGConv implements the Simplified Graph Convolution from Wu et al. (2019). It removes all nonlinearities between GCN layers and collapses the entire model into a single linear transformation followed by K-hop propagation. The result is a linear model that is orders of magnitude faster than multi-layer GCN while achieving comparable accuracy.

How does SGConv simplify GCN?

Multi-layer GCN applies a weight matrix and ReLU at each layer: H^(k) = ReLU(A*H^(k-1)*W_k). SGC removes the ReLU and collapses all weight matrices into one: H = A^K * X * W. This means the entire model is a single linear transformation applied to K-hop smoothed features.

Is SGConv just a linear model?

Yes. SGConv is a linear classifier on K-hop averaged features. The graph structure is used only for feature smoothing (pre-processing), and the model itself is logistic regression. This simplicity makes it extremely fast while maintaining competitive accuracy on many benchmarks.
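The claim can be made concrete: with the propagation baked into the features, what remains to train is multinomial logistic regression (one linear map plus softmax cross-entropy). A hedged sketch where `X_smooth` and `y` are made-up stand-ins for pre-computed A^K * X features and node labels:

```python
import torch

torch.manual_seed(0)

# Stand-ins for pre-computed K-hop smoothed features and node labels
X_smooth = torch.rand(100, 8)
y = torch.randint(0, 3, (100,))

# One linear map + softmax cross-entropy: multinomial logistic regression
clf = torch.nn.Linear(8, 3)
opt = torch.optim.Adam(clf.parameters(), lr=0.05)

losses = []
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(clf(X_smooth), y)
    losses.append(loss.item())
    loss.backward()
    opt.step()

print(losses[0], losses[-1])  # a smooth convex problem; loss decreases
```

Because the objective is convex in the single weight matrix, there are no bad local minima, which is part of why SGC training converges so quickly.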

When does SGConv fail compared to GCN?

SGConv fails when the task requires nonlinear interactions between features at different hops. On heterophilic graphs (where neighbors have different labels), the smoothing hurts because it mixes conflicting signals. On graphs needing complex neighborhood patterns, nonlinear layers provide necessary expressiveness.

How fast is SGConv compared to GCN?

SGConv training is typically 10-100x faster than multi-layer GCN because the graph propagation (A^K * X) can be pre-computed once before training. After pre-computation, the model is logistic regression on the smoothed features. The pre-computation itself takes O(K * |E|) time.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.