A graph autoencoder encodes graph structure into a latent space, then decodes to reconstruct the original edges. The encoder is a GNN that maps each node to a low-dimensional vector. The decoder predicts whether an edge exists between two nodes based on the similarity of their vectors. Nodes that are connected in the original graph should have similar embeddings. This is unsupervised: no task labels are needed.
Graph autoencoders serve two purposes: producing high-quality node embeddings for downstream tasks, and performing link prediction to discover missing connections. Both matter in enterprise settings, where the observed graph is almost never complete.
Architecture
A graph autoencoder has two components:
- Encoder: a GNN (typically 2-layer GCN) that maps each node's features and neighborhood to a latent vector z_i.
- Decoder: predicts edge probability as sigmoid(z_i · z_j). Nodes with similar embeddings are predicted to be connected.
import torch
from torch_geometric.nn import GCNConv, GAE

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Wrap the encoder in GAE
encoder = GCNEncoder(16, 32, 16)
model = GAE(encoder)

# Encode: produce node embeddings
# (`data` is a torch_geometric.data.Data object with x and edge_index)
z = model.encode(data.x, data.edge_index)
# z.shape: [num_nodes, 16]

# Decode: reconstruct edges
# model.decoder(z, edge_index) -> edge probabilities

# Loss: binary cross-entropy on edge reconstruction
loss = model.recon_loss(z, data.edge_index)

GAE wraps any GNN encoder; the inner-product decoder and reconstruction loss are built in.
Variational graph autoencoder (VGAE)
VGAE extends GAE by making the encoder probabilistic. Instead of producing a single embedding z_i, the encoder produces a mean vector mu_i and a log-standard-deviation vector, which together define a Gaussian distribution. During training the embedding is sampled from this distribution; at inference time the mean is typically used.
from torch_geometric.nn import VGAE

class VGCNEncoder(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv_mu = GCNConv(hidden_dim, out_dim)      # mean
        self.conv_logstd = GCNConv(hidden_dim, out_dim)  # log standard deviation

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv_mu(x, edge_index), self.conv_logstd(x, edge_index)

model = VGAE(VGCNEncoder(16, 32, 16))
z = model.encode(data.x, data.edge_index)

# VGAE loss = reconstruction + KL divergence
loss = model.recon_loss(z, data.edge_index) + (1 / data.num_nodes) * model.kl_loss()

VGAE produces distributions instead of point embeddings; the KL term regularizes the latent space toward a standard Gaussian prior.
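The sampling step relies on the reparameterization trick, which keeps it differentiable: noise is drawn from a standard Gaussian and then shifted and scaled by the learned mean and log-std, so gradients flow through the encoder's outputs. A plain-PyTorch sketch of the idea (the tensor shapes here are illustrative):

```python
import torch

torch.manual_seed(0)
mu = torch.zeros(5, 16)      # per-node mean vectors from the encoder
logstd = torch.zeros(5, 16)  # per-node log standard deviations

# Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
# The randomness lives in eps; gradients flow through mu and logstd.
eps = torch.randn_like(mu)
z = mu + eps * torch.exp(logstd)

# At inference time, the deterministic mean is typically used instead:
z_eval = mu
```

This mirrors what VGAE does internally when `model.encode` is called in training mode versus evaluation mode.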
Enterprise example: supplier relationship discovery
A manufacturing company has a known supplier graph: companies connected by existing supply relationships. But the graph is incomplete: many potential supplier relationships are unknown.
- Nodes: 50,000 companies with features (industry, size, location, capabilities)
- Known edges: 200,000 existing supply relationships
- Goal: discover missing supplier relationships for supply chain diversification
Train a VGAE on the existing supplier graph. The encoder learns company embeddings that capture both features and graph position. Decode all possible pairs: company pairs with high predicted edge probability but no existing edge are candidate new supplier relationships. Ranked by score, the top candidates are the most structurally compatible companies that do not yet have a direct relationship.
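With 50,000 companies there are roughly 1.25 billion possible pairs, so in practice candidate pairs are scored in batches rather than decoded all at once. A sketch of the scoring step using the inner-product decoder, where the company count, embeddings, and candidate pairs are made-up stand-ins for a trained model's outputs:

```python
import torch

# Stand-in for trained VGAE embeddings of 100 "companies".
torch.manual_seed(0)
z = torch.randn(100, 16)

# A batch of candidate pairs: companies with no existing supply edge,
# stored as a [2, num_candidates] index tensor.
candidates = torch.tensor([[0, 0, 1, 2],
                           [5, 7, 9, 8]])

# Inner-product decoder: sigmoid(z_i . z_j) for each candidate pair.
scores = torch.sigmoid((z[candidates[0]] * z[candidates[1]]).sum(dim=-1))

# Rank candidates by predicted probability; the top pairs are the most
# structurally compatible companies without a direct relationship.
order = scores.argsort(descending=True)
ranked_pairs = candidates[:, order]
```

In a real pipeline the candidate batches would come from a generator or a pre-filter (for example, restricting to compatible industries) rather than a hand-written tensor.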
Link prediction evaluation
To evaluate a graph autoencoder for link prediction:
- Split edges into train (85%), validation (5%), and test (10%)
- Train the model on the training edges only
- Score held-out positive edges and an equal number of negative (non-existent) edges
- Compute AUC and Average Precision
On the Cora citation network, VGAE achieves roughly 91% AUC for link prediction. Results on enterprise graphs vary with graph density and feature quality; denser graphs with informative node features often score higher.