Knowledge graph completion predicts missing facts in knowledge bases by learning patterns from existing entity-relation-entity triples. A knowledge graph stores structured facts as triples: (Albert Einstein, bornIn, Ulm), (Ulm, locatedIn, Germany), (Einstein, field, Physics). No knowledge graph is complete. Freebase has millions of entities but most have only a few facts. Enterprise product catalogs have thousands of missing attributes. Completion models predict these gaps.
The link prediction formulation
Given a knowledge graph with entities E and relations R, knowledge graph completion is a link prediction task: given (head, relation, ?), predict the missing tail entity. Or given (?, relation, tail), predict the missing head.
The model learns to score every possible triple. For the query (Einstein, nationality, ?), it should rank “German” higher than “French” or “Brazilian”. Training uses existing triples as positive examples and corrupted triples (replacing head or tail with random entities) as negatives.
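This corruption-based negative sampling is a few lines of tensor code. A minimal sketch (entity count and triple IDs are illustrative, not from a real dataset):

```python
import torch

def corrupt_triples(head, rel, tail, num_entities):
    """Create negative triples by replacing the head OR the tail
    of each positive triple with a random entity."""
    neg_head, neg_tail = head.clone(), tail.clone()
    # For each triple, flip a coin: corrupt the head or the tail
    corrupt_head = torch.rand(head.size(0)) < 0.5
    random_entities = torch.randint(num_entities, (head.size(0),))
    neg_head[corrupt_head] = random_entities[corrupt_head]
    neg_tail[~corrupt_head] = random_entities[~corrupt_head]
    return neg_head, rel, neg_tail

head = torch.tensor([0, 1, 2])
rel = torch.tensor([5, 12, 5])
tail = torch.tensor([100, 200, 300])
neg_h, neg_r, neg_t = corrupt_triples(head, rel, tail, num_entities=14541)
```

The relation is kept fixed; only an entity slot is randomized, so each negative stays structurally comparable to its positive.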
Embedding-based methods
The first generation of knowledge graph completion methods learns a static embedding vector for every entity and relation:
TransE: relations as translations
TransE models each relation as a translation vector. For a valid triple (h, r, t), the model enforces h + r ≈ t in embedding space. If Einstein's embedding plus the “bornIn” vector lands near Ulm's embedding, the triple is scored highly.
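The scoring function itself is one line: the negative distance between h + r and t. A sketch with hand-built toy vectors (the embeddings below are constructed for illustration, not trained):

```python
import torch

def transe_score(h, r, t, p=2):
    """Negative p-norm distance between h + r and t; higher = more plausible."""
    return -(h + r - t).norm(p=p, dim=-1)

# Toy embeddings (illustrative, not learned)
ulm = torch.tensor([1.0, 0.0, 2.0, 0.0])
born_in = torch.tensor([0.5, 0.5, -1.0, 0.0])
einstein = ulm - born_in          # constructed so einstein + born_in == ulm
paris = torch.tensor([5.0, -3.0, 5.0, 1.0])

print(transe_score(einstein, born_in, ulm))    # best possible score: 0
print(transe_score(einstein, born_in, paris))  # far more negative score
```

Training pushes valid triples toward the zero-distance ideal and corrupted triples away from it.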
RotatE: relations as rotations
RotatE uses complex-valued embeddings and models relations as rotations. This handles patterns that TransE cannot: symmetric relations (married-to), inverse relations (bornIn/birthplaceOf), and composed relations (bornIn + locatedIn = nationality).
```python
import torch
from torch_geometric.nn import TransE, RotatE

# TransE: h + r ≈ t
model = TransE(
    num_nodes=14541,     # entities in FB15k-237
    num_relations=237,   # relation types
    hidden_channels=256,
)

# Score a batch of triples
head_index = torch.tensor([0, 1, 2])
rel_type = torch.tensor([5, 12, 5])
tail_index = torch.tensor([100, 200, 300])
score = model(head_index, rel_type, tail_index)
# PyG returns the negative translation distance: higher score = more plausible
```

PyG provides implementations of TransE, RotatE, and other knowledge graph embedding methods with training utilities.
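RotatE's scoring function is just as compact when written out by hand. In the sketch below (a from-scratch illustration, not the PyG implementation), each relation is a vector of phase angles; the head embedding is rotated element-wise in the complex plane and compared to the tail:

```python
import torch

def rotate_score(h, phase, t):
    """RotatE-style score: rotate h element-wise in the complex plane by the
    relation's phase vector, then take the negative distance to t."""
    rotation = torch.polar(torch.ones_like(phase), phase)  # unit-modulus complex
    return -(h * rotation - t).abs().sum(dim=-1)

h = torch.tensor([1.0 + 0.0j, 0.0 + 1.0j])
phase = torch.tensor([torch.pi / 2, torch.pi / 2])   # rotate each dim by 90°
t = h * torch.polar(torch.ones(2), phase)            # a perfect match by construction

print(rotate_score(h, phase, t))    # ~0: the rotation lands exactly on t
print(rotate_score(h, phase, -t))   # much lower: rotated h is far from -t
```

Because every rotation has unit modulus, composing or inverting relations never changes embedding norms, which is what makes the relational patterns below expressible.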
GNN-based methods
Embedding methods assign a fixed vector to each entity regardless of context. GNN-based methods improve on this by computing entity embeddings through message passing over the knowledge graph structure:
- R-GCN (Relational Graph Convolutional Network): uses relation-specific weight matrices in message passing. Each relation type has its own transformation, allowing the model to learn different propagation patterns for different relation types.
- CompGCN: jointly embeds entities and relations during message passing. The relation embedding modifies how messages are computed, enabling composition of relations across multiple hops.
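The core R-GCN idea, one weight matrix per relation type, can be sketched in plain PyTorch (a simplified layer for illustration; the full model adds basis decomposition and normalization):

```python
import torch

class SimpleRGCNLayer(torch.nn.Module):
    """One R-GCN-style layer: a separate weight matrix per relation,
    plus a self-loop transform. A sketch of the idea, not the full paper."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weight = torch.nn.Parameter(
            torch.randn(num_relations, in_dim, out_dim) * 0.1)
        self.self_weight = torch.nn.Parameter(torch.randn(in_dim, out_dim) * 0.1)

    def forward(self, x, edge_index, edge_type):
        src, dst = edge_index
        out = x @ self.self_weight                     # self-loop message
        # Each edge's message uses the weight matrix of its relation type
        msgs = torch.einsum('ei,eio->eo', x[src], self.rel_weight[edge_type])
        out = out.index_add(0, dst, msgs)              # sum messages per target
        return torch.relu(out)

x = torch.randn(10, 8)                                # 10 entities, 8 features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
edge_type = torch.tensor([0, 1, 0, 2])                # relation id per edge
layer = SimpleRGCNLayer(8, 16, num_relations=3)
out = layer(x, edge_index, edge_type)
```

In practice one would use a library implementation such as PyG's `RGCNConv`, which follows the same signature of node features, edge index, and per-edge relation types.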
Relational patterns and scoring functions
Different scoring functions capture different relational patterns:
- Symmetry (married-to): if (A, r, B) then (B, r, A). RotatE handles this with 180-degree rotation; TransE cannot.
- Antisymmetry (parent-of): if (A, r, B) then NOT (B, r, A). TransE handles this naturally.
- Inversion (bornIn/birthplaceOf): r1 is the inverse of r2. RotatE models this as opposite rotations.
- Composition (bornIn + locatedIn = nationality): r3 behaves as r1 followed by r2. RotatE handles this by composing rotations; TransE handles it by adding translation vectors. ComplEx, despite its bilinear scoring, cannot model composition.
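The symmetry and inversion claims above can be checked numerically in the rotation view of relations (phase values below are arbitrary, chosen only to illustrate the algebra):

```python
import torch

def rotate(h, phase):
    """Apply an element-wise RotatE-style rotation in the complex plane."""
    return h * torch.polar(torch.ones_like(phase), phase)

h = torch.tensor([1.0 + 2.0j, -0.5 + 0.3j])

# Symmetry: a 180° rotation is its own inverse, so (A, r, B) implies (B, r, A)
pi = torch.full((2,), torch.pi)
back = rotate(rotate(h, pi), pi)        # two 180° rotations return to h

# Inversion: model birthplaceOf as the opposite rotation of bornIn
born_in = torch.tensor([0.7, -1.2])     # arbitrary phases for illustration
recovered = rotate(rotate(h, born_in), -born_in)
```

TransE fails the symmetry check because h + r ≈ t and t + r ≈ h can only both hold when r is the zero vector, collapsing all symmetric pairs together.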
Enterprise applications
Knowledge graph completion has immediate business value:
- Product knowledge graphs: an e-commerce catalog with 10 million products where 60% have missing attributes. Completion predicts (ProductX, hasBrand, ?), (ProductX, compatibleWith, ?), and (ProductX, inCategory, ?).
- Customer knowledge graphs: inferring customer preferences from partial interaction data. If a customer bought running shoes and a fitness tracker, predict (Customer, interestedIn, Marathon Training).
- Drug discovery: predicting drug-gene-disease interactions. If DrugA targets GeneB and GeneB is implicated in DiseaseC, predict (DrugA, treats, DiseaseC).
- Internal knowledge management: connecting employees to skills, projects, and documents. Predict (Employee, expertIn, ?) to route questions to the right expert.