14,541 entities · 310,116 edges · 237 relations · Task: link prediction
What FB15k-237 contains
FB15k-237 is a subset of Freebase, the large collaborative knowledge graph that Google acquired (via Metaweb) in 2010. The dataset contains 14,541 entities (people, places, organizations, concepts) connected by 310,116 triples of the form (head, relation, tail). The 237 relation types include “born_in,” “directed_by,” “nationality,” and “genre,” along with more than two hundred others.
The “237” in the name refers to the number of relation types after removing inverse relations from the original FB15k. This removal was crucial: in FB15k, models could achieve high accuracy by simply memorizing that (A, born_in, B) implies (B, birthplace_of, A). FB15k-237 eliminates this shortcut, requiring models to learn genuine relational patterns.
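The leakage can be detected mechanically: two relations are near-inverses when most edges of one appear reversed under the other. Here is a minimal sketch of that check on hypothetical toy triples (the relation names are illustrative, not actual Freebase identifiers):

```python
from collections import defaultdict

def find_inverse_pairs(triples, threshold=0.8):
    """Flag relation pairs (r1, r2) where most (h, t) edges of r1
    also appear reversed as (t, h) edges of r2 -- the leakage
    that FB15k-237 removes from FB15k."""
    pairs_by_rel = defaultdict(set)
    for h, r, t in triples:
        pairs_by_rel[r].add((h, t))
    inverse = []
    rels = list(pairs_by_rel)
    for r1 in rels:
        for r2 in rels:
            if r1 == r2:
                continue
            overlap = sum(1 for h, t in pairs_by_rel[r1]
                          if (t, h) in pairs_by_rel[r2])
            if overlap / len(pairs_by_rel[r1]) >= threshold:
                inverse.append((r1, r2))
    return inverse

# Toy triples: "born_in" and "birthplace_of" mirror each other.
triples = [
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "rome"),
    ("paris", "birthplace_of", "alice"),
    ("rome", "birthplace_of", "bob"),
    ("inception", "directed_by", "nolan"),
]
print(find_inverse_pairs(triples))
# [('born_in', 'birthplace_of'), ('birthplace_of', 'born_in')]
```

A model trained on triples containing both relations of such a pair can answer test queries by lookup rather than by learning relational structure, which is why FB15k test scores were inflated.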
Why FB15k-237 matters
Link prediction on knowledge graphs is one of the most commercially important tasks in graph ML. Google uses it to answer questions (“Who directed Inception?” requires predicting (Inception, directed_by, ?)). Amazon uses it to infer product attributes. Enterprise knowledge graphs use it to complete missing data (if a company has a CEO and headquarters, predict its industry).
FB15k-237 provides a controlled benchmark for this task. The 237 relation types create a rich multi-relational graph where the model must learn type-specific patterns: the relation “born_in” connects people to locations, “directed_by” connects films to people. Models that capture these type constraints perform best.
Loading FB15k-237 in PyG
```python
from torch_geometric.datasets import FB15k_237

# Each split (train/val/test) loads as its own Data object
train = FB15k_237(root='/tmp/FB15k237', split='train')[0]
test = FB15k_237(root='/tmp/FB15k237', split='test')[0]

print(f"Entities: {train.num_nodes}")                  # 14541
print(f"Train triples: {train.num_edges}")             # 272115
print(f"Test triples: {test.num_edges}")               # 20466
print(f"Relations: {int(train.edge_type.max()) + 1}")  # 237

# Each triple (h, r, t): head/tail indices in edge_index,
# relation id in edge_type
h, t = train.edge_index
r = train.edge_type
```

Each triple is stored as head, relation, and tail indices; evaluation ranks the correct tail among all 14,541 entities. Use a knowledge graph embedding framework for training.
Common tasks and benchmarks
Link prediction with filtered MRR and Hits@K. For each test triple (h, r, ?), score all possible tails and report the rank of the correct one. TransE: ~0.294 MRR. ComplEx: ~0.247. RotatE: ~0.338. R-GCN: ~0.249. CompGCN: ~0.355. The multi-relational structure benefits methods (RotatE, CompGCN) that model relation-specific transformations in embedding space.
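The “filtered” setting masks out other known-true tails before ranking, so a model is not penalized for scoring a different correct answer highly. A minimal sketch of filtered rank, MRR, and Hits@K on toy scores (entity indices and score values are made up for illustration):

```python
import numpy as np

def filtered_rank(scores, true_tail, known_tails):
    """Rank of true_tail after masking other known-true tails
    (the 'filtered' evaluation protocol)."""
    scores = scores.copy()
    for t in known_tails:
        if t != true_tail:
            scores[t] = -np.inf  # remove competing true answers
    # rank 1 = best; count candidates scored strictly higher than the target
    return int((scores > scores[true_tail]).sum()) + 1

# Toy example: 5 candidate entities for one test query (h, r, ?)
scores = np.array([0.1, 0.9, 0.8, 0.3, 0.7])  # model score per candidate tail
true_tail = 2                                  # correct answer for this query
known_tails = {1, 2}                           # entity 1 is also true (seen in train)
rank = filtered_rank(scores, true_tail, known_tails)
mrr = 1.0 / rank
hits_at_1 = float(rank <= 1)
print(rank, mrr, hits_at_1)  # 1 1.0 1.0 -- entity 1 was filtered out
```

Dataset-level MRR and Hits@K are these quantities averaged over all test queries (typically over both tail and head prediction).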
Example: enterprise data completion
A company's CRM contains partial data: some contacts have company names but no industry, some have titles but no department. The CRM forms a knowledge graph. Link prediction fills gaps: given (Contact, works_at, Acme Corp) and (Acme Corp, industry, ?), predict the industry. This automated data completion improves lead scoring, segmentation, and reporting quality without manual data entry.
Published benchmark results
Link prediction on FB15k-237. Filtered MRR and Hits@10. Higher is better for both metrics.
| Method | MRR | Hits@10 | Year | Paper |
|---|---|---|---|---|
| TransE | 0.294 | 0.465 | 2013 | Bordes et al. |
| DistMult | 0.241 | 0.419 | 2015 | Yang et al. |
| ComplEx | 0.247 | 0.428 | 2016 | Trouillon et al. |
| RotatE | 0.338 | 0.533 | 2019 | Sun et al. |
| R-GCN | 0.249 | 0.417 | 2018 | Schlichtkrull et al. |
| CompGCN | 0.355 | 0.535 | 2020 | Vashishth et al. |
Original Paper
Observed versus Latent Features for Knowledge Base and Text Inference
Kristina Toutanova, Danqi Chen (2015). 3rd Workshop on Continuous Vector Space Models and their Compositionality
Original data source
FB15k-237 was created by Toutanova and Chen (2015) by removing inverse relations from FB15k. The dataset is available from Microsoft Research. The original Freebase data is from Google's Freebase project.
@inproceedings{toutanova2015observed,
title={Observed versus Latent Features for Knowledge Base and Text Inference},
author={Toutanova, Kristina and Chen, Danqi},
booktitle={3rd Workshop on Continuous Vector Space Models and their Compositionality},
pages={57--66},
year={2015}
}
BibTeX citation for the FB15k-237 dataset.
Which dataset should I use?
FB15k-237 vs FB15k: Always use FB15k-237. FB15k has inverse relation leakage that inflates scores artificially. FB15k-237 fixes this and is the accepted standard.
FB15k-237 vs WN18RR: Both are standard KG benchmarks. FB15k-237 (Freebase) has more relation types (237 vs 11) and is more diverse. WN18RR (WordNet) tests hierarchical reasoning. Most papers report on both.
FB15k-237 vs NELL: FB15k-237 is for link prediction. NELL is for entity typing (node classification). Different tasks on knowledge graphs.
From benchmark to production
Production knowledge graphs have millions of entities, thousands of relation types, and temporal dynamics (facts change: CEOs leave, companies merge). They also require reasoning chains: combining multiple triples to infer new facts. FB15k-237 tests single-hop prediction; production systems need multi-hop reasoning.
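The gap between single-hop and multi-hop can be made concrete: a two-hop query chains two stored triples to reach an answer neither triple states directly. A minimal sketch with hypothetical CRM-style triples:

```python
from collections import defaultdict

def two_hop(triples, head, r1, r2):
    """Answer (head, r1/r2, ?) by chaining two stored triples --
    the kind of composition single-hop benchmarks like FB15k-237
    do not test directly."""
    out = defaultdict(set)
    for h, r, t in triples:
        out[(h, r)].add(t)
    # follow r1 from head, then r2 from each intermediate entity
    return {t2 for mid in out[(head, r1)] for t2 in out[(mid, r2)]}

triples = [
    ("dana", "works_at", "acme"),
    ("acme", "industry", "logistics"),
]
print(two_hop(triples, "dana", "works_at", "industry"))  # {'logistics'}
```

Production systems must perform this kind of composition over incomplete graphs, where some intermediate triples are themselves predicted rather than stored.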