
OGB-Papers100M: 111 Million Nodes. The Ultimate GNN Scale Test.

OGB-Papers100M is a citation graph with 111 million papers and 1.6 billion edges. It is the largest standard GNN benchmark by a wide margin -- the dataset that separates methods that claim to scale from methods that actually do.

PyTorch Geometric

TL;DR

  • OGB-Papers100M has 111,059,956 nodes, 1,615,685,872 edges, 128 features, and 172 subject area classes. It is the largest standard GNN benchmark.
  • The graph requires ~400GB RAM to store. Training demands distributed multi-GPU infrastructure with graph partitioning. Single-GPU training is impractical.
  • Engineering dominates at this scale: graph partitioning, distributed training, memory-efficient feature storage, and fault-tolerant pipelines matter more than architecture.
  • Few methods have full results. SIGN achieves ~65.7%, GraphSAGE variants ~67%, and top methods with GIANT embeddings ~69.7%. The benchmark tests infrastructure as much as algorithms.
  • KumoRFM operates at Papers100M scale as standard. Its distributed graph transformer was designed for billion-node enterprise graphs from the start.

At a glance: 111M nodes · 1.6B edges · 128 features · 172 classes

What OGB-Papers100M contains

OGB-Papers100M (ogbn-papers100M) is a citation graph from the Microsoft Academic Graph. The 111,059,956 nodes represent academic papers spanning all scientific disciplines. The 1,615,685,872 edges represent citation links. Each paper has a 128-dimensional word2vec feature vector from its title and abstract. The 172 classes are academic subject areas (spanning CS, physics, biology, medicine, and more).

The numbers are staggering. The graph is 41,000x larger than Cora, 477x larger than Reddit, and 45x larger than OGB-Products. Just storing the edge list requires ~12GB. Storing node features adds ~54GB. The full working memory during training exceeds 400GB. This is not a dataset for laptops.
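The memory figures above follow from simple arithmetic. A minimal sketch, assuming int32 node IDs (111M fits in 32 bits) and float32 features:

```python
# Back-of-the-envelope memory math for Papers100M.
# Assumes int32 node ids and float32 features; GiB = 2**30 bytes.
num_nodes = 111_059_956
num_edges = 1_615_685_872

edge_list_gib = num_edges * 2 * 4 / 2**30   # (src, dst) pairs as int32
feat_gib = num_nodes * 128 * 4 / 2**30      # 128-dim float32 per node

print(f"edge list: {edge_list_gib:.1f} GiB")  # ~12.0 GiB
print(f"features:  {feat_gib:.1f} GiB")       # ~53.0 GiB
```

The remaining working memory during training comes from sampled subgraphs, gradients, optimizer state, and intermediate activations, which is how the total climbs past 400GB.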

Why OGB-Papers100M matters

Most GNN papers claim scalability but test on datasets that fit on a single GPU. OGB-Products (2.4M nodes) is “large” by benchmark standards but tiny by production standards. Papers100M closes this gap. At 111M nodes and 1.6B edges, it approaches the scale of real production graphs at companies like Google, Meta, and LinkedIn.

The engineering challenges are real: How do you partition a billion-edge graph across GPUs? How do you sample neighbors without cross-machine communication bottlenecks? How do you handle checkpointing and fault recovery for multi-day training runs? These questions are invisible on smaller benchmarks but dominate at Papers100M scale.
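To make partitioning concrete, here is a toy sketch of the simplest scheme, hash partitioning by source node. The function name and graph are hypothetical; production systems such as DistDGL and PyG Distributed use METIS-style partitioners that minimize edge cuts instead of a plain modulo hash:

```python
# Sketch: hash-partitioning an edge list across workers.
# Toy example; real partitioners minimize cross-partition edges.

def partition_edges(edges, num_parts):
    """Assign each edge to the partition that owns its source node."""
    parts = [[] for _ in range(num_parts)]
    for src, dst in edges:
        parts[src % num_parts].append((src, dst))  # owner = src id mod parts
    return parts

edges = [(0, 1), (1, 2), (2, 0), (3, 1), (4, 3), (5, 4)]
parts = partition_edges(edges, num_parts=2)
print(parts[0])  # [(0, 1), (2, 0), (4, 3)] -- even source ids
print(parts[1])  # [(1, 2), (3, 1), (5, 4)] -- odd source ids
```

The weakness of hashing is exactly the communication bottleneck mentioned above: a sampled neighborhood scatters across machines, which is why edge-cut-aware partitioners pay off at this scale.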

Loading OGB-Papers100M

load_ogb_papers100m.py
from ogb.nodeproppred import PygNodePropPredDataset

# WARNING: ~57GB download, ~400GB RAM to load
dataset = PygNodePropPredDataset(name='ogbn-papers100M',
                                  root='/data/OGB')
data = dataset[0]
split_idx = dataset.get_idx_split()

print(f"Nodes: {data.num_nodes}")   # 111059956
print(f"Edges: {data.num_edges}")   # 1615685872

# Distributed training with graph partitioning required
# See PyG Distributed or DistDGL documentation

Requires significant disk space and RAM. Use memory-mapped files or distributed loading.
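Since node features dominate the footprint, memory-mapping them lets training touch only the rows a batch actually needs. A minimal sketch with NumPy, using toy sizes and a hypothetical file path (Papers100M would be shape (111_059_956, 128)):

```python
import numpy as np

# Sketch: memory-mapped node features so only accessed rows hit RAM.
# Toy sizes and file path; adapt shape and dtype to the real dataset.
num_nodes, dim = 1000, 128

# One-time conversion: write features to a flat binary file on disk
feats = np.random.rand(num_nodes, dim).astype(np.float32)
feats.tofile('/tmp/feats.bin')

# Training time: open as a memmap; nothing is loaded up front
mm = np.memmap('/tmp/feats.bin', dtype=np.float32,
               mode='r', shape=(num_nodes, dim))
batch_ids = [3, 42, 999]
batch = mm[batch_ids]      # reads only these rows from disk
print(batch.shape)         # (3, 128)
```

The same pattern underlies out-of-core loaders: keep the edge structure in RAM, keep the feature matrix on disk, and fetch rows per minibatch.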

Original Paper

Open Graph Benchmark: Datasets for Machine Learning on Graphs

Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec (2020). NeurIPS 2020

Read paper →

Benchmark comparison (OGB leaderboard, test accuracy)

| Method                 | Accuracy | Year | Paper         |
|------------------------|----------|------|---------------|
| MLP (no graph)         | 47.24%   | 2020 | OGB baseline  |
| SIGN                   | 65.68%   | 2020 | Frasca et al. |
| GraphSAGE (res_incep)  | 67.06%   | 2020 | OGB baseline  |
| GAMLP                  | 67.71%   | 2022 | Zhang et al.  |
| GAMLP+RLU              | 68.25%   | 2022 | Zhang et al.  |
| GIANT-XRT + GAMLP+RLU  | 69.67%   | 2022 | Chien et al.  |

Which OGB dataset should I use?

OGB-Products (2.4M nodes, 61.9M edges) is manageable on a single GPU with sampling -- use it to prove your method scales beyond Reddit. OGB-Papers100M (111M nodes, 1.6B edges) requires distributed multi-GPU infrastructure -- use it only if you are building or validating truly distributed GNN systems. Most researchers should benchmark on OGB-Products. OGB-Papers100M is for infrastructure teams and systems researchers.
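The sampling that makes OGB-Products tractable on one GPU is fixed-fanout neighbor sampling over a CSR adjacency, the core of GraphSAGE-style minibatching. A self-contained sketch on a toy graph (the function and graph are illustrative, not a library API):

```python
import random

# Sketch: fixed-fanout neighbor sampling on a CSR adjacency.
# Toy 4-node graph: node 0 -> {1, 2}, node 1 -> {0, 2, 3},
# node 2 -> {1}, node 3 -> {} (directed, for illustration).
indptr  = [0, 2, 5, 6, 6]
indices = [1, 2, 0, 2, 3, 1]

def sample_neighbors(node, fanout, rng):
    """Return at most `fanout` neighbors of `node`, chosen uniformly."""
    nbrs = indices[indptr[node]:indptr[node + 1]]
    if len(nbrs) <= fanout:
        return list(nbrs)
    return rng.sample(nbrs, fanout)

rng = random.Random(0)
print(sample_neighbors(1, fanout=2, rng=rng))  # 2 of node 1's 3 neighbors
print(sample_neighbors(3, fanout=2, rng=rng))  # [] -- node 3 has none
```

At Papers100M scale the same operation must run against a partitioned graph, which is where cross-machine neighbor fetches become the bottleneck.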

Common tasks and benchmarks

Node classification with OGB's time-based split. Due to the extreme scale, few methods have complete results. SIGN (Scalable Inception Graph Networks): ~65.7%. GraphSAGE variants: ~67.1%. Simple MLP (no graph): ~47.2%. The 20-point gap between MLP and graph methods confirms that citation structure is highly informative even at this scale. But the engineering cost of leveraging that structure is substantial.
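SIGN's strong showing here comes from sidestepping message passing at training time: diffused features A^k X are precomputed once offline, so training reduces to an MLP over their concatenation. A toy NumPy sketch of that preprocessing (Papers100M would run this pass out-of-core or distributed):

```python
import numpy as np

# Sketch of SIGN-style preprocessing: precompute A^k X offline so
# training becomes a plain MLP over the concatenated hop features.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=np.float64)
deg = A.sum(axis=1)
A_norm = A / deg[:, None]           # row-normalized adjacency

X = np.eye(3)                        # toy node features
hops = [X]
for _ in range(2):                   # K = 2 diffusion steps
    hops.append(A_norm @ hops[-1])   # one extra hop per step

features = np.concatenate(hops, axis=1)  # MLP input: [X | AX | A^2 X]
print(features.shape)                # (3, 9)
```

Because the expensive graph operation happens once rather than per epoch, the training loop itself needs no neighbor sampling or cross-machine communication, which is exactly why SIGN was among the first methods with full Papers100M results.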

Data source

OGB-Papers100M is part of the Open Graph Benchmark. Download via the ogb Python package or from the OGB website. Warning: the download is ~57GB and loading requires ~400GB RAM.

BibTeX citation

ogb_papers100m.bib
@inproceedings{hu2020open,
  title={Open Graph Benchmark: Datasets for Machine Learning on Graphs},
  author={Hu, Weihua and Fey, Matthias and Zitnik, Marinka and Dong, Yuxiao and Ren, Hongyu and Liu, Bowen and Catasta, Michele and Leskovec, Jure},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

Cite Hu et al. for all OGB datasets. Include the specific dataset name (ogbn-papers100M) in your paper.

Example: scientific literature search

Semantic Scholar indexes 200M+ academic papers. Classifying papers by subject area, identifying citation trends, and recommending relevant papers all operate at Papers100M scale. A GNN that can process the full citation graph provides richer paper representations than text-only models, capturing how a paper fits into the broader scientific landscape through its citation relationships.

From benchmark to production

Papers100M is close to production scale for some applications but still simpler than the most demanding real-world graphs. Production social networks have billions of nodes with dynamic edges (new connections every second). Enterprise knowledge graphs combine hundreds of entity types and relationship types. And real-time serving adds latency constraints that batch training does not face.

Frequently asked questions

What is OGB-Papers100M?

OGB-Papers100M (ogbn-papers100M) is a citation graph of 111,059,956 papers with 1,615,685,872 edges. Each paper has a 128-dimensional word2vec feature vector and one of 172 subject area labels. It is the largest standard GNN benchmark and tests true production-scale graph processing.

Can I train on OGB-Papers100M with a single GPU?

Not practically. The graph requires ~400GB of memory to store. Training requires distributed multi-GPU setups with graph partitioning (DistDGL, PyG's distributed module), or aggressive subsampling with NeighborLoader at very small batch sizes. Most research papers use 4-8 GPUs minimum.

How do I load OGB-Papers100M?

Use `from ogb.nodeproppred import PygNodePropPredDataset; dataset = PygNodePropPredDataset(name='ogbn-papers100M')`. The download is ~57GB. Loading requires significant RAM (400GB+ recommended). Use memory-mapped files or distributed loading for constrained environments.

What results are expected on OGB-Papers100M?

Due to its extreme size, few methods have been fully evaluated. SIGN achieves ~65.7% accuracy. GraphSAGE variants achieve ~67%. Top methods with GIANT embeddings reach ~69.7%. The challenge is primarily engineering (fitting the graph in memory, efficient distributed training) rather than architecture.

Why does OGB-Papers100M exist?

Most GNN benchmarks are too small to expose real scalability challenges. Even OGB-Products (2.4M nodes) fits on a single GPU with sampling. Papers100M requires truly distributed infrastructure, closing the gap between benchmark scale and production graph sizes at companies like Google and Meta.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.