
OGB-Products: The 2.4-Million-Node Benchmark That Tests Whether Your GNN Actually Scales

OGB-Products is an Amazon co-purchasing graph with 2.4 million products and 61.8 million edges. It is the standard benchmark for production-scale GNN evaluation: too large for shortcuts, standardized enough for fair comparison, and realistic enough that results predict production performance.

PyTorch Geometric

TL;DR

  • OGB-Products has 2,449,029 nodes, 61,859,140 edges, 100 features, and 47 product categories. It is the most widely used large-scale GNN benchmark.
  • Full-batch training is impossible. You must use neighbor sampling (NeighborLoader), cluster-based batching (ClusterGCN), or distributed training.
  • OGB provides a standardized sales-rank split and evaluation protocol. Results are directly comparable across methods via the OGB leaderboard.
  • GraphSAGE achieves ~83.9% accuracy. Top leaderboard methods exceed 90% (with external data). The gap between methods is meaningful at this scale.
  • KumoRFM operates at OGB-Products scale and beyond, handling heterogeneous product-user-interaction graphs with billions of nodes in production.

2.45M nodes · 61.9M edges · 100 features · 47 classes

What OGB-Products contains

OGB-Products (ogbn-products) is an Amazon product co-purchasing network from the Open Graph Benchmark. Each of the 2,449,029 nodes is a product. The 61,859,140 edges connect products frequently bought together. Node features are 100-dimensional word2vec embeddings of product descriptions. The 47 classes represent Amazon product categories (Electronics, Books, Home & Kitchen, etc.).

The scale is the point. OGB-Products is 900x larger than Cora and 10x larger than Reddit by node count. At this scale, every architectural choice has measurable consequences: the wrong sampling strategy wastes GPU hours, the wrong batch size causes out-of-memory crashes, and the wrong number of GNN layers degrades convergence.
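
These memory consequences can be made concrete with simple arithmetic. The sketch below (an illustrative helper, not part of the ogb or PyG APIs) computes the worst-case size of a sampled computation graph for a given fan-out schedule:

```python
def max_sampled_nodes(fanouts, batch_size):
    """Worst-case node count of one sampled mini-batch.

    Each seed node expands layer by layer: with fan-outs [15, 10, 5],
    a single seed touches at most 1 + 15 + 15*10 + 15*10*5 = 916 nodes.
    """
    per_seed = 1
    layer = 1
    for f in fanouts:
        layer *= f          # nodes added at this hop, worst case
        per_seed += layer
    return per_seed * batch_size

# A common OGB-Products configuration: 3 layers, 1024 seed nodes.
budget = max_sampled_nodes([15, 10, 5], 1024)
print(budget)  # 937984 nodes worst case, versus 2,449,029 full-batch
```

In practice the sampled subgraphs are far smaller than this bound because neighborhoods overlap, but the bound explains why deeper fan-outs or larger batches blow up GPU memory so quickly.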

Why OGB-Products matters

Before OGB, GNN papers claimed scalability based on Cora (2.7K nodes) or Reddit (232K nodes). Neither is large enough to expose real scalability issues. OGB-Products changed this by providing a benchmark where full-batch training is physically impossible on any single GPU, forcing researchers to prove their methods actually scale.

The standardized evaluation protocol is equally important. OGB provides a fixed train/validation/test split based on sales rank (the best-selling products train; less popular products validate and test), a required evaluation metric, and a public leaderboard. This eliminates the cherry-picking that plagued earlier benchmarks, where each paper could choose its own favorable split.
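
The official split is heavily skewed toward testing. Assuming the split sizes commonly reported for ogbn-products via get_idx_split(), the proportions work out as follows:

```python
# Split sizes reported for ogbn-products (assumed here; verify with
# dataset.get_idx_split() after downloading).
splits = {"train": 196_615, "valid": 39_323, "test": 2_213_091}

total = sum(splits.values())
print(total)  # 2449029 -- every node belongs to exactly one split

for name, n in splits.items():
    print(f"{name}: {n / total:.1%}")  # ~8% train, ~2% valid, ~90% test
```

Only ~8% of nodes carry training labels, which is part of what makes the benchmark hard: most of the graph must be classified from a small labeled core.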

Loading OGB-Products in PyG

load_ogb_products.py
from ogb.nodeproppred import PygNodePropPredDataset
from torch_geometric.loader import NeighborLoader

dataset = PygNodePropPredDataset(name='ogbn-products')
data = dataset[0]
split_idx = dataset.get_idx_split()

print(f"Nodes: {data.num_nodes}")   # 2449029
print(f"Edges: {data.num_edges}")   # 61859140

# Mini-batch training is required at this scale
train_loader = NeighborLoader(
    data, num_neighbors=[15, 10, 5],
    batch_size=1024, input_nodes=split_idx['train'],
)

Install the ogb package with pip install ogb. The official split is provided via get_idx_split().
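
GraphSAGE, the strongest OGB baseline above, is built on mean aggregation over sampled neighborhoods. A dependency-free toy sketch of one aggregation step (the graph and features below are made up for illustration; in PyG this is what SAGEConv implements with a learned linear layer on top):

```python
def sage_mean_aggregate(features, neighbors):
    """One GraphSAGE-style mean-aggregation step.

    features:  {node: [float, ...]} input embeddings
    neighbors: {node: [node, ...]} adjacency lists
    Returns each node's own features concatenated with the mean of its
    neighbors' features: [self || mean(neighbors)].
    """
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            dim = len(h_v)
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs)
                    for i in range(dim)]
        else:
            mean = [0.0] * len(h_v)  # isolated node: nothing to average
        out[v] = h_v + mean          # list concatenation
    return out

# Toy co-purchase triangle: products 0, 1, 2 all bought together.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(sage_mean_aggregate(feats, adj)[0])  # [1.0, 0.0, 0.5, 1.0]
```

Stacking this step three times with fan-outs [15, 10, 5], as in the NeighborLoader above, gives each product a three-hop sampled receptive field.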

Original Paper

Open Graph Benchmark: Datasets for Machine Learning on Graphs

Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec (2020). NeurIPS 2020

Read paper →

Benchmark comparison (OGB leaderboard, test accuracy)

Method                   Accuracy   Year   Paper
MLP (no graph)           61.06%     2020   OGB baseline
ClusterGCN (SAGE aggr.)  78.97%     2019   Chiang et al.
GCN                      82.33%     2020   OGB baseline
RevGNN-112               83.07%     2021   Li et al.
GAMLP                    83.54%     2022   Zhang et al.
GraphSAGE                83.89%     2020   OGB baseline

Which large-scale product dataset should I use?

Amazon Photo (7,650 nodes) and Amazon Computers (13,752 nodes) are small-scale co-purchase graphs with no standard split -- use them for rapid prototyping. OGB-Products (2.4M nodes, 47 classes) has a standardized sales-rank split and a public leaderboard -- use it for rigorous, reproducible benchmarking. OGB-Papers100M (111M nodes) is for testing truly distributed infrastructure. For serious research papers, OGB-Products is the minimum credible large-scale benchmark.

Common tasks and benchmarks

The task is node classification with OGB's official split, which is based on product sales rank: the best-selling products are training data, and the long tail forms the test set. This setup tests whether models generalize from popular, densely connected products to rarer ones. Benchmark results from the OGB leaderboard: ClusterGCN ~79.0%, GCN ~82.3%, RevGNN ~83.1%, GAMLP ~83.5%, GraphSAGE ~83.9%. The gap between ClusterGCN and the top methods shows that training strategy and architecture matter at this scale.
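
OGB requires reporting results through its Evaluator, and for ogbn-products the required metric is plain multiclass accuracy. The sketch below reproduces the metric without the ogb dependency; the Evaluator call in the docstring is the library's actual interface (it expects 2-D label arrays):

```python
def ogb_products_accuracy(y_true, y_pred):
    """Multiclass accuracy, the required ogbn-products metric.

    Equivalent in spirit to:
        from ogb.nodeproppred import Evaluator
        Evaluator(name='ogbn-products').eval(
            {'y_true': y_true, 'y_pred': y_pred})['acc']
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Three of four toy predictions match the true category labels.
print(ogb_products_accuracy([3, 1, 4, 1], [3, 1, 4, 4]))  # 0.75
```

Using the shared Evaluator rather than ad-hoc metric code is what keeps leaderboard numbers directly comparable.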

Data source

OGB-Products is part of the Open Graph Benchmark. Download via the ogb Python package or from the OGB website. The leaderboard tracks all published results with standardized evaluation.

BibTeX citation

ogb_products.bib
@inproceedings{hu2020open,
  title={Open Graph Benchmark: Datasets for Machine Learning on Graphs},
  author={Hu, Weihua and Fey, Matthias and Zitnik, Marinka and Dong, Yuxiao and Ren, Hongyu and Liu, Bowen and Catasta, Michele and Leskovec, Jure},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

Cite Hu et al. for all OGB datasets. Include the specific dataset name (ogbn-products) in your paper.

Example: product categorization at scale

E-commerce platforms must categorize millions of new products automatically. Each product's description provides some signal, but its co-purchase relationships provide complementary information. A phone case co-purchased with an iPhone screen protector is clearly a phone accessory, even if its description is ambiguous. GNNs on co-purchase graphs combine both signals for more accurate categorization. OGB-Products benchmarks exactly this pipeline at realistic scale.

From benchmark to production

OGB-Products is close to production scale for product categorization. The remaining gaps are: heterogeneity (production systems have users, products, orders, and reviews as different node types), temporal dynamics (seasonal trends, new product launches), and real-time serving (predictions must be generated in milliseconds for live recommendations). These requirements push beyond what standard PyG training handles.

Frequently asked questions

What is OGB-Products?

OGB-Products (ogbn-products) is an Amazon product co-purchasing network with 2,449,029 product nodes, 61,859,140 edges, 100-dimensional features, and 47 product categories. It is part of the Open Graph Benchmark (OGB) and is the standard large-scale GNN benchmark.

How do I load OGB-Products in PyTorch Geometric?

Use `from ogb.nodeproppred import PygNodePropPredDataset; dataset = PygNodePropPredDataset(name='ogbn-products')`. You need the `ogb` package installed alongside PyG. The download is ~1.5GB.

Can I train on OGB-Products with a single GPU?

Yes, with neighbor sampling. Full-batch training is impossible (the graph exceeds GPU memory). Use PyG's NeighborLoader with batch sizes of 1024-4096 and sample 15-25 neighbors per layer. A 16GB GPU is sufficient with proper sampling.

What is the state of the art on OGB-Products?

The OGB leaderboard tracks results. GraphSAGE achieves ~83.9% test accuracy. Top methods using external data exceed 90%. The standardized evaluation (official sales-rank split, fixed accuracy metric) makes OGB-Products the most rigorous large-scale GNN benchmark.

How does OGB-Products compare to Amazon Computers?

OGB-Products is 178x larger than Amazon Computers (2.4M vs 13.7K nodes) with 126x more edges. It covers all Amazon product categories (47 vs 10) and uses OGB's standardized sales-rank split instead of random splits. It is the production-relevant version of Amazon co-purchase benchmarking.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.