7,650 nodes · 238,162 edges · 745 features · 8 classes
What Amazon Photo contains
Amazon Photo is a co-purchase network from Amazon's Photography product category. Each of the 7,650 nodes is a product (cameras, lenses, tripods, accessories). The 238,162 edges connect products frequently purchased together. Node features are 745-dimensional bag-of-words vectors derived from product reviews. The 8 classes represent product subcategories.
Like Amazon Computers, this graph is dense. The average degree is ~31, meaning each product has co-purchase connections to roughly 30 other products. Photography accessories cluster tightly: a camera body connects to lenses, memory cards, bags, and tripods, creating a rich local neighborhood for GNN aggregation.
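The quoted average degree follows directly from the counts above. Note that PyG stores each undirected co-purchase edge in both directions, so `num_edges` already equals the total degree:

```python
num_nodes = 7650
num_edges = 238162  # directed edge count as reported by PyG (each co-purchase stored twice)

# Total degree equals the directed edge count, so:
avg_degree = num_edges / num_nodes
print(f"Average degree: {avg_degree:.1f}")  # Average degree: 31.1
```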
Why Amazon Photo matters
Amazon Photo serves as the quick-iteration counterpart to Amazon Computers. When developing a GNN architecture for product recommendation, you want a fast development cycle. Photo trains in seconds and captures the same co-purchase patterns as larger product graphs. Once your model works on Photo, scale to Computers (13K nodes) and then OGB-Products (2.4M nodes) for production validation.
The photography domain also illustrates a specific recommendation challenge: complementary products. Cameras and lenses are complements, not substitutes. A good recommendation model must learn that a user who bought a Canon camera should be recommended Canon-compatible lenses, not a competing Nikon camera. Co-purchase edges encode this complementarity directly.
Loading Amazon Photo in PyG
```python
from torch_geometric.datasets import Amazon

dataset = Amazon(root='/tmp/Amazon', name='Photo')
data = dataset[0]

print(f"Nodes: {data.num_nodes}")         # 7650
print(f"Edges: {data.num_edges}")         # 238162
print(f"Features: {data.num_features}")   # 745
print(f"Classes: {dataset.num_classes}")  # 8
```

Same Amazon API as Computers. No standard split -- use random train/val/test partitioning.
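Since no canonical split ships with the dataset, you create one yourself. Below is a minimal, dependency-free sketch of a seeded 60/20/20 partition (in practice you would convert the index lists to boolean masks for `data.train_mask` etc., or use PyG's `transforms.RandomNodeSplit`; the function name here is illustrative):

```python
import random

def random_split(num_nodes, train_frac=0.6, val_frac=0.2, seed=0):
    """Seeded random node split; returns train/val/test index lists."""
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = random_split(7650)
print(len(train_idx), len(val_idx), len(test_idx))  # 4590 1530 1530
```

Fixing the seed matters: accuracies on this dataset vary noticeably across random splits, which is exactly the evaluation pitfall Shchur et al. documented.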
Original Paper
Pitfalls of Graph Neural Network Evaluation
Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, Stephan Günnemann (2018). NeurIPS 2018 Workshop on Relational Representation Learning
Benchmark comparison (random 60/20/20 splits)
| Method | Accuracy | Year | Paper |
|---|---|---|---|
| MLP (no graph) | ~78.5% | -- | Baseline |
| GCN | ~91.2% | 2017 | Kipf & Welling |
| GAT | ~91.7% | 2018 | Veličković et al. |
| GraphSAGE | ~91.0% | 2017 | Hamilton et al. |
| GCNII | ~92.4% | 2020 | Chen et al. |
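For reference, the GCN baseline in the table propagates features with the standard normalized-adjacency layer of Kipf & Welling (2017):

```latex
H^{(l+1)} = \sigma\!\left(\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \hat{A} = A + I,\quad \hat{D}_{ii} = \sum_j \hat{A}_{ij}
```

On a graph this dense, each layer already averages over ~31 neighbors, which is part of why even two-layer models clear 90%.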
Which Amazon co-purchase dataset should I use?
Amazon Photo (7,650 nodes, 8 classes) is the smallest and fastest -- ideal for prototyping and quick iteration. Amazon Computers (13,752 nodes, 10 classes) has nearly double the nodes and two more classes. OGB-Products (2.4M nodes, 47 classes) is the production-scale option with a standardized split. Start with Photo, graduate to Computers, and validate at scale on OGB-Products.
Common tasks and benchmarks
Node classification is the standard task: predict each product's subcategory from co-purchase structure and review features. With no canonical split, researchers typically use random 60/20/20 or 10/10/80 partitions. Most GNN architectures achieve 90%+ accuracy on favorable splits, thanks to the dense graph structure and relatively few categories.
Link prediction is a natural alternative task: given a partial co-purchase graph, predict which product pairs will be bought together. This directly mirrors the recommendation use case and is often more interesting than node classification for practitioners building recommendation systems.
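In PyG this split is typically produced with `transforms.RandomLinkSplit`; the underlying idea -- hold out a fraction of positive edges and sample an equal number of non-edges as negatives -- can be sketched without dependencies (function name and fractions here are illustrative):

```python
import random

def split_edges(edges, num_nodes, test_frac=0.1, seed=0):
    """Hold out test_frac of edges as test positives; sample matching negatives."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_test = int(test_frac * len(edges))
    test_pos, train_pos = edges[:n_test], edges[n_test:]

    # Treat the graph as undirected when rejecting candidate negatives.
    present = set(edges) | {(v, u) for u, v in edges}
    test_neg = []
    while len(test_neg) < n_test:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in present:
            test_neg.append((u, v))
    return train_pos, test_pos, test_neg
```

A GNN encoder is then trained on `train_pos` only, and ranking `test_pos` above `test_neg` (e.g. by AUC) measures how well it predicts unseen co-purchases.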
Data source
The Amazon co-purchase datasets were introduced by Shchur et al. (2018) and are derived from the Amazon product metadata hosted at SNAP. PyG downloads the processed version automatically.
BibTeX citation
```bibtex
@article{shchur2018pitfalls,
  title={Pitfalls of Graph Neural Network Evaluation},
  author={Shchur, Oleksandr and Mumme, Maximilian and Bojchevski, Aleksandar and G{\"u}nnemann, Stephan},
  journal={arXiv preprint arXiv:1811.05868},
  year={2018}
}

@inproceedings{mcauley2015image,
  title={Image-Based Recommendations on Styles and Substitutes},
  author={McAuley, Julian and Targett, Christopher and Shi, Qinfeng and van den Hengel, Anton},
  booktitle={SIGIR},
  year={2015}
}
```

Cite Shchur et al. for the benchmark, McAuley et al. for the original Amazon data.
Example: photography accessory bundles
An e-commerce platform wants to suggest product bundles. The co-purchase graph reveals which products naturally go together: a camera body, a 50mm lens, a memory card, and a carrying case form a natural bundle. GNN aggregation discovers these clusters by learning that tightly connected product neighborhoods represent complementary sets. Amazon Photo provides the training data for this exact task.
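Before reaching for a GNN, a cheap structural baseline already surfaces bundle candidates: two products whose co-purchase neighborhoods overlap heavily are likely complements. A hypothetical sketch using Jaccard similarity over neighbor sets (toy graph and scores, not real Amazon Photo data):

```python
from collections import defaultdict

def neighbor_sets(edges):
    """Build neighbor sets from undirected co-purchase edges."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs

def bundle_score(nbrs, a, b):
    """Jaccard overlap of two products' co-purchase neighborhoods."""
    union = nbrs[a] | nbrs[b]
    return len(nbrs[a] & nbrs[b]) / len(union) if union else 0.0

# Toy graph: camera (0) co-purchased with lens (1), card (2), bag (3);
# the lens shares the card/bag neighbors, an unrelated product (4) does not.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (4, 5)]
nbrs = neighbor_sets(edges)
print(bundle_score(nbrs, 0, 1) > bundle_score(nbrs, 0, 4))  # True
```

A GNN generalizes this: instead of a fixed overlap statistic, aggregation learns which neighborhood patterns actually predict co-purchase, using the review features as well as the structure.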
From benchmark to production
Production product graphs include user nodes (who bought what), temporal ordering (recent purchases matter more), and multiple interaction types (view, click, purchase, return). Amazon Photo captures only the product-to-product co-purchase layer. A production system must integrate all these signals into a unified heterogeneous graph.