
Amazon Computers: Where GNNs Meet Product Recommendations

Amazon Computers is a co-purchase network of 13,752 products connected by 'frequently bought together' relationships. It is the standard benchmark closest to real-world recommendation systems and arguably the most commercially relevant graph in the PyG ecosystem.


TL;DR

  • Amazon Computers has 13,752 product nodes connected by 491,722 co-purchase edges. Features are 767-dimensional bag-of-words vectors from product reviews. Products belong to 10 categories.
  • The graph is dense (avg degree ~35), making it ideal for testing GNN performance when neighborhood information is plentiful -- the opposite challenge from sparse citation networks.
  • Co-purchase graphs directly model the 'bought together' signal that drives Amazon-style recommendations. GNNs on this data learn product similarity from purchasing patterns.
  • This is a stepping stone to production recommendation systems, where graphs have millions of users, products, and interactions across multiple relationship types.

13,752 nodes · 491,722 edges · 767 features · 10 classes

What Amazon Computers contains

Amazon Computers is a segment of the Amazon product co-purchase graph. Nodes represent products in the Computers category. An edge between two products means they are frequently bought together. Node features are 767-dimensional bag-of-words vectors extracted from product reviews. The 10 classes correspond to product subcategories (laptops, desktops, peripherals, etc.).

With an average degree of ~35, this graph is dramatically denser than Cora (~4). Each product has many co-purchase connections, providing rich neighborhood context for GNN aggregation. This density reflects the nature of e-commerce: customers buy multiple products, creating a dense web of co-purchase relationships.
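Degree statistics like these are easy to verify yourself. A minimal sketch on a toy co-purchase graph (the four-product `edge_index` below is made up for illustration; on the real dataset you would use `data.edge_index` and `data.num_nodes`):

```python
import torch

# Toy co-purchase graph (4 products); edges stored in both directions,
# as PyG does for undirected graphs. IDs are made up for illustration.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 0, 2],
                           [1, 0, 2, 1, 3, 2, 2, 0]])
num_nodes = 4

# Degree = number of outgoing entries per node in the directed edge list
deg = torch.bincount(edge_index[0], minlength=num_nodes)
print(deg.tolist())               # per-product degree
print(deg.float().mean().item())  # average degree
```

Running the same two lines on Amazon Computers reproduces the ~35 average degree quoted above.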

Why Amazon Computers matters

This dataset represents the first commercially meaningful graph in many practitioners' learning journey. Citation networks are academic curiosities. Co-purchase graphs directly power a multi-billion dollar industry: product recommendation. Learning to classify products using their purchase relationships is one step away from learning to recommend products to users.

The dataset also tests GNNs in a different regime. On sparse citation networks, the challenge is extracting signal from limited connections. On dense co-purchase networks, the challenge is handling high-degree nodes efficiently and learning which co-purchase connections are most informative (a laptop's co-purchase with a charger is less informative than its co-purchase with a specific software package).
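One standard answer to high-degree nodes is to cap neighborhood size, the idea behind GraphSAGE-style neighbor sampling. A minimal sketch (the `neighbors` adjacency lists are toy data, not the real dataset):

```python
import random

# Sketch of degree capping: sample at most max_neighbors per node.
# Toy adjacency lists -- node 0 stands in for a high-degree product.
neighbors = {0: list(range(100)), 1: [0, 2], 2: [0, 1, 3]}

def sample_neighbors(node, max_neighbors=5, seed=0):
    random.seed(seed)
    nbrs = neighbors.get(node, [])
    if len(nbrs) <= max_neighbors:
        return nbrs                       # small neighborhoods kept whole
    return random.sample(nbrs, max_neighbors)  # large ones subsampled

print(len(sample_neighbors(0)))  # 5, down from 100
```

In PyG this is what `NeighborLoader` does for you; the sketch just shows why it matters more on a dense graph like this one than on Cora.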

Loading Amazon Computers in PyG

load_amazon_computers.py
from torch_geometric.datasets import Amazon

dataset = Amazon(root='/tmp/Amazon', name='Computers')
data = dataset[0]

print(f"Nodes: {data.num_nodes}")        # 13752
print(f"Edges: {data.num_edges}")        # 491722
print(f"Features: {data.num_features}")  # 767
print(f"Classes: {dataset.num_classes}") # 10
print(f"Avg degree: {data.num_edges / data.num_nodes:.1f}")  # ~35.8

No standard train/test split -- use random splits or follow OGB-style splitting for reproducibility.
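A minimal sketch of such a random 60/20/20 split as boolean node masks (the function name and fixed seed are our own convention, not part of the PyG API):

```python
import torch

def random_split_masks(num_nodes, train=0.6, val=0.2, seed=0):
    """Random 60/20/20 node split as boolean masks -- one common
    convention for datasets like Amazon Computers that ship
    without a canonical split."""
    g = torch.Generator().manual_seed(seed)   # fixed seed for reproducibility
    perm = torch.randperm(num_nodes, generator=g)
    n_train = int(train * num_nodes)
    n_val = int(val * num_nodes)
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True
    return train_mask, val_mask, test_mask

tr, va, te = random_split_masks(13752)
print(tr.sum().item(), va.sum().item(), te.sum().item())
```

Attach the masks as `data.train_mask` etc. and report results over several seeds, as Shchur et al. recommend.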

Original Paper

Pitfalls of Graph Neural Network Evaluation

Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, Stephan Günnemann (2018). NeurIPS 2018 Workshop on Relational Representation Learning.


Benchmark comparison (random 60/20/20 splits)

Method          Accuracy  Year  Paper
MLP (no graph)  ~73.8%    --    Baseline
GCN             ~86.5%    2017  Kipf & Welling
GAT             ~86.9%    2018  Veličković et al.
GraphSAGE       ~86.2%    2017  Hamilton et al.
GCNII           ~87.4%    2020  Chen et al.

Which Amazon co-purchase dataset should I use?

Amazon Photo (7,650 nodes, 8 classes) is smaller and trains faster -- use it for rapid prototyping. Amazon Computers (13,752 nodes, 10 classes) is the medium-scale benchmark for co-purchase experiments. OGB-Products (2.4M nodes, 47 classes) is the full-scale version with a standardized time-based split. Use Photo for development, Computers for validation, and OGB-Products for production-scale testing.

Common tasks and benchmarks

The primary task is node classification: predict the product category from co-purchase relationships and review features. Unlike Planetoid datasets, Amazon Computers has no canonical train/test split. Most papers use random 60/20/20 splits or follow a fixed seed protocol. GCN achieves ~82-86%, GAT ~83-87%, and GraphSAGE performs comparably. The dense graph provides enough signal that most GNN architectures perform well.
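To make the aggregation concrete, here is a single GCN layer written in plain PyTorch -- a sketch of the symmetric-normalized neighborhood averaging that `GCNConv` implements; for actual training on Amazon Computers you would use `torch_geometric.nn.GCNConv`:

```python
import torch

def gcn_layer(x, edge_index, weight):
    """One GCN propagation step: out_i = sum_j (x_j W) / sqrt(d_i d_j),
    with self-loops added. A didactic sketch, not a replacement for GCNConv."""
    num_nodes = x.size(0)
    loop = torch.arange(num_nodes)
    row = torch.cat([edge_index[0], loop])   # targets, with self-loops
    col = torch.cat([edge_index[1], loop])   # sources, with self-loops
    deg = torch.bincount(row, minlength=num_nodes).float()
    norm = (deg[row] * deg[col]).rsqrt()     # 1 / sqrt(d_i * d_j) per edge
    msg = norm.unsqueeze(1) * (x @ weight)[col]
    out = torch.zeros(num_nodes, weight.size(1))
    out.index_add_(0, row, msg)              # scatter-sum messages to targets
    return out

x = torch.randn(4, 8)                                   # toy features
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]) # toy edges
h = gcn_layer(x, edge_index, torch.randn(8, 3))
print(h.shape)  # torch.Size([4, 3])
```

On a dense graph each `index_add_` touches ~35 messages per node, which is exactly the "plentiful neighborhood information" regime described above.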

Data source

The Amazon co-purchase datasets were introduced by Shchur et al. (2018) and are derived from the Amazon product co-purchase metadata from SNAP. PyG downloads the processed version automatically.

BibTeX citation

amazon_computers.bib
@article{shchur2018pitfalls,
  title={Pitfalls of Graph Neural Network Evaluation},
  author={Shchur, Oleksandr and Mumme, Maximilian and Bojchevski, Aleksandar and G{\"u}nnemann, Stephan},
  journal={arXiv preprint arXiv:1811.05868},
  year={2018}
}

@inproceedings{mcauley2015image,
  title={Image-Based Recommendations on Styles and Substitutes},
  author={McAuley, Julian and Targett, Christopher and Shi, Qinfeng and van den Hengel, Anton},
  booktitle={SIGIR},
  year={2015}
}

Cite Shchur et al. for the benchmark, McAuley et al. for the original Amazon data.

Example: product recommendation pipeline

The leap from classifying products to recommending them is small. Instead of predicting a product's category, predict which products a user is likely to buy next. The co-purchase graph becomes a bipartite user-product interaction graph. A user's recent purchases define their neighborhood, and GNN aggregation identifies products that similar purchasers also bought. This is exactly how modern GNN-based recommendation systems work at companies like Pinterest and Uber.
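The underlying signal can be sketched without any neural network: score candidates by how often they are co-purchased with a user's recent items. The `co_purchase` adjacency lists below are toy data, not the real graph:

```python
# "Customers who bought these also bought" as a 1-hop graph traversal.
# Toy co-purchase adjacency lists; product IDs are made up.
co_purchase = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

def recommend(recent_purchases, k=2):
    """Rank unseen products by co-purchase count with recent purchases."""
    scores = {}
    for p in recent_purchases:
        for q in co_purchase.get(p, []):
            if q not in recent_purchases:          # don't re-recommend
                scores[q] = scores.get(q, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend({0, 1}))
```

A GNN replaces the raw co-purchase count with learned embeddings, so it can also score products with no direct co-purchase edge to the user's history.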

From benchmark to production

Amazon Computers has 13K products. Amazon's actual catalog has hundreds of millions. Production recommendation graphs add users (billions of interactions), temporal dynamics (seasonal trends, recent purchases weigh more), and multiple interaction types (view, click, add-to-cart, purchase, review, return). The homogeneous co-purchase graph becomes a heterogeneous temporal interaction graph.
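One way to picture that heterogeneous graph: a dict of edge stores keyed by `(source type, relation, destination type)`, mirroring the keying convention of PyG's `HeteroData`. All node IDs and edges below are toy values for illustration:

```python
import torch

# A heterogeneous interaction graph sketched as plain tensors,
# keyed the way PyG's HeteroData keys its edge types.
edges = {
    ('user',    'views',       'product'): torch.tensor([[0, 1], [2, 0]]),
    ('user',    'purchases',   'product'): torch.tensor([[0],    [2]]),
    ('product', 'bought_with', 'product'): torch.tensor([[0, 2], [2, 0]]),
}

for (src, rel, dst), ei in edges.items():
    print(f"{src} -[{rel}]-> {dst}: {ei.size(1)} edges")
```

Each relation type gets its own message-passing weights in a heterogeneous GNN, which is how view, purchase, and co-purchase signals stay distinguishable.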

Frequently asked questions

What is the Amazon Computers dataset?

Amazon Computers is a co-purchase network where nodes represent computer products from Amazon and edges connect products frequently bought together. It has 13,752 nodes, 491,722 edges, 767-dimensional bag-of-words features from product reviews, and 10 product categories.

How do I load Amazon Computers in PyTorch Geometric?

Use `from torch_geometric.datasets import Amazon; dataset = Amazon(root='/tmp/Amazon', name='Computers')`. The dataset contains a single graph with node features, edges, and labels.

How does Amazon Computers differ from citation networks?

Amazon Computers is much denser (avg degree ~35 vs ~4 for Cora), has commercially relevant features (product review text), and tests whether GNNs can learn purchasing patterns rather than citation patterns. The higher density means more neighborhood information per node.

What are good benchmark results on Amazon Computers?

GCN achieves ~82-86% accuracy depending on the train/test split (there is no standard fixed split). GAT and GraphSAGE typically perform comparably. The dense co-purchase structure provides strong signal for most GNN architectures.

Why is Amazon Computers important for recommendation systems?

It demonstrates the co-purchase graph approach to recommendations: products connected by co-purchase edges share customer intent. This is the foundation of 'customers who bought X also bought Y' systems, and GNNs can learn these patterns to recommend products to new customers.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.