
MovieLens 1M: The Classic Recommendation Benchmark as a Graph

MovieLens 1M is a dataset of 1 million movie ratings that becomes a bipartite user-movie graph in PyTorch Geometric. It bridges traditional recommendation systems and graph ML, showing how collaborative filtering becomes link prediction on a heterogeneous graph.


TL;DR

  • MovieLens 1M has 6,040 users, 3,883 movies, and 1,000,209 rating edges. It is modeled as a bipartite heterogeneous graph in PyG.
  • The task is link prediction: predict which user-movie pairs will have ratings. This is recommendation framed as a graph problem.
  • Heterogeneous graph: two node types (users, movies) call for heterogeneous GNN layers (HGTConv, HeteroConv) or specialized recommendation architectures (LightGCN).
  • MovieLens is the bridge between collaborative filtering and graph ML. Matrix factorization scores a user-movie pair by the inner product of their embeddings; GNN recommenders keep that scoring and refine the embeddings with message passing on the bipartite graph.

Users: 6,040 · Movies: 3,883 · Ratings (edges): 1,000,209 · Task: Link Prediction

What MovieLens 1M contains

MovieLens 1M is a dataset of 1,000,209 ratings (1-5 stars) from 6,040 users on 3,883 movies. In PyTorch Geometric, this becomes a heterogeneous bipartite graph: user nodes and movie nodes connected by rating edges. Users have demographic features (age, gender, occupation). Movies have genre features. The edge weight is the rating value.

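The bipartite structure is fundamentally different from homogeneous benchmarks like Cora or Reddit. There are two node types with different features, and edges only connect users to movies (never user-to-user or movie-to-movie directly). Standard homogeneous layers therefore do not apply directly: the graph needs heterogeneous GNN layers, a bipartite-aware recommendation architecture, or an explicit conversion to a homogeneous graph.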
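To make the bipartite layout concrete, here is a minimal, dependency-free sketch of how rating triples can be turned into a COO-style edge list with a separate index space per node type. The toy triples and the helper names `build_bipartite` and `density` are illustrative assumptions, not part of PyG; PyG's own loader performs its own preprocessing.

```python
from typing import Dict, List, Tuple

# Toy (user_id, movie_id, rating) triples. The raw IDs are arbitrary labels.
ratings = [
    (10, 500, 5),
    (10, 501, 3),
    (42, 500, 4),
    (42, 777, 1),
]


def build_bipartite(triples):
    """Map raw IDs to contiguous per-type indices and return COO edges."""
    user_index: Dict[int, int] = {}
    movie_index: Dict[int, int] = {}
    src: List[int] = []     # positions in the user node set
    dst: List[int] = []     # positions in the movie node set
    rating_attr: List[int] = []  # rating kept as an edge attribute
    for user, movie, rating in triples:
        # setdefault assigns the next free index the first time an ID appears.
        src.append(user_index.setdefault(user, len(user_index)))
        dst.append(movie_index.setdefault(movie, len(movie_index)))
        rating_attr.append(rating)
    return user_index, movie_index, (src, dst), rating_attr


def density(num_edges: int, num_users: int, num_movies: int) -> float:
    """Fraction of all possible user-movie pairs that carry a rating."""
    return num_edges / (num_users * num_movies)


user_index, movie_index, edge_index, edge_rating = build_bipartite(ratings)
print(edge_index)                              # user rows and movie rows
print(f"{density(1_000_209, 6_040, 3_883):.2%}")  # density of MovieLens 1M
```

Because users and movies are indexed independently, user 0 and movie 0 are different nodes; an edge always pairs a user position with a movie position. The density helper, applied to the counts quoted above, shows that roughly 4% of all possible user-movie pairs are rated.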

Why MovieLens 1M matters

MovieLens is one of the best-known recommendation benchmarks in machine learning. Modeling it as a graph connects two research communities: the recommendation systems community (which thinks in terms of user-item matrices and collaborative filtering) and the graph ML community (which thinks in terms of message passing and link prediction). The connecting insight: matrix factorization scores a user-movie pair by the inner product of a user embedding and a movie embedding, which is what a graph recommender does with zero propagation layers. Message passing on the bipartite graph then refines those embeddings with information from each node's neighbors.

LightGCN (He et al., 2020) made this connection explicit, showing that a simplified GCN on the bipartite interaction graph, with feature transformations and nonlinearities removed, outperformed NGCF and earlier collaborative filtering baselines in the authors' experiments. This result helped establish GNNs as a first-class approach for recommendation systems.
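The link between matrix factorization and LightGCN-style propagation can be sketched in plain Python. The toy interactions, the hand-picked embeddings, and the function names below are assumptions for illustration; this is not the reference implementation, and in practice the layer-0 embeddings would be learned. With zero layers the score is the plain matrix factorization inner product; each extra layer adds a symmetrically normalized sum over neighbors, and the final embedding is the average over layers.

```python
import math

# Toy interactions: user index -> movie indices (hypothetical data).
user_movies = {0: [0, 1], 1: [0, 2]}
num_users, num_movies, dim = 2, 3, 2

# Layer-0 embeddings. In training these would be the learnable parameters.
user_emb = [[1.0, 0.0], [0.0, 1.0]]
movie_emb = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]


def propagate(user_e, movie_e):
    """One propagation layer: normalized neighbor sum, no weight matrices."""
    movie_users = {m: [] for m in range(num_movies)}
    for u, movies in user_movies.items():
        for m in movies:
            movie_users[m].append(u)
    new_user = [[0.0] * dim for _ in range(num_users)]
    new_movie = [[0.0] * dim for _ in range(num_movies)]
    for u, movies in user_movies.items():
        for m in movies:
            # Symmetric normalization 1 / sqrt(deg(u) * deg(m)).
            norm = 1.0 / math.sqrt(len(movies) * len(movie_users[m]))
            for d in range(dim):
                new_user[u][d] += norm * movie_e[m][d]
                new_movie[m][d] += norm * user_e[u][d]
    return new_user, new_movie


def final_embeddings(num_layers):
    """Average the embeddings of layers 0..num_layers with equal weights."""
    layers_u, layers_m = [user_emb], [movie_emb]
    for _ in range(num_layers):
        u, m = propagate(layers_u[-1], layers_m[-1])
        layers_u.append(u)
        layers_m.append(m)

    def average(layers):
        return [
            [sum(layer[i][d] for layer in layers) / len(layers) for d in range(dim)]
            for i in range(len(layers[0]))
        ]

    return average(layers_u), average(layers_m)


def score(u_vec, m_vec):
    """Inner-product score, the same scoring rule as matrix factorization."""
    return sum(a * b for a, b in zip(u_vec, m_vec))


users0, movies0 = final_embeddings(0)  # zero layers: plain matrix factorization
users1, movies1 = final_embeddings(1)  # one propagation layer
mf_score = score(users0[0], movies0[2])
gnn_score = score(users1[0], movies1[2])
```

In this toy setup user 0 has never interacted with movie 2 and their raw embeddings are orthogonal, so `mf_score` is 0.0. After one layer, `gnn_score` is positive because user 0 shares movie 0 with user 1, who did interact with movie 2.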

Loading MovieLens 1M in PyG

load_movielens.py
from torch_geometric.datasets import MovieLens1M

dataset = MovieLens1M(root='/tmp/MovieLens1M')
data = dataset[0]  # HeteroData object

print(data)
# HeteroData(
#   user={ x=[...] },
#   movie={ x=[...] },
#   user__rates__movie={ edge_index=[2, 1000209] }
# )
print(f"Rating edges: {data['user', 'rates', 'movie'].edge_index.shape[1]}")

Returns HeteroData with user and movie node types. Use to_homogeneous() or heterogeneous GNN layers.

Common tasks and benchmarks

Link prediction: predict missing user-movie ratings. Evaluation uses RMSE (rating prediction) or recall@K (ranking quality). LightGCN achieves strong results with a simplified GCN variant whose propagation layers have no trainable weights; the only learned parameters are the node embeddings. NGCF, PinSage, and heterogeneous methods (HGT) provide alternatives. On MovieLens 1M specifically, the differences between methods are small because the dataset is dense enough for simple collaborative filtering to work well.
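The two metrics named above can be written in a few lines. This is a minimal sketch with hypothetical function names and toy inputs; published numbers also depend on the data split and candidate set, which the sketch does not model.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of a user's held-out relevant items found in the top k."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


# Rating prediction: three held-out ratings and the model's predictions.
error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0])

# Ranking: the model's ordered recommendations vs. held-out liked movies.
recall = recall_at_k(["a", "b", "c", "d"], {"b", "d", "z"}, k=2)
```

RMSE needs the predicted rating values, so it applies to explicit-feedback models. Recall@K only needs an ordering, so it also applies to models trained on implicit feedback.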

Example: streaming service personalization

A streaming platform needs to recommend content from a catalog of 50K+ titles to millions of users. The interaction graph (user watches/rates/skips movie) provides the foundation. GNN-based recommendation propagates taste signals through the graph: if user A likes movies X and Y, and user B likes X and Z, then A might like Z (and B might like Y). The same collaborative signal underlies recommendation at large streaming and video platforms.
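The A/B/X/Y/Z example can be traced with a small path-counting sketch. It is not a GNN: it counts user → movie → user → movie paths, which is the multi-hop signal that message passing aggregates in a learned way. The `recommend` helper is a hypothetical name.

```python
from collections import Counter

# Interaction graph from the example: user -> set of liked movies.
likes = {"A": {"X", "Y"}, "B": {"X", "Z"}}


def recommend(user, likes):
    """Score unseen movies by the number of 3-hop paths that reach them."""
    # Invert the graph: movie -> set of users who liked it.
    liked_by = {}
    for u, movies in likes.items():
        for m in movies:
            liked_by.setdefault(m, set()).add(u)

    scores = Counter()
    for movie in likes[user]:                 # hop 1: user -> liked movie
        for neighbor in liked_by[movie]:      # hop 2: movie -> other user
            if neighbor == user:
                continue
            for candidate in likes[neighbor]:  # hop 3: other user -> movie
                if candidate not in likes[user]:
                    scores[candidate] += 1
    return scores.most_common()


print(recommend("A", likes))  # [('Z', 1)]
print(recommend("B", likes))  # [('Y', 1)]
```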

Published benchmark results

Rating prediction (RMSE) and ranking (Recall@20) on MovieLens 1M. Lower RMSE is better; higher Recall@20 is better.

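| Method | RMSE | Recall@20 | Year | Paper |
| --- | --- | --- | --- | --- |
| MF (SVD++) | ~0.855 | -- | 2009 | Koren |
| NeuMF | -- | ~0.098 | 2017 | He et al. |
| NGCF | -- | ~0.105 | 2019 | Wang et al. |
| LightGCN | -- | ~0.114 | 2020 | He et al. |
| PinSage | -- | ~0.108 | 2018 | Ying et al. |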

Note: RMSE and Recall@K are measured on different experimental setups. Recall@K values depend on negative sampling strategy and vary across papers.
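A short sketch shows why the candidate set matters. It assumes one common setup in which a held-out positive is ranked against a handful of sampled negatives rather than the whole catalog; the scores, helper names, and sample sizes are made up for illustration.

```python
import random


def sample_negatives(rated, num_movies, num_samples, rng):
    """Uniformly sample movie indices the user has not rated."""
    candidates = [m for m in range(num_movies) if m not in rated]
    if num_samples > len(candidates):
        raise ValueError("not enough unrated movies to sample from")
    return rng.sample(candidates, num_samples)


def hit_at_k(scores, positive, candidates, k):
    """True if the held-out positive ranks in the top k of the candidates."""
    ranked = sorted(candidates, key=lambda m: scores[m], reverse=True)
    return positive in ranked[:k]


# Hypothetical model scores for one user over a five-movie catalog.
scores = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.6, 4: 0.1}
held_out = 3  # the movie this user actually rated in the test split

# Full ranking: the positive competes with every movie in the catalog.
full_hit = hit_at_k(scores, held_out, list(scores), k=2)

# Sampled ranking: the positive competes with one sampled negative only.
sampled_hit = hit_at_k(scores, held_out, [held_out, 4], k=2)

# Drawing negatives reproducibly with a seeded generator.
negatives = sample_negatives(rated={0, 2}, num_movies=6, num_samples=3,
                             rng=random.Random(0))
```

With the same model scores, the held-out movie misses the top 2 under full ranking but lands in it under sampled ranking. This is why Recall@K figures from papers with different evaluation protocols are not directly comparable.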

Original Paper

GroupLens: An Open Architecture for Collaborative Filtering of Netnews

Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, John Riedl (1994). CSCW

Original data source

MovieLens 1M is maintained by the GroupLens research lab at the University of Minnesota and is available from grouplens.org/datasets/movielens/1m/.

cite_movielens.bib
@article{harper2015movielens,
  title={The MovieLens Datasets: History and Context},
  author={Harper, F Maxwell and Konstan, Joseph A},
  journal={ACM Transactions on Interactive Intelligent Systems},
  volume={5},
  number={4},
  pages={1--19},
  year={2015},
  publisher={ACM}
}

BibTeX citation for the MovieLens datasets.

Which dataset should I use?

MovieLens 1M vs MovieLens 100K/10M/25M: 100K is too small for GNN evaluation. 1M is the standard GNN benchmark size. 10M and 25M test scalability but fewer published GNN baselines exist.

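MovieLens 1M vs Amazon Reviews: Both contain explicit 1-5 star ratings, but Amazon review graphs are much sparser, and GNN recommendation papers typically binarize them into implicit feedback (interacted or not). Use MovieLens for rating prediction on a dense graph; use Amazon for sparse, implicit-style recommendation.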

MovieLens 1M vs Gowalla/Yelp2018: Gowalla and Yelp2018 are larger bipartite graphs commonly used in GNN recommendation papers. Use them when you need more data or want to compare with recent papers.

From benchmark to production

Production recommendation graphs are orders of magnitude larger (100M+ users, 10M+ items) with multiple interaction types (watch, rate, add-to-list, search), temporal dynamics (recent interactions matter more), and cold-start challenges (new items with no interactions). The bipartite graph becomes a rich heterogeneous temporal graph with contextual features.
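One simple way to encode "recent interactions matter more" is an exponential recency weight on each edge, scaled by interaction type. The sketch below is an assumption for illustration: the 30-day half-life and the per-type base weights are made-up values, not recommendations, and production systems may handle time quite differently.

```python
def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    if age_days < 0 or half_life_days <= 0:
        raise ValueError("age must be >= 0 and half-life must be > 0")
    return 0.5 ** (age_days / half_life_days)


# Hypothetical base weights per interaction type (illustrative only).
BASE_WEIGHT = {"rate": 1.0, "watch": 0.5, "search": 0.1}


def edge_weight(kind: str, age_days: float, half_life_days: float = 30.0) -> float:
    """Combine interaction type and recency into a single edge weight."""
    return BASE_WEIGHT[kind] * recency_weight(age_days, half_life_days)


fresh_rating = edge_weight("rate", age_days=0)   # full weight
old_watch = edge_weight("watch", age_days=60)    # two half-lives old
```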

Frequently asked questions

What is the MovieLens 1M dataset?

MovieLens 1M contains 1,000,209 ratings from 6,040 users on 3,883 movies. In PyTorch Geometric, it is modeled as a bipartite graph where user nodes and movie nodes are connected by rating edges. The task is link prediction: predict which movies a user will rate (and how).

How do I load MovieLens 1M in PyTorch Geometric?

Use `from torch_geometric.datasets import MovieLens1M; dataset = MovieLens1M(root='/tmp/MovieLens1M')`. The dataset creates a heterogeneous graph with user and movie node types connected by rating edges.

Is MovieLens a heterogeneous graph?

Yes. MovieLens 1M has two node types (users and movies) connected by rating edges, which makes it a bipartite heterogeneous graph. In PyG it is stored as a HeteroData object and is typically processed with heterogeneous GNN layers (HGTConv, HeteroConv) or meta-path-based approaches; converting it with to_homogeneous() is the alternative when a homogeneous model is preferred.

What is the standard task on MovieLens 1M?

Link prediction: given a user and a movie they have not rated, predict whether they will rate it (and optionally, the rating value 1-5). This is recommendation: predicting which movies to suggest to each user.

How do GNNs compare to matrix factorization on MovieLens?

GNNs and matrix factorization achieve similar accuracy on MovieLens 1M because the dataset is small and dense enough for collaborative filtering to work well. GNNs shine on larger, sparser recommendation datasets where content features and multi-hop patterns provide additional signal.
