6,040 Users · 3,883 Movies · 1,000,209 Ratings (edges) · Task: Link Prediction
What MovieLens 1M contains
MovieLens 1M is a dataset of 1,000,209 ratings (1-5 stars) from 6,040 users on 3,883 movies. In PyTorch Geometric, this becomes a heterogeneous bipartite graph: user nodes and movie nodes connected by rating edges. Users have demographic features (age, gender, occupation). Movies have genre features. The edge weight is the rating value.
The bipartite structure is fundamentally different from homogeneous benchmarks like Cora or Reddit. There are two node types with different features, and edges only connect users to movies (never user-to-user or movie-to-movie directly). This heterogeneity requires specialized GNN layers.
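To make the bipartite message-passing pattern concrete, here is a minimal sketch in plain PyTorch (independent of PyG, with hypothetical toy data): one round of user-to-movie aggregation, where each movie averages the features of the users who rated it.

```python
import torch

# Toy bipartite graph: 3 users, 2 movies (hypothetical example data).
# edge_index follows PyG's [2, num_edges] convention.
edge_index = torch.tensor([[0, 0, 1, 2],   # source user indices
                           [0, 1, 1, 0]])  # target movie indices

user_x = torch.randn(3, 8)   # user features (e.g. embedded demographics)
movie_x = torch.randn(2, 8)  # movie features (e.g. genre indicators)

# One round of user -> movie message passing: sum incoming user
# features per movie, then divide by the movie's degree (mean).
src, dst = edge_index
agg = torch.zeros_like(movie_x).index_add_(0, dst, user_x[src])
deg = torch.zeros(movie_x.size(0)).index_add_(0, dst, torch.ones(dst.size(0)))
movie_out = agg / deg.clamp(min=1).unsqueeze(-1)
```

Note that messages only flow across the user-movie edges; a second round (movie to user) would use the same edges in reverse, which is why hetero GNN layers track both edge directions explicitly.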
Why MovieLens 1M matters
MovieLens is the most widely used recommendation benchmark in machine learning. Modeling it as a graph connects two research communities: recommender systems (which thinks in terms of user-item matrices and collaborative filtering) and graph ML (which thinks in terms of message passing and link prediction). The key insight: matrix factorization is a degenerate GNN on the bipartite graph, with zero rounds of message passing and learned user and movie embeddings decoded by their inner product.
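The matrix-factorization view can be sketched in a few lines: free embedding tables for users and movies, and an inner-product decoder scoring each candidate edge (a sketch with randomly initialized embeddings, not a trained model).

```python
import torch

num_users, num_movies, dim = 6040, 3883, 64

# Free embeddings, exactly as in matrix factorization (random init here;
# in practice these are trained against observed ratings).
user_emb = torch.nn.Embedding(num_users, dim)
movie_emb = torch.nn.Embedding(num_movies, dim)

# MF predicts score(i, j) = <u_i, m_j>. Viewed as a GNN, this is a model
# with zero message-passing layers and an inner-product edge decoder.
u = torch.tensor([0, 0, 5])      # user ids of candidate edges
m = torch.tensor([10, 42, 7])    # movie ids of candidate edges
scores = (user_emb(u) * movie_emb(m)).sum(dim=-1)  # one score per edge
```

Adding message-passing layers before the same decoder turns this into NGCF- or LightGCN-style graph collaborative filtering.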
LightGCN (He et al., 2020) made this connection explicit, showing that a simplified GCN on the bipartite interaction graph outperforms traditional collaborative filtering. This result established GNNs as a first-class approach for recommendation systems.
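LightGCN's propagation is simple enough to sketch directly: no feature transforms and no nonlinearities, just symmetric degree-normalized aggregation over the bipartite edges, with the final embedding averaged across layers. The sketch below is a plain-PyTorch reading of the paper's update rule, not the authors' implementation.

```python
import torch

def lightgcn_propagate(user_emb, movie_emb, edge_index, num_layers=3):
    """Parameter-free LightGCN-style propagation on a bipartite graph.

    Each layer aggregates across user-movie edges with 1/sqrt(d_u * d_m)
    normalization; final embeddings are the mean over all layers.
    """
    n_u, n_m = user_emb.size(0), movie_emb.size(0)
    src, dst = edge_index
    d_u = torch.zeros(n_u).index_add_(0, src, torch.ones(src.size(0))).clamp(min=1)
    d_m = torch.zeros(n_m).index_add_(0, dst, torch.ones(dst.size(0))).clamp(min=1)
    norm = (d_u[src] * d_m[dst]).rsqrt().unsqueeze(-1)  # per-edge weight

    u, m = user_emb, movie_emb
    u_layers, m_layers = [u], [m]
    for _ in range(num_layers):
        m_new = torch.zeros_like(m).index_add_(0, dst, u[src] * norm)
        u_new = torch.zeros_like(u).index_add_(0, src, m[dst] * norm)
        u, m = u_new, m_new
        u_layers.append(u)
        m_layers.append(m)
    # Layer combination: uniform mean, as in the LightGCN paper.
    return torch.stack(u_layers).mean(0), torch.stack(m_layers).mean(0)

# Tiny usage example with hypothetical shapes.
u_out, m_out = lightgcn_propagate(
    torch.randn(4, 16), torch.randn(3, 16),
    torch.tensor([[0, 1, 2, 3], [0, 1, 2, 0]]))
```

The only learned parameters are the initial embeddings themselves, which is what "parameter-free GCN variant" refers to.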
Loading MovieLens 1M in PyG
from torch_geometric.datasets import MovieLens1M
dataset = MovieLens1M(root='/tmp/MovieLens1M')
data = dataset[0] # HeteroData object
print(data)
# HeteroData(
# user={ x=[...] },
# movie={ x=[...] },
# user__rates__movie={ edge_index=[2, 1000209] }
# )
print(f"Rating edges: {data['user', 'rates', 'movie'].edge_index.shape[1]}")
Returns a HeteroData object with user and movie node types. Use to_homogeneous() or heterogeneous GNN layers.
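For link prediction, the rating edges must be split so that held-out edges never participate in message passing. PyG's RandomLinkSplit transform handles this (including negative sampling) for HeteroData; the plain-PyTorch sketch below shows the underlying idea with a manual 80/10/10 split over edge indices.

```python
import torch

# Manual transductive edge split (sketch). In PyG this is what
# torch_geometric.transforms.RandomLinkSplit automates for HeteroData.
num_edges = 1_000_209  # rating edges in MovieLens 1M
perm = torch.randperm(num_edges)

n_val = int(0.1 * num_edges)
n_test = int(0.1 * num_edges)
test_idx = perm[:n_test]
val_idx = perm[n_test:n_test + n_val]
train_idx = perm[n_test + n_val:]

# Only training edges are used for message passing; val/test edges are
# held out as supervision targets for the link predictor.
```

The split is over edges, not nodes: every user and movie can appear in all three sets, which is the standard transductive setup for recommendation.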
Common tasks and benchmarks
Link prediction: predict missing user-movie ratings. Evaluation uses RMSE (rating prediction) or recall@K (ranking quality). LightGCN achieves strong results with a parameter-free GCN variant on the bipartite graph. NGCF, PinSage, and heterogeneous methods (HGT) provide alternatives. On MovieLens 1M specifically, the differences between methods are small because the dataset is dense enough for simple collaborative filtering to work well.
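The two evaluation metrics mentioned above are easy to state precisely. A minimal NumPy sketch (hypothetical helper names):

```python
import numpy as np

def rmse(pred, target):
    """Root mean squared error for rating prediction (1-5 stars)."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def recall_at_k(ranked_items, relevant, k=20):
    """Fraction of a user's held-out items recovered in the top-k ranking."""
    hits = len(set(ranked_items[:k]) & set(relevant))
    return hits / max(len(relevant), 1)

print(rmse([4.0, 3.0], [5.0, 3.0]))        # ~0.707
print(recall_at_k([1, 2, 3, 4], {2, 9}, k=2))  # 0.5
```

Recall@K is averaged over users; as the note below the table says, reported values also depend on how negatives are sampled, so compare numbers only within a single experimental protocol.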
Example: streaming service personalization
A streaming platform needs to recommend content from a catalog of 50K+ titles to millions of users. The interaction graph (user watches/rates/skips movie) provides the foundation. GNN-based recommendation propagates taste signals through the graph: if user A likes movies X and Y, and user B likes X and Z, then A might like Z (and B might like Y). This graph-based collaborative filtering powers recommendations at Netflix, Spotify, and YouTube.
Published benchmark results
Rating prediction (RMSE) and ranking (Recall@20) on MovieLens 1M. Lower RMSE is better; higher Recall@20 is better.
| Method | RMSE | Recall@20 | Year | Paper |
|---|---|---|---|---|
| MF (SVD++) | ~0.855 | -- | 2009 | Koren |
| NeuMF | -- | ~0.098 | 2017 | He et al. |
| PinSage | -- | ~0.108 | 2018 | Ying et al. |
| NGCF | -- | ~0.105 | 2019 | Wang et al. |
| LightGCN | -- | ~0.114 | 2020 | He et al. |
Note: RMSE and Recall@K are measured on different experimental setups. Recall@K values depend on negative sampling strategy and vary across papers.
Original Paper
GroupLens: An Open Architecture for Collaborative Filtering of Netnews
Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, John Riedl (1994). CSCW.
Original data source
MovieLens 1M is maintained by the GroupLens research lab at the University of Minnesota and is available from grouplens.org/datasets/movielens/1m/.
@article{harper2015movielens,
title={The MovieLens Datasets: History and Context},
author={Harper, F Maxwell and Konstan, Joseph A},
journal={ACM Transactions on Interactive Intelligent Systems},
volume={5},
number={4},
pages={1--19},
year={2015},
publisher={ACM}
}
BibTeX citation for the MovieLens datasets.
Which dataset should I use?
MovieLens 1M vs MovieLens 100K/10M/25M: 100K is too small for GNN evaluation. 1M is the standard GNN benchmark size. 10M and 25M test scalability, but fewer published GNN baselines exist for them.
MovieLens 1M vs Amazon Reviews: MovieLens has explicit ratings (1-5 stars). Amazon reviews are implicit (purchased or not) and much sparser. Use MovieLens for rating prediction; Amazon for implicit recommendation.
MovieLens 1M vs Gowalla/Yelp2018: Gowalla and Yelp2018 are larger bipartite graphs commonly used in GNN recommendation papers. Use them when you need more data or want to compare with recent papers.
From benchmark to production
Production recommendation graphs are orders of magnitude larger (100M+ users, 10M+ items) with multiple interaction types (watch, rate, add-to-list, search), temporal dynamics (recent interactions matter more), and cold-start challenges (new items with no interactions). The bipartite graph becomes a rich heterogeneous temporal graph with contextual features.