| Nodes | Edges | Features | Classes |
|---|---|---|---|
| 89,250 | 899,756 | 500 | 7 |
What Flickr contains
Flickr is a graph dataset built from the Flickr image-sharing platform. Each of the 89,250 nodes represents an image. Edges connect images that share common metadata: the same geographic location, the same gallery, or comments from the same user. Node features are 500-dimensional bag-of-words vectors derived from image descriptions. The task is to classify images into 7 categories based on their content type.
The key characteristic of Flickr is edge noise. Two images taken at the same tourist landmark may depict completely different subjects (a landscape vs. a portrait). Unlike citations (which reflect topical similarity) or co-purchases (which reflect intent similarity), Flickr's edges carry weaker semantic signal.
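One common way to quantify how much semantic signal the edges carry is the edge homophily ratio: the fraction of edges whose two endpoints share a label. The sketch below computes it on a small made-up graph; the function name `edge_homophily` and the toy data are illustrative assumptions, not part of the dataset or of PyG.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for src, dst in edges if labels[src] == labels[dst])
    return same / len(edges)


# Toy graph (hypothetical, not Flickr data): 5 images, 2 content classes.
# Each undirected edge is listed in both directions, as PyG stores them.
labels = [0, 0, 1, 1, 0]
edges = [(0, 1), (1, 0), (2, 3), (3, 2), (0, 2), (2, 0), (1, 4), (4, 1)]

ratio = edge_homophily(edges, labels)
print(f"Edge homophily: {ratio:.2f}")
```

To apply the same function to the real dataset, one option is to pass `data.edge_index.t().tolist()` and `data.y.tolist()` after loading Flickr. PyG also provides `torch_geometric.utils.homophily`, which should give the edge-level ratio directly on tensors.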
Why Flickr matters
Most GNN benchmarks have clean graph structure: citations connect related papers, co-purchases connect related products. Real-world graphs are messier. Customer transaction graphs include routine purchases alongside meaningful ones. Social graphs include bot connections alongside genuine relationships. Flickr is one of the few benchmarks that captures this noise.
The practical lesson: accuracy on Flickr is likely a better indicator of production robustness than accuracy on Cora, whose citation edges are comparatively clean. A model that handles Flickr's noisy edges well will likely handle the noise in enterprise graphs. Attention-based methods (GAT, TransformerConv) have an advantage here because they can learn to down-weight noisy edges.
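To make the down-weighting idea concrete, here is a minimal sketch comparing fixed mean aggregation with softmax-weighted aggregation over one node's neighbors. It is not GAT itself: GAT learns its scoring function, whereas this sketch assumes a plain dot product between the target and each neighbor as a stand-in, and all names and feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_aggregate(neighbors):
    """Fixed aggregation: every neighbor gets the same weight."""
    dim = len(neighbors[0])
    return [sum(n[i] for n in neighbors) / len(neighbors) for i in range(dim)]


def attention_aggregate(target, neighbors):
    """Weight each neighbor by softmax(dot(target, neighbor)).

    The dot product is an assumed stand-in for a learned attention score.
    """
    weights = softmax([dot(target, n) for n in neighbors])
    dim = len(target)
    agg = [sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(dim)]
    return agg, weights


# Hypothetical 2-d features: two neighbors resemble the target, one is noise.
target = [1.0, 0.0]
neighbors = [[0.9, 0.1], [1.0, 0.0], [-1.0, 0.5]]

mean_out = mean_aggregate(neighbors)
attn_out, attn_weights = attention_aggregate(target, neighbors)
print("mean aggregation:     ", [round(v, 3) for v in mean_out])
print("attention aggregation:", [round(v, 3) for v in attn_out])
print("attention weights:    ", [round(w, 3) for w in attn_weights])
```

In this toy case the dissimilar neighbor receives well under the one-third weight that mean aggregation would give it, so the aggregated vector stays closer to the target. Whether a trained attention layer finds such weights on real data depends on the model and the training signal.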
Loading Flickr in PyG
from torch_geometric.datasets import Flickr
dataset = Flickr(root='/tmp/Flickr')
data = dataset[0]
print(f"Nodes: {data.num_nodes}") # 89250
print(f"Edges: {data.num_edges}") # 899756
print(f"Features: {data.num_features}") # 500
print(f"Classes: {dataset.num_classes}") # 7
Flickr provides standard train/val/test masks. The download is moderate in size (~200MB).
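The masks are what you use to restrict evaluation to one split. In PyG they are exposed as boolean tensors (`data.train_mask`, `data.val_mask`, `data.test_mask`). The sketch below shows the idea with plain Python lists so it runs without downloading anything; `masked_accuracy` and the toy predictions are hypothetical.

```python
def masked_accuracy(predictions, labels, mask):
    """Accuracy computed only over nodes where mask is True."""
    pairs = [(p, y) for p, y, m in zip(predictions, labels, mask) if m]
    if not pairs:
        raise ValueError("mask selects no nodes")
    return sum(p == y for p, y in pairs) / len(pairs)


# Hypothetical toy data: 6 nodes with predictions from some trained model.
labels      = [0, 1, 2, 1, 0, 2]
predictions = [0, 1, 1, 1, 2, 2]
train_mask  = [True, True, True, False, False, False]
test_mask   = [False, False, False, False, True, True]

train_acc = masked_accuracy(predictions, labels, train_mask)
test_acc = masked_accuracy(predictions, labels, test_mask)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

With tensors, the usual equivalent is indexing by the mask, for example `(pred[data.test_mask] == data.y[data.test_mask]).float().mean()`, where `pred` is assumed to hold the predicted class per node.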
Original Paper
GraphSAINT: Graph Sampling Based Inductive Learning Method
Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna (2020). ICLR 2020
Benchmark comparison (standard split)
| Method | Accuracy | Year | Paper |
|---|---|---|---|
| GraphSAGE | ~50.1% | 2017 | Hamilton et al. |
| GCN | ~53.4% | 2017 | Kipf & Welling |
| GAT | ~54.2% | 2018 | Velickovic et al. |
| GraphSAINT | ~51.1% | 2020 | Zeng et al. |
| ClusterGCN | ~48.1% | 2019 | Chiang et al. |
Which medium-scale social dataset should I use?
Flickr (89K nodes, 7 classes) has noisy metadata-based edges: use it to test robustness when graph structure is unreliable. Reddit (232K nodes, 41 classes) has clean co-comment edges and is the standard scalability benchmark. Yelp (716K nodes, 100 labels) tests multi-label classification at scale. If you need to evaluate noise robustness, Flickr is the right choice. If you need clean scalability testing, pick Reddit.
Common tasks and benchmarks
Node classification with the standard split. GraphSAGE achieves ~50.1%, GraphSAINT ~51.1%, GCN ~53.4%, GAT ~54.2%. These numbers are well below Reddit (95%+) even though Flickr has fewer classes (7 vs. 41), which suggests that the noisy graph structure, rather than the size of the label space, is the main difficulty. Among these results the attention-based GAT scores highest, ahead of the fixed-aggregation GCN and GraphSAGE, though its margin over GCN is under one percentage point.
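The class-count argument can be checked with a little arithmetic. The sketch below compares each reported accuracy with the uniform random-guess baseline for that number of classes. It uses only the figures quoted above, takes "95%+" as 95.0, and the helper name `uniform_chance` is illustrative. Uniform chance is a deliberately simple reference point: with imbalanced labels, a majority-class baseline would be higher.

```python
def uniform_chance(num_classes):
    """Expected accuracy (percent) of guessing uniformly at random."""
    return 100.0 / num_classes


# Figures quoted in the text; Reddit's "95%+" is taken as a lower bound.
reported = {
    "Flickr": {"accuracy": 54.2, "num_classes": 7},   # best listed result (GAT)
    "Reddit": {"accuracy": 95.0, "num_classes": 41},
}

lifts = {}
for name, row in reported.items():
    chance = uniform_chance(row["num_classes"])
    lifts[name] = row["accuracy"] / chance
    print(f"{name}: chance {chance:.1f}%, reported {row['accuracy']:.1f}%, "
          f"{lifts[name]:.1f}x over chance")
```

Flickr starts from a much higher chance level than Reddit yet ends with a much lower accuracy, which is consistent with the reading that the label space is not what makes Flickr hard.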
Data source
The Flickr dataset was introduced in the GraphSAINT paper and is available from the GraphSAINT GitHub repository. PyG downloads the processed version automatically.
BibTeX citation
@inproceedings{zeng2020graphsaint,
title={GraphSAINT: Graph Sampling Based Inductive Learning Method},
author={Zeng, Hanqing and Zhou, Hongkuan and Srivastava, Ajitesh and Kannan, Rajgopal and Prasanna, Viktor},
booktitle={International Conference on Learning Representations (ICLR)},
year={2020}
}
Cite Zeng et al. for the Flickr dataset and the GraphSAINT sampling method.
Example: content moderation on social platforms
Social platforms must classify user-generated content at scale. Images, posts, and comments need categorization for feed ranking, ad placement, and content moderation. The graph structure (who interacts with what) provides context that individual content features miss. A borderline image posted in a photography community is different from the same image shared in a meme group. Flickr's noisy graph structure mirrors this challenge.
From benchmark to production
Production social graphs have billions of nodes with highly noisy edges. Users follow accounts they never engage with. Bot networks create artificial connections. Content goes viral across unrelated communities. Handling this noise at scale requires models that selectively attend to informative connections while ignoring noise.