Graphs: 12,000 | Avg nodes: ~117 | Node features: 7 | Classes: 6
What CLUSTER contains
CLUSTER is a synthetic benchmark from the “Benchmarking Graph Neural Networks” paper. Each of the 12,000 graphs is generated from a stochastic block model with 6 communities. Nodes within the same community are densely connected; connections between communities are sparser. Each node has 7-dimensional features and a community label (0-5). The task is to correctly assign each node to its community.
Graphs average 117 nodes and 4,304 edges. Difficulty varies across graphs: some have clearly separated clusters while others overlap more, which tests how robust a model is to harder cases.
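To make the generation process concrete, here is a minimal sketch of sampling one stochastic block model graph in plain Python. The function name `sample_sbm`, the equal block sizes, and the probabilities `p_intra=0.55` and `p_inter=0.25` are illustrative assumptions, not the benchmark's actual generation code.

```python
import random

def sample_sbm(block_sizes, p_intra, p_inter, seed=0):
    """Sample an undirected stochastic block model graph.

    Returns (labels, edges): labels[i] is the community of node i,
    and edges is a list of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    # One label per node, communities laid out contiguously.
    labels = [c for c, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Same community: dense. Different communities: sparse.
            p = p_intra if labels[u] == labels[v] else p_inter
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges

# Assumed illustrative settings: 6 communities of 20 nodes each.
sizes = [20] * 6
labels, edges = sample_sbm(sizes, p_intra=0.55, p_inter=0.25)
intra = sum(labels[u] == labels[v] for u, v in edges)
print(f"nodes={len(labels)} edges={len(edges)} intra-community edges={intra}")
```

Moving `p_inter` closer to `p_intra` blurs the community boundaries, which is one way to picture the harder graphs described above.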
Why CLUSTER matters
Community detection is one of the oldest and most important graph analysis tasks. It appears throughout industry: customer segmentation (find groups of similar customers), social network analysis (identify communities of interest), biological network analysis (find functional modules in PPI networks), and organizational analytics (detect team structures from communication patterns).
CLUSTER provides a controlled benchmark for this task. The key finding: attention-based models (GAT at 70.6%, GPS at 76.1%) outperform fixed-aggregation baselines (GCN at 68.5%, GIN at 64.7%) on community detection. This makes intuitive sense: identifying community boundaries requires weighting intra-community edges more heavily than inter-community ones, which is what attention can learn to do.
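The intuition about edge weighting can be illustrated with a toy comparison of uniform (mean) aggregation against softmax attention over neighbors. This is a sketch under loud assumptions: the 2-d embeddings are hand-picked, and the dot-product score is a stand-in rather than GAT's learned scoring function.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical embeddings: two neighbors share the center node's
# community, one neighbor sits in a different community.
center = [1.0, 0.0]
neighbors = {
    "same_a": [0.9, 0.1],
    "same_b": [0.8, 0.2],
    "other": [0.1, 0.9],
}
names = list(neighbors)

# Attention weights from dot-product similarity (assumed scoring rule).
attn = softmax([dot(center, neighbors[k]) for k in names])
# Fixed aggregation gives every neighbor the same weight.
uniform = [1 / len(names)] * len(names)

def aggregate(weights):
    # Weighted sum of neighbor embeddings, one value per dimension.
    return [
        sum(w * neighbors[k][d] for w, k in zip(weights, names))
        for d in range(len(center))
    ]

mean_msg = aggregate(uniform)
attn_msg = aggregate(attn)
print("attention weights:", dict(zip(names, attn)))
print("mean message:", mean_msg)
print("attention message:", attn_msg)
```

The cross-community neighbor receives less than a uniform share of the weight, so the aggregated message stays closer to the center node's own community. In a trained model the scores are learned rather than fixed by hand.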
Loading CLUSTER in PyG
from torch_geometric.datasets import GNNBenchmarkDataset
train_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='train')
val_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='val')
test_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='test')
print(f"Train: {len(train_dataset)}") # 10000
print(f"Val: {len(val_dataset)}") # 1000
print(f"Test: {len(test_dataset)}") # 1000
Same GNNBenchmarkDataset API as PATTERN. Per-node 6-class classification.
Common tasks and benchmarks
Per-node 6-class classification (community assignment), scored with weighted accuracy. GCN: 68.5%, GIN: 64.7%, GAT: 70.6%, GraphSage: 63.8%, PNA: 76.0%, GPS: 76.1%. The standout result: GIN underperforms GCN, the opposite of PATTERN. This shows that no single aggregation strategy dominates all structural tasks. GPS scores highest of these six, plausibly because it combines local message passing with global attention.
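For readers unfamiliar with the metric, the sketch below assumes that "weighted accuracy" means the mean of per-class accuracies, so that small communities count as much as large ones. The function name `weighted_accuracy` and the toy labels are illustrative, and the benchmark's own evaluation code may differ in details such as how empty classes are handled.

```python
from collections import defaultdict

def weighted_accuracy(y_true, y_pred):
    """Mean per-class accuracy in percent (classes absent from y_true are skipped)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = [correct[c] / total[c] for c in total]
    return 100.0 * sum(per_class) / len(per_class)

# Toy example with an imbalanced label distribution.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 1]

plain = 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain accuracy:    {plain:.2f}")
print(f"weighted accuracy: {weighted_accuracy(y_true, y_pred):.2f}")
```

Here the majority class is predicted perfectly, so plain accuracy looks better than the class-averaged score, which penalizes the errors on the two smaller classes.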
Example: customer segmentation
An e-commerce company wants to segment customers into groups for targeted marketing. The customer interaction graph (who buys similar products, who reviews the same items) naturally forms communities. High-value customers cluster with other high-value customers. Bargain hunters form their own group. GNN-based community detection identifies these segments from the interaction structure, enabling personalized marketing without manual segment definition.
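As a rough illustration of how such an interaction graph might be built, the sketch below links two customers when they bought at least `min_shared` of the same products. The purchase records, the function name `co_purchase_edges`, and the threshold are all hypothetical; a real pipeline would likely use richer signals such as reviews or browsing.

```python
from collections import defaultdict
from itertools import combinations

def co_purchase_edges(purchases, min_shared=1):
    """Return {(customer_a, customer_b): shared_product_count} for linked customers."""
    # Group customers by the product they bought.
    buyers = defaultdict(set)
    for customer, product in purchases:
        buyers[product].add(customer)
    # Count shared products for every pair of customers.
    shared = defaultdict(int)
    for customers in buyers.values():
        for a, b in combinations(sorted(customers), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}

# Hypothetical purchase log of (customer, product) pairs.
purchases = [
    ("ana", "tent"), ("ben", "tent"),
    ("ana", "stove"), ("ben", "stove"),
    ("cy", "novel"), ("dee", "novel"), ("ben", "novel"),
]

print(co_purchase_edges(purchases, min_shared=1))
print(co_purchase_edges(purchases, min_shared=2))
```

The resulting edge dictionary can then be converted into whatever edge-list format the downstream GNN library expects.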
Published benchmark results
Per-node 6-class classification with weighted accuracy on CLUSTER. ~500K parameter budget. Higher is better.
| Method | Weighted Acc (%) | Year | Paper |
|---|---|---|---|
| GCN | 68.5 | 2020 | Dwivedi et al. |
| GAT | 70.6 | 2020 | Dwivedi et al. |
| GraphSage | 63.8 | 2020 | Dwivedi et al. |
| GIN | 64.7 | 2020 | Dwivedi et al. |
| GatedGCN | 73.8 | 2020 | Dwivedi et al. |
| PNA | 76.0 | 2020 | Corso et al. |
| SAN | 76.7 | 2022 | Kreuzer et al. |
| GPS | 76.1 | 2022 | Rampasek et al. |
Original Paper
Benchmarking Graph Neural Networks
V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson (2023). Journal of Machine Learning Research
Original data source
CLUSTER is generated synthetically using stochastic block models. The generation code and pre-generated splits are available from the benchmarking-gnns repository.
@article{dwivedi2023benchmarking,
title={Benchmarking Graph Neural Networks},
author={Dwivedi, Vijay Prakash and Joshi, Chaitanya K and Luu, Anh Tuan and Laurent, Thomas and Bengio, Yoshua and Bresson, Xavier},
journal={Journal of Machine Learning Research},
volume={24},
number={43},
pages={1--48},
year={2023}
}
BibTeX citation for the Benchmarking GNNs paper (CLUSTER dataset).
Which dataset should I use?
CLUSTER vs PATTERN: Both are from Benchmarking GNNs but test different skills. CLUSTER tests community detection (where attention helps). PATTERN tests sub-pattern detection (where sum aggregation helps). Run both to understand your model.
CLUSTER vs Karate Club: Karate Club has 34 nodes and is for tutorials only. CLUSTER has 12,000 graphs and is for rigorous benchmarking of community detection capabilities.
CLUSTER vs Cora: CLUSTER is synthetic with controlled community structure. Cora is real-world with organic community patterns. CLUSTER isolates community detection ability; Cora conflates multiple learning signals.
From benchmark to production
Production community detection handles overlapping communities (a customer can belong to multiple segments), hierarchical structure (segments contain sub-segments), and dynamic evolution (communities form and dissolve over time). CLUSTER's non-overlapping, static communities are a simplified starting point.
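One way to picture the gap between CLUSTER's setup and overlapping communities is the output layer. The sketch below contrasts a single-label argmax, as used for non-overlapping classes, with an independent per-community threshold. The logits, the 0.5 threshold, and both function names are hypothetical choices for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hard_assignment(logits):
    """Non-overlapping case: pick the single highest-scoring community."""
    return max(range(len(logits)), key=lambda c: logits[c])

def overlapping_assignment(logits, threshold=0.5):
    """Overlapping case: keep every community whose sigmoid score passes the threshold."""
    return [c for c, z in enumerate(logits) if sigmoid(z) >= threshold]

# Hypothetical per-community scores for one customer.
logits = [2.0, 1.5, -3.0]

print("single segment:", hard_assignment(logits))
print("all segments:  ", overlapping_assignment(logits))
```

With the threshold approach a customer with strong scores for two segments is assigned to both, at the cost of having to choose and calibrate the threshold.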