What is the CLUSTER dataset?

CLUSTER is a synthetic benchmark from the 'Benchmarking GNNs' paper. It contains 12,000 graphs that average 117 nodes and 4,304 edges each. Every node has 7 features and one of 6 cluster labels. The task is to assign each node to its correct community in a graph generated from a stochastic block model.

How does CLUSTER differ from PATTERN?

PATTERN is binary (pattern vs background) while CLUSTER is 6-class (which community). CLUSTER tests community detection ability -- identifying groups of densely connected nodes. PATTERN tests sub-pattern detection. Both are from the same Benchmarking GNNs suite.

How do I load CLUSTER in PyTorch Geometric?

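Use `from torch_geometric.datasets import GNNBenchmarkDataset; dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER')`. This loads the training split by default; pass `split='val'` or `split='test'` for the other splits. The API is the same as for PATTERN.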

What accuracy should I expect on CLUSTER?

GCN achieves ~68.5%, GIN ~64.7%, GAT ~70.6%, GPS ~76.1%. Unlike PATTERN, GAT outperforms GIN here because attention helps identify community boundaries by weighting intra-community edges higher than inter-community ones.

Why does GIN underperform GAT on CLUSTER?

Community detection benefits from attention: nodes at community boundaries need to selectively attend to intra-community neighbors. GIN's sum aggregation treats all neighbors equally, which is effective for substructure counting (PATTERN) but suboptimal for community identification.

CLUSTER Dataset: GNN Benchmark for Graph Clustering | PyG Guide

12,000

Graphs

~117

Avg Nodes

7

Node Features

6

Classes

What CLUSTER contains

CLUSTER is a synthetic benchmark from the “Benchmarking Graph Neural Networks” paper. Each of the 12,000 graphs is generated from a stochastic block model with 6 communities. Nodes within the same community are densely connected; connections between communities are sparser. Each node has 7-dimensional features and a community label (0-5). The task is to correctly assign each node to its community.

Graphs average 117 nodes and 4,304 edges. Difficulty varies from graph to graph: some have clearly separated clusters while others overlap more, so the benchmark tests robustness across a range of community structures.
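To make the generation process concrete, here is a minimal pure-Python sketch of sampling one stochastic block model graph. It is an illustration under stated assumptions, not the benchmark's generation code: the function names `sample_sbm_graph` and `edge_density` are hypothetical, and the community sizes and the probabilities `p_intra` and `q_inter` are placeholder values chosen only so that intra-community edges are denser than inter-community ones.

```python
import random

def sample_sbm_graph(community_sizes, p_intra, q_inter, seed=0):
    """Sample an undirected stochastic block model graph.

    Returns (labels, edges): labels[i] is the community of node i,
    edges is a set of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    # Nodes are numbered consecutively, community by community.
    labels = [c for c, size in enumerate(community_sizes) for _ in range(size)]
    n = len(labels)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            # Same community: connect with p_intra, otherwise with q_inter.
            prob = p_intra if labels[u] == labels[v] else q_inter
            if rng.random() < prob:
                edges.add((u, v))
    return labels, edges

def edge_density(labels, edges, same_community):
    """Fraction of possible intra- (or inter-) community pairs that are linked."""
    n = len(labels)
    possible = sum(
        1
        for u in range(n)
        for v in range(u + 1, n)
        if (labels[u] == labels[v]) == same_community
    )
    present = sum(
        1 for u, v in edges if (labels[u] == labels[v]) == same_community
    )
    return present / possible

# Placeholder settings (assumptions): 6 communities of 20 nodes each.
labels, edges = sample_sbm_graph([20] * 6, p_intra=0.55, q_inter=0.25, seed=0)
print("nodes:", len(labels), "edges:", len(edges))
print("intra density:", round(edge_density(labels, edges, True), 2))
print("inter density:", round(edge_density(labels, edges, False), 2))
```

Moving `q_inter` closer to `p_intra` blurs the community boundaries, which is one simple way to produce the harder, more overlapping graphs described above.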

Why CLUSTER matters

Community detection is one of the oldest and most important graph analysis tasks. It appears throughout industry: customer segmentation (find groups of similar customers), social network analysis (identify communities of interest), biological network analysis (find functional modules in PPI networks), and organizational analytics (detect team structures from communication patterns).

CLUSTER provides a controlled benchmark for this task. The key finding among the classic message-passing baselines: GAT (70.6%) beats the fixed aggregation of GCN (68.5%) and GIN (64.7%), and the attention-based GPS and SAN sit at the top of the published results at roughly 76-77%. One intuition is that identifying community boundaries requires weighting intra-community edges more heavily than inter-community ones, which attention can learn to do. Attention is not the only route, though: GatedGCN (73.8%) and PNA (76.0%) are also strong without it.
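The intuition about edge weighting can be sketched with a toy boundary node that has two neighbors in its own community and three in another. This is only an illustration: it uses plain dot-product scores followed by a softmax, which is a simplification and not GAT's actual additive attention, and the feature vectors and the function names `sum_aggregate` and `attention_aggregate` are made up for the example.

```python
import math

def sum_aggregate(neighbor_feats):
    """Unweighted sum over neighbors: every neighbor counts equally."""
    dim = len(neighbor_feats[0])
    return [sum(f[d] for f in neighbor_feats) for d in range(dim)]

def attention_aggregate(node_feat, neighbor_feats):
    """Softmax-weighted sum; scores are dot products with the center node."""
    scores = [sum(a * b for a, b in zip(node_feat, f)) for f in neighbor_feats]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_feat)
    out = [
        sum(w * f[d] for w, f in zip(weights, neighbor_feats))
        for d in range(dim)
    ]
    return out, weights

# Toy 2-D embeddings: axis 0 ~ community A, axis 1 ~ community B.
center = [2.0, 0.0]                               # boundary node in A
same = [[2.0, 0.1], [1.8, -0.1]]                  # neighbors in A
other = [[0.0, 2.0], [0.1, 1.9], [-0.1, 2.1]]     # neighbors in B
neighbors = same + other

sum_out = sum_aggregate(neighbors)
att_out, weights = attention_aggregate(center, neighbors)

print("sum aggregation:      ", [round(x, 2) for x in sum_out])
print("attention aggregation:", [round(x, 2) for x in att_out])
print("attention weights:    ", [round(w, 3) for w in weights])
```

With these made-up values, the unweighted sum is pulled toward community B because B contributes more neighbors, while the attention-weighted result stays close to community A because similar neighbors receive most of the weight.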

Loading CLUSTER in PyG

load_cluster.py

from torch_geometric.datasets import GNNBenchmarkDataset

train_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='train')
val_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='val')
test_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='test')

print(f"Train: {len(train_dataset)}")   # 10000
print(f"Val: {len(val_dataset)}")       # 1000
print(f"Test: {len(test_dataset)}")     # 1000

Same GNNBenchmarkDataset API as PATTERN. Per-node 6-class classification.

Common tasks and benchmarks

Per-node 6-class classification (community assignment), evaluated with weighted accuracy. GCN: 68.5%, GIN: 64.7%, GAT: 70.6%, GraphSage: 63.8%, PNA: 76.0%, GPS: 76.1%. The standout result: GIN underperforms GCN, the opposite of PATTERN. This shows that no single aggregation strategy dominates all structural tasks. GPS is among the strongest models listed, plausibly because it combines local message passing with global attention; SAN (76.7%) edges it out in the published results.
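Weighted accuracy can be read as class-balanced accuracy: the accuracy is computed per class and then averaged, so small communities count as much as large ones. The sketch below assumes that reading and additionally assumes that classes absent from the ground truth are skipped when averaging; the benchmark's reference implementation may treat that edge case differently. The function names are hypothetical.

```python
def plain_accuracy(y_true, y_pred):
    """Fraction of nodes labeled correctly, in percent."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * hits / len(y_true)

def weighted_accuracy(y_true, y_pred, num_classes):
    """Mean of per-class accuracies, in percent.

    Assumption: classes that never occur in y_true are left out of the mean.
    """
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_class = [c / n for c, n in zip(correct, total) if n > 0]
    return 100.0 * sum(per_class) / len(per_class)

# Imbalanced toy case: a model that always predicts class 0.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0]

print("plain accuracy:   ", round(plain_accuracy(y_true, y_pred), 1))
print("weighted accuracy:", weighted_accuracy(y_true, y_pred, num_classes=2))
```

In this toy case the always-predict-0 model gets two thirds of the nodes right but only half of the classes, which is why a class-balanced metric is the more informative choice when community sizes differ.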

Example: customer segmentation

An e-commerce company wants to segment customers into groups for targeted marketing. The customer interaction graph (who buys similar products, who reviews the same items) naturally forms communities. High-value customers cluster with other high-value customers. Bargain hunters form their own group. GNN-based community detection identifies these segments from the interaction structure, enabling personalized marketing without manual segment definition.

Published benchmark results

Per-node 6-class classification with weighted accuracy on CLUSTER. ~500K parameter budget. Higher is better.

Method	Weighted Acc (%)	Year	Paper
GCN	68.5	2020	Dwivedi et al.
GAT	70.6	2020	Dwivedi et al.
GraphSage	63.8	2020	Dwivedi et al.
GIN	64.7	2020	Dwivedi et al.
GatedGCN	73.8	2020	Dwivedi et al.
PNA	76.0	2020	Corso et al.
SAN	76.7	2021	Kreuzer et al.
GPS	76.1	2022	Rampasek et al.

Original Paper

Benchmarking Graph Neural Networks

V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson (2023). Journal of Machine Learning Research


Original data source

CLUSTER is generated synthetically using stochastic block models. The generation code and pre-generated splits are available from the benchmarking-gnns repository.

cite_cluster.bib

@article{dwivedi2023benchmarking,
  title={Benchmarking Graph Neural Networks},
  author={Dwivedi, Vijay Prakash and Joshi, Chaitanya K and Luu, Anh Tuan and Laurent, Thomas and Bengio, Yoshua and Bresson, Xavier},
  journal={Journal of Machine Learning Research},
  volume={24},
  number={43},
  pages={1--48},
  year={2023}
}

BibTeX citation for the Benchmarking GNNs paper (CLUSTER dataset).

Which dataset should I use?

CLUSTER vs PATTERN: Both are from Benchmarking GNNs but test different skills. CLUSTER tests community detection (where attention helps). PATTERN tests sub-pattern detection (where sum aggregation helps). Run both to understand your model.

CLUSTER vs Karate Club: Karate Club has 34 nodes and is for tutorials only. CLUSTER has 12,000 graphs and is for rigorous benchmarking of community detection capabilities.

CLUSTER vs Cora: CLUSTER is synthetic with controlled community structure. Cora is real-world with organic community patterns. CLUSTER isolates community detection ability; Cora conflates multiple learning signals.

From benchmark to production

Production community detection handles overlapping communities (a customer can belong to multiple segments), hierarchical structure (segments contain sub-segments), and dynamic evolution (communities form and dissolve over time). CLUSTER's non-overlapping, static communities are a simplified starting point.
