
CLUSTER: Can Your GNN Detect Communities?

CLUSTER is a synthetic benchmark of 12,000 graphs generated from stochastic block models with 6 communities each. It tests community detection -- the ability to identify groups of densely connected nodes -- and reveals that attention mechanisms matter for this task.


TL;DR

  • CLUSTER has 12,000 graphs averaging 117 nodes and 4,304 edges. Nodes have 7 features and belong to one of 6 communities.
  • The task is community detection: assign each node to its correct community. This tests whether GNNs can identify groups of densely connected nodes.
  • GAT (70.6%) outperforms GIN (64.7%) because attention weights help distinguish intra-community edges from inter-community ones.
  • Community detection is fundamental to customer segmentation, social network analysis, and organizational analytics.

Graphs: 12,000 | Avg nodes: ~117 | Node features: 7 | Classes: 6

What CLUSTER contains

CLUSTER is a synthetic benchmark from the “Benchmarking Graph Neural Networks” paper. Each of the 12,000 graphs is generated from a stochastic block model with 6 communities. Nodes within the same community are densely connected; connections between communities are sparser. Each node has 7-dimensional features and a community label (0-5). The task is to correctly assign each node to its community.
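The generation process described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the benchmark's generator: the function name `sample_sbm_graph`, the community-size range (5 to 35 nodes) and the edge probabilities (`p_in=0.55`, `p_out=0.25`) are assumptions chosen for illustration, and the 7-dimensional node features are omitted. The benchmarking-gnns repository holds the actual generation code.

```python
import random


def sample_sbm_graph(num_communities=6, min_size=5, max_size=35,
                     p_in=0.55, p_out=0.25, seed=None):
    """Sample one undirected stochastic block model (SBM) graph.

    Returns (labels, edges): labels[i] is the community id of node i, and
    edges is a list of (u, v) pairs with u < v. All parameter defaults are
    illustrative assumptions, not the official CLUSTER settings.
    """
    rng = random.Random(seed)

    # Draw a random size for each community and label its nodes 0..k-1.
    labels = []
    for community in range(num_communities):
        labels.extend([community] * rng.randint(min_size, max_size))

    # Connect each node pair independently: densely within a community
    # (probability p_in), sparsely across communities (probability p_out).
    edges = []
    n = len(labels)
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges


labels, edges = sample_sbm_graph(seed=0)
print(f"{len(labels)} nodes, {len(edges)} undirected edges")
```

With these assumed parameters, most node pairs span two different communities, so even though `p_in` is about twice `p_out`, a large share of edges cross community boundaries. That is what keeps the task from being trivial.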

Graphs average 117 nodes and 4,304 edges. Difficulty varies across graphs: in some the communities are clearly separated, while in others the boundaries are blurrier, so a model must stay accurate across a range of difficulty levels.
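One hypothetical way to put a number on that difficulty is the edge homophily ratio: the fraction of edges whose two endpoints share a community label. Lower values mean more cross-community edges and blurrier boundaries. The benchmark does not report this metric; `intra_edge_fraction` is an illustrative helper that takes a list of node labels and an edge list of (u, v) pairs.

```python
def intra_edge_fraction(labels, edges):
    """Edge homophily: share of edges that stay inside one community.

    labels[i] is node i's community id; edges is a list of (u, v) pairs.
    Returns a value in [0, 1]; an empty edge list returns 0.0.
    """
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy graph: two communities {0, 1} and {2, 3} joined by one bridge edge (1, 2).
labels = [0, 0, 1, 1]
edges = [(0, 1), (2, 3), (1, 2)]
print(intra_edge_fraction(labels, edges))  # 2 of 3 edges are intra-community
```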

Why CLUSTER matters

Community detection is one of the oldest and most important graph analysis tasks. It appears throughout industry: customer segmentation (find groups of similar customers), social network analysis (identify communities of interest), biological network analysis (find functional modules in PPI networks), and organizational analytics (detect team structures from communication patterns).

CLUSTER provides a controlled benchmark for this task. The key finding: models that can weight neighbors differently, through attention (GAT, GPS) or edge gating (GatedGCN), outperform fixed-weight aggregation (GCN, GIN). The margin is modest for GAT over GCN (70.6% vs 68.5%) and much larger for GPS (76.1%). This makes intuitive sense: identifying community boundaries requires weighting intra-community edges more heavily than inter-community ones, which is exactly what attention can learn to do.
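To make the weighting argument concrete, here is a toy calculation of attention weights over one node's neighbors. It is a simplification under stated assumptions: it uses scaled dot-product scores (the Transformer-style attention found in models such as GPS), not GAT's additive scoring function, and the 2-dimensional embeddings are made up so that neighbors from the node's own community point in the same direction. A trained model has to learn embeddings like these; the helper `attention_weights` is hypothetical.

```python
import math


def attention_weights(query, neighbor_feats):
    """Softmax over scaled dot-product scores between a node and its neighbors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
              for feat in neighbor_feats]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up embeddings: the node leans toward community A.
node = [2.0, 0.0]
neighbors = [
    [2.0, 0.0], [2.0, 0.0],  # two neighbors in community A (same as node)
    [0.0, 2.0], [0.0, 2.0],  # two neighbors in community B
]
weights = attention_weights(node, neighbors)
mean_weights = [1 / len(neighbors)] * len(neighbors)  # uniform, as in mean aggregation
print([round(w, 3) for w in weights])
print(mean_weights)
```

Each same-community neighbor receives roughly 0.47 of the weight and each cross-community neighbor under 0.03, whereas mean aggregation gives all four 0.25. Note that original GAT scores neighbors with a learned additive function rather than a dot product, so this shows the principle, not GAT's exact computation.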

Loading CLUSTER in PyG

load_cluster.py
from torch_geometric.datasets import GNNBenchmarkDataset

train_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='train')
val_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='val')
test_dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER', split='test')

print(f"Train: {len(train_dataset)}")   # 10000
print(f"Val: {len(val_dataset)}")       # 1000
print(f"Test: {len(test_dataset)}")     # 1000

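This is the same GNNBenchmarkDataset API used for PATTERN; only name='CLUSTER' changes. Each node's label is its community (0-5), so the task is per-node 6-class classification.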

Common tasks and benchmarks

Per-node 6-class classification (community assignment), scored with weighted accuracy. Reported results: GCN 68.5%, GIN 64.7%, GAT 70.6%, GraphSage 63.8%, PNA 76.0%, GPS 76.1%. The standout result is that GIN underperforms GCN, the opposite of their ordering on PATTERN, which shows that no single aggregation strategy dominates all structural tasks. At the top, SAN (76.7%), GPS (76.1%), and PNA (76.0%) are within about a point of each other; GPS gets there by combining local message passing with global attention.
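The "weighted accuracy" metric deserves a closer look. We assume it is the class-balanced accuracy used by the benchmarking-gnns evaluation code: compute accuracy separately for each community, then average, so large communities cannot dominate the score. This sketch (`weighted_accuracy` is a hypothetical helper) averages over the classes present in the ground truth, which may differ from the reference implementation in edge cases such as a class with no nodes in a batch.

```python
def weighted_accuracy(y_true, y_pred):
    """Class-balanced accuracy in percent: the mean of per-class accuracies.

    Averages only over classes that occur in y_true (an assumption, see text).
    """
    total = {}
    correct = {}
    for t, p in zip(y_true, y_pred):
        total[t] = total.get(t, 0) + 1
        if t == p:
            correct[t] = correct.get(t, 0) + 1
    per_class = [correct.get(c, 0) / total[c] for c in total]
    return 100.0 * sum(per_class) / len(per_class)


# A lazy model that predicts community 0 for every node.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0]
plain = 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain accuracy:    {plain:.1f}%")                              # 66.7%
print(f"weighted accuracy: {weighted_accuracy(y_true, y_pred):.1f}%")  # 50.0%
```

The lazy model looks respectable under plain accuracy but scores only 50% once each community counts equally, which is why class-balanced scoring suits community assignment with uneven community sizes.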

Example: customer segmentation

An e-commerce company wants to segment customers into groups for targeted marketing. The customer interaction graph (who buys similar products, who reviews the same items) naturally forms communities. High-value customers cluster with other high-value customers. Bargain hunters form their own group. GNN-based community detection identifies these segments from the interaction structure, enabling personalized marketing without manual segment definition.
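Building that interaction graph is the first engineering step. Here is a minimal sketch, assuming purchase records arrive as hypothetical (customer_id, product_id) pairs: project the customer-product bipartite graph onto customers, linking two customers when they bought the same product and weighting the edge by how many products they share. The helper `customer_graph` and the `min_shared` cutoff are illustrative, not a prescribed pipeline.

```python
from collections import defaultdict
from itertools import combinations


def customer_graph(purchases, min_shared=1):
    """Project (customer, product) records onto a weighted customer-customer graph.

    Two customers are linked when they bought at least `min_shared` of the
    same products; the edge weight is the number of shared products.
    Customer ids must be sortable so each edge is stored once as (a, b), a < b.
    """
    buyers = defaultdict(set)  # product -> set of customers who bought it
    for customer, product in purchases:
        buyers[product].add(customer)

    weights = defaultdict(int)
    for customers in buyers.values():
        for a, b in combinations(sorted(customers), 2):
            weights[(a, b)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_shared}


purchases = [("ann", "tv"), ("bob", "tv"), ("ann", "laptop"),
             ("bob", "laptop"), ("cat", "coupon"), ("cat", "tv")]
print(customer_graph(purchases))
print(customer_graph(purchases, min_shared=2))
```

Very popular products connect all of their buyers to each other, so in practice one might down-weight or drop them before building the graph; that refinement is omitted here.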

Published benchmark results

Per-node 6-class classification with weighted accuracy on CLUSTER. ~500K parameter budget. Higher is better.

Method      Weighted Acc (%)   Year   Paper
GCN         68.5               2020   Dwivedi et al.
GAT         70.6               2020   Dwivedi et al.
GraphSage   63.8               2020   Dwivedi et al.
GIN         64.7               2020   Dwivedi et al.
GatedGCN    73.8               2020   Dwivedi et al.
PNA         76.0               2020   Corso et al.
SAN         76.7               2021   Kreuzer et al.
GPS         76.1               2022   Rampášek et al.

Original Paper

Benchmarking Graph Neural Networks

V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson (2023). Journal of Machine Learning Research


Original data source

CLUSTER is generated synthetically using stochastic block models. The generation code and pre-generated splits are available from the benchmarking-gnns repository.

cite_cluster.bib
@article{dwivedi2023benchmarking,
  title={Benchmarking Graph Neural Networks},
  author={Dwivedi, Vijay Prakash and Joshi, Chaitanya K and Luu, Anh Tuan and Laurent, Thomas and Bengio, Yoshua and Bresson, Xavier},
  journal={Journal of Machine Learning Research},
  volume={24},
  number={43},
  pages={1--48},
  year={2023}
}

BibTeX citation for the Benchmarking GNNs paper (CLUSTER dataset).

Which dataset should I use?

CLUSTER vs PATTERN: Both are from Benchmarking GNNs but test different skills. CLUSTER tests community detection (where attention helps). PATTERN tests sub-pattern detection (where sum aggregation helps). Run both to understand your model.

CLUSTER vs Karate Club: Karate Club is a single 34-node graph, useful for tutorials but too small for benchmarking. CLUSTER has 12,000 graphs and is built for rigorous benchmarking of community detection capabilities.

CLUSTER vs Cora: CLUSTER is synthetic with controlled community structure. Cora is a real citation network whose labels are paper topics, so performance mixes what the bag-of-words node features reveal with what the citation structure reveals. CLUSTER isolates community detection ability; Cora does not.

From benchmark to production

Production community detection handles overlapping communities (a customer can belong to multiple segments), hierarchical structure (segments contain sub-segments), and dynamic evolution (communities form and dissolve over time). CLUSTER's non-overlapping, static communities are a simplified starting point.
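As one hypothetical illustration of the first gap, overlapping membership: a CLUSTER-style model ends in a 6-way softmax and takes the argmax, so every node gets exactly one community. A simple extension keeps every community whose score clears a threshold. The helper `memberships` and the 0.3 threshold are arbitrary choices for illustration.

```python
def memberships(scores, threshold=0.3):
    """Return every community index whose score is at least `threshold`.

    Falls back to the single highest-scoring community so that no node
    is left without a segment.
    """
    chosen = [c for c, s in enumerate(scores) if s >= threshold]
    if not chosen:
        chosen = [max(range(len(scores)), key=lambda c: scores[c])]
    return chosen


print(memberships([0.45, 0.40, 0.05, 0.05, 0.03, 0.02]))  # [0, 1]: two segments
print(memberships([0.20, 0.25, 0.20, 0.15, 0.10, 0.10]))  # [1]: argmax fallback
```

Softmax outputs from a model trained with single-label cross-entropy are not calibrated membership scores, so a production system would more likely train a multi-label head (one sigmoid per community) and threshold those instead.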

Frequently asked questions

What is the CLUSTER dataset?

CLUSTER is a synthetic benchmark from the 'Benchmarking GNNs' paper. It contains 12,000 graphs averaging 117 nodes and 4,304 edges, with 7 node features and 6 community labels. The task is to assign each node to its correct community in a stochastic block model graph.

How does CLUSTER differ from PATTERN?

PATTERN is binary (pattern vs background) while CLUSTER is 6-class (which community). CLUSTER tests community detection ability -- identifying groups of densely connected nodes. PATTERN tests sub-pattern detection. Both are from the same Benchmarking GNNs suite.

How do I load CLUSTER in PyTorch Geometric?

Use `from torch_geometric.datasets import GNNBenchmarkDataset; dataset = GNNBenchmarkDataset(root='/tmp/CLUSTER', name='CLUSTER')`. This loads the training split by default; pass split='val' or split='test' for the other splits. Same API as PATTERN.

What accuracy should I expect on CLUSTER?

GCN achieves ~68.5%, GIN ~64.7%, GAT ~70.6%, GPS ~76.1%. Unlike PATTERN, GAT outperforms GIN here because attention helps identify community boundaries by weighting intra-community edges higher than inter-community ones.

Why does GIN underperform GAT on CLUSTER?

Community detection benefits from attention: nodes at community boundaries need to selectively attend to intra-community neighbors. GIN's sum aggregation treats all neighbors equally, which is effective for substructure counting (PATTERN) but suboptimal for community identification.
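A toy calculation of GIN's pre-MLP update, (1 + eps) * h_v + sum of neighbor features h_u, shows what "treats all neighbors equally" means in practice. Assumptions for illustration: 3 communities instead of 6, one-hot community features on the neighbors, a zero (uninformative) feature on the center node, and a hypothetical helper `gin_aggregate`. A real GIN layer then passes this sum through an MLP.

```python
def gin_aggregate(h_self, neighbor_feats, eps=0.0):
    """GIN's aggregation before the MLP: (1 + eps) * h_v + sum over neighbors h_u."""
    out = [(1.0 + eps) * x for x in h_self]
    for feat in neighbor_feats:
        out = [o + f for o, f in zip(out, feat)]  # every neighbor has weight 1
    return out


# Boundary node with an uninformative (zero) feature and one-hot neighbors.
node = [0.0, 0.0, 0.0]
neighbors = [[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] * 3  # 3 from A, 3 from B
print(gin_aggregate(node, neighbors))  # [3.0, 3.0, 0.0]
```

The summed vector is simply a count of neighbors per community, so a node with three neighbors in each of two communities sees a perfect tie at this layer. Exact counts are what make sum aggregation strong for substructure counting, but nothing in the sum can favor one side of a community boundary.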
