14,000
Graphs
~119
Avg Nodes
3
Node Features
2
Classes
What PATTERN contains
PATTERN is a synthetic benchmark from the influential “Benchmarking Graph Neural Networks” paper (Dwivedi et al., 2020). Each of the 14,000 graphs is generated from a stochastic block model with a planted sub-pattern. Some nodes belong to the pattern (label 1) and the rest are background (label 0). Nodes have 3-dimensional features that provide minimal information -- the model must rely on graph structure to detect the planted pattern.
Graphs average 119 nodes and 6,099 edges, making them medium-sized. The 14,000-graph dataset provides enough statistical power for reliable evaluation, unlike smaller molecular benchmarks.
Why PATTERN matters
On real datasets, it is hard to know whether a GNN is learning from features, structure, or some combination. PATTERN eliminates this ambiguity: the 3 node features are intentionally uninformative for the task. The only way to classify nodes correctly is to detect the structural pattern in the graph connectivity.
The results are illuminating. GCN (simple averaging) achieves only 68.9%. GIN (provably more expressive) jumps to 85.6%. GPS (graph transformer with global attention) reaches 86.7%. These gaps are much larger than on real datasets, confirming that expressiveness differences are often masked by informative features.
Loading PATTERN in PyG
from torch_geometric.datasets import GNNBenchmarkDataset
train_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='train')
val_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='val')
test_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='test')
print(f"Train: {len(train_dataset)}") # 10000
print(f"Val: {len(val_dataset)}") # 2000
print(f"Test: {len(test_dataset)}") # 2000

Standard train/val/test splits. Evaluate per-node binary classification with weighted accuracy.
Common tasks and benchmarks
Per-node binary classification with weighted accuracy (accounting for class imbalance). GCN: 68.9%, GIN: 85.6%, GAT: 78.3%, GraphSAGE: 50.5%, PNA: 85.5%, GPS: 86.7%. The near-random performance of GraphSAGE (~50%) alongside the strong performance of GIN and GPS demonstrates that the specific aggregation function matters enormously for structural tasks.
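Weighted accuracy here means the per-class accuracies averaged, so the abundant background class cannot dominate the score. A minimal sketch of this metric (the helper function and example labels below are illustrative, not code from the benchmark suite):

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Class-balanced accuracy: the mean of per-class recalls.

    Corrects for the background/pattern class imbalance in PATTERN,
    following the weighted-accuracy convention of Benchmarking GNNs.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = []
    for c in np.unique(y_true):
        mask = y_true == c
        per_class.append((y_pred[mask] == c).mean())
    return float(np.mean(per_class))

# A trivial predictor that labels every node "background" (class 0)
# gets 100% recall on class 0 and 0% on class 1, so it scores exactly
# 50% -- which is why near-50% results read as "no structural signal".
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.zeros(8, dtype=int)
print(weighted_accuracy(y_true, y_pred))  # 0.5
```

Under this metric, plain overall accuracy on the same predictions would be 75%, which is why the class-balanced form is the one reported.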
Example: anomaly detection in networks
PATTERN's task -- finding planted structures in random backgrounds -- maps directly to network anomaly detection. In cybersecurity, planted patterns are botnet command-and-control structures hidden in normal traffic. In financial networks, planted patterns are fraud rings embedded in legitimate transaction flows. Detecting these structural anomalies requires the same pattern recognition PATTERN benchmarks.
Published benchmark results
Per-node binary classification with weighted accuracy on PATTERN. ~500K parameter budget. Higher is better.
| Method | Weighted Acc (%) | Year | Paper |
|---|---|---|---|
| GCN | 68.9 | 2020 | Dwivedi et al. |
| GAT | 78.3 | 2020 | Dwivedi et al. |
| GraphSAGE | 50.5 | 2020 | Dwivedi et al. |
| GIN | 85.6 | 2020 | Dwivedi et al. |
| GatedGCN | 85.6 | 2020 | Dwivedi et al. |
| PNA | 85.5 | 2020 | Corso et al. |
| SAN | 86.6 | 2022 | Kreuzer et al. |
| GPS | 86.7 | 2022 | Rampasek et al. |
Original Paper
Benchmarking Graph Neural Networks
V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson (2023). Journal of Machine Learning Research
Read paper →

Original data source
PATTERN is generated synthetically using stochastic block models. The generation code and pre-generated splits are available from the benchmarking-gnns repository.
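As a rough sketch of the generation idea: sample a stochastic block model in which a small group of nodes is wired more densely to itself than the background, and label that group as the pattern. The parameter values and helper below are illustrative assumptions, not the actual generation code from the benchmarking-gnns repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def sbm_with_planted_pattern(n_background=100, n_pattern=20,
                             p_in=0.5, p_out=0.2):
    """Illustrative PATTERN-style graph: an SBM background with a
    denser planted sub-pattern (parameters here are made up)."""
    n = n_background + n_pattern
    labels = np.zeros(n, dtype=int)
    labels[n_background:] = 1  # pattern nodes get label 1
    # Edge probability: p_in inside the pattern, p_out everywhere else.
    prob = np.full((n, n), p_out)
    prob[n_background:, n_background:] = p_in
    adj = (rng.random((n, n)) < prob).astype(int)
    adj = np.triu(adj, 1)
    adj = adj + adj.T  # symmetric, no self-loops
    # Node features drawn uniformly at random: intentionally
    # uninformative, so only connectivity can reveal the pattern.
    feats = rng.integers(0, 3, size=n)
    return adj, feats, labels

adj, feats, labels = sbm_with_planted_pattern()
print(adj.shape, int(labels.sum()))  # (120, 120) 20
```

Because the features carry no label information, any model that solves this task must be reading the density contrast out of the adjacency structure, which is exactly what PATTERN isolates.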
@article{dwivedi2023benchmarking,
title={Benchmarking Graph Neural Networks},
author={Dwivedi, Vijay Prakash and Joshi, Chaitanya K and Luu, Anh Tuan and Laurent, Thomas and Bengio, Yoshua and Bresson, Xavier},
journal={Journal of Machine Learning Research},
volume={24},
number={43},
pages={1--48},
year={2023}
}

BibTeX citation for the Benchmarking GNNs paper (PATTERN dataset).
Which dataset should I use?
PATTERN vs CLUSTER: Both are from Benchmarking GNNs. PATTERN tests sub-pattern detection (binary). CLUSTER tests community detection (6-class). Different structural tasks with different model rankings (GIN excels on PATTERN, GAT excels on CLUSTER).
PATTERN vs ZINC: PATTERN is node-level classification on synthetic graphs. ZINC is graph-level regression on molecules. Both test expressiveness but in different settings.
PATTERN vs Cora: PATTERN isolates structural learning (uninformative features). Cora mixes feature and structural signals. Use PATTERN to diagnose whether your GNN learns from structure.
From benchmark to production
Production structural anomaly detection operates on graphs with millions of nodes, dynamic connectivity (patterns form and dissolve), and adversarial actors (fraudsters deliberately disguise their network patterns). PATTERN's static, synthetic setting is a starting point. Production systems need temporal modeling, robustness to adversarial perturbation, and real-time processing.