14,000
Graphs
~119
Avg Nodes
3
Node Features
2
Classes
What PATTERN contains
PATTERN is a synthetic benchmark from the influential “Benchmarking Graph Neural Networks” paper (Dwivedi et al., 2020). Each of the 14,000 graphs is generated from a stochastic block model with a planted sub-pattern. Some nodes belong to the pattern (label 1) and the rest are background (label 0). Nodes have 3-dimensional features that provide minimal information -- the model must rely on graph structure to detect the planted pattern.
Graphs average 119 nodes and 6,099 edges, making them medium-sized. The 14,000-graph dataset provides enough statistical power for reliable evaluation, unlike smaller molecular benchmarks.
Why PATTERN matters
On real datasets, it is hard to know whether a GNN is learning from features, structure, or some combination. PATTERN eliminates this ambiguity: the 3 node features are intentionally uninformative for the task. The only way to classify nodes correctly is to detect the structural pattern in the graph connectivity.
The results are illuminating. GCN (simple averaging) achieves only 68.9%. GIN (provably more expressive) jumps to 85.6%. GPS (graph transformer with global attention) reaches 86.7%. These gaps are much larger than on real datasets, confirming that expressiveness differences are often masked by informative features.
Loading PATTERN in PyG
from torch_geometric.datasets import GNNBenchmarkDataset
train_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='train')
val_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='val')
test_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='test')
print(f"Train: {len(train_dataset)}") # 10000
print(f"Val: {len(val_dataset)}") # 2000
print(f"Test: {len(test_dataset)}") # 2000

Standard train/val/test splits. Evaluate per-node binary classification with weighted accuracy.
Common tasks and benchmarks
Per-node binary classification with weighted accuracy (accounting for class imbalance). GCN: 68.9%, GIN: 85.6%, GAT: 78.3%, GraphSAGE: 50.5%, PNA: 85.5%, GPS: 86.7%. The near-random performance of GraphSAGE (~50%) alongside the strong performance of GIN and GPS demonstrates that the specific aggregation function matters enormously for structural tasks.
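Weighted accuracy here means the per-class accuracies averaged, so the abundant background class cannot dominate the score. A minimal sketch of this metric (the helper function and example labels below are illustrative, not code from the benchmark suite):

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Class-balanced accuracy: the mean of per-class recalls.

    Corrects for the background/pattern class imbalance in PATTERN,
    following the weighted-accuracy convention of Benchmarking GNNs.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = []
    for c in np.unique(y_true):
        mask = y_true == c
        per_class.append((y_pred[mask] == c).mean())
    return float(np.mean(per_class))

# A trivial predictor that labels every node "background" (class 0)
# gets 100% recall on class 0 and 0% on class 1, so it scores exactly
# 50% -- which is why near-50% results read as "no structural signal".
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.zeros(8, dtype=int)
print(weighted_accuracy(y_true, y_pred))  # 0.5
```

Under this metric, plain overall accuracy on the same predictions would be 75%, which is why the class-balanced form is the one reported.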
Example: anomaly detection in networks
PATTERN's task -- finding planted structures in random backgrounds -- maps directly to network anomaly detection. In cybersecurity, planted patterns are botnet command-and-control structures hidden in normal traffic. In financial networks, planted patterns are fraud rings embedded in legitimate transaction flows. Detecting these structural anomalies requires the same pattern recognition PATTERN benchmarks.
Published benchmark results
Per-node binary classification with weighted accuracy on PATTERN. ~500K parameter budget. Higher is better.
| Method | Weighted Acc (%) | Year | Paper |
|---|---|---|---|
| GCN | 68.9 | 2020 | Dwivedi et al. |
| GAT | 78.3 | 2020 | Dwivedi et al. |
| GraphSAGE | 50.5 | 2020 | Dwivedi et al. |
| GIN | 85.6 | 2020 | Dwivedi et al. |
| GatedGCN | 85.6 | 2020 | Dwivedi et al. |
| PNA | 85.5 | 2020 | Corso et al. |
| SAN | 86.6 | 2022 | Kreuzer et al. |
| GPS | 86.7 | 2022 | Rampasek et al. |
Original Paper
Benchmarking Graph Neural Networks
V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson (2023). Journal of Machine Learning Research
Read paper →

Original data source
PATTERN is generated synthetically using stochastic block models. The generation code and pre-generated splits are available from the benchmarking-gnns repository.
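As a rough sketch of the generation idea: sample a stochastic block model in which a small group of nodes is wired more densely to itself than the background, and label that group as the pattern. The parameter values and helper below are illustrative assumptions, not the actual generation code from the benchmarking-gnns repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def sbm_with_planted_pattern(n_background=100, n_pattern=20,
                             p_in=0.5, p_out=0.2):
    """Illustrative PATTERN-style graph: an SBM background with a
    denser planted sub-pattern (parameters here are made up)."""
    n = n_background + n_pattern
    labels = np.zeros(n, dtype=int)
    labels[n_background:] = 1  # pattern nodes get label 1
    # Edge probability: p_in inside the pattern, p_out everywhere else.
    prob = np.full((n, n), p_out)
    prob[n_background:, n_background:] = p_in
    adj = (rng.random((n, n)) < prob).astype(int)
    adj = np.triu(adj, 1)
    adj = adj + adj.T  # symmetric, no self-loops
    # Node features drawn uniformly at random: intentionally
    # uninformative, so only connectivity can reveal the pattern.
    feats = rng.integers(0, 3, size=n)
    return adj, feats, labels

adj, feats, labels = sbm_with_planted_pattern()
print(adj.shape, int(labels.sum()))  # (120, 120) 20
```

Because the features carry no label information, any model that solves this task must be reading the density contrast out of the adjacency structure, which is exactly what PATTERN isolates.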
@article{dwivedi2023benchmarking,
title={Benchmarking Graph Neural Networks},
author={Dwivedi, Vijay Prakash and Joshi, Chaitanya K and Luu, Anh Tuan and Laurent, Thomas and Bengio, Yoshua and Bresson, Xavier},
journal={Journal of Machine Learning Research},
volume={24},
number={43},
pages={1--48},
year={2023}
}

BibTeX citation for the Benchmarking GNNs paper (PATTERN dataset).
Which dataset should I use?
PATTERN vs CLUSTER: Both are from Benchmarking GNNs. PATTERN tests sub-pattern detection (binary). CLUSTER tests community detection (6-class). Different structural tasks with different model rankings (GIN excels on PATTERN, GAT excels on CLUSTER).
PATTERN vs ZINC: PATTERN is node-level classification on synthetic graphs. ZINC is graph-level regression on molecules. Both test expressiveness but in different settings.
PATTERN vs Cora: PATTERN isolates structural learning (uninformative features). Cora mixes feature and structural signals. Use PATTERN to diagnose whether your GNN learns from structure.
From benchmark to production
Production structural anomaly detection operates on graphs with millions of nodes, dynamic connectivity (patterns form and dissolve), and adversarial actors (fraudsters deliberately disguise their network patterns). PATTERN's static, synthetic setting is a starting point. Production systems need temporal modeling, robustness to adversarial perturbation, and real-time processing.