
PATTERN: Testing Whether Your GNN Can Actually Detect Structural Patterns

PATTERN is a synthetic benchmark of 14,000 graphs where the task is to find planted sub-patterns in random graphs. It provides a controlled test of GNN structural expressiveness: can your model detect connectivity patterns, or does it only learn from features?


TL;DR

  • PATTERN has 14,000 graphs averaging 119 nodes and 6,099 edges. Nodes have 3 features and binary labels (pattern vs. background).
  • The task: identify nodes belonging to a planted structural pattern within a random graph. This tests pure structural pattern recognition.
  • GCN: 68.9%, GIN: 85.6%, GPS: 86.7%. The 18-point gap between GCN and GPS shows that expressiveness matters for structural tasks.
  • Synthetic benchmarks provide controlled diagnosis of GNN capabilities. PATTERN isolates structural detection from feature-based classification.

14,000 graphs · ~119 avg nodes · 3 node features · 2 classes

What PATTERN contains

PATTERN is a synthetic benchmark from the influential “Benchmarking Graph Neural Networks” paper (Dwivedi et al., 2020). Each of the 14,000 graphs is generated from a stochastic block model with a planted sub-pattern. Some nodes belong to the pattern (label 1) and the rest are background (label 0). Nodes have 3-dimensional features that provide minimal information -- the model must rely on graph structure to detect the planted pattern.

Graphs average 119 nodes and 6,099 edges, making them medium-sized. The 14,000-graph dataset provides enough statistical power for reliable evaluation, unlike smaller molecular benchmarks.
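The generative idea can be sketched in a few lines. This is a simplified toy version, not the exact benchmarking-gnns generator: background nodes connect sparsely at random, while a planted block of pattern nodes is wired more densely among itself, so structure alone separates the two groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy planted-pattern graph (illustrative only; parameters are made up):
# background nodes connect sparsely, pattern nodes densely among themselves.
n_background, n_pattern = 100, 20
p_background, p_pattern = 0.05, 0.5

n = n_background + n_pattern
labels = np.zeros(n, dtype=int)
labels[n_background:] = 1  # pattern nodes get label 1

# Sample a symmetric adjacency matrix block by block.
prob = np.full((n, n), p_background)
prob[n_background:, n_background:] = p_pattern
upper = rng.random((n, n)) < prob
adj = np.triu(upper, k=1)
adj = adj | adj.T

# Pattern nodes end up with noticeably higher degree than background nodes,
# which is the structural signal a GNN must pick up.
deg = adj.sum(axis=1)
print(deg[labels == 1].mean() > deg[labels == 0].mean())
```

In the real dataset the blocks and probabilities vary per graph, so a model cannot memorize a single threshold; it has to learn the pattern's connectivity signature.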

Why PATTERN matters

On real datasets, it is hard to know whether a GNN is learning from features, structure, or some combination. PATTERN eliminates this ambiguity: the 3 node features are intentionally uninformative for the task. The only way to classify nodes correctly is to detect the structural pattern in the graph connectivity.

The results are illuminating. GCN (simple averaging) achieves only 68.9%. GIN (provably more expressive) jumps to 85.6%. GPS (graph transformer with global attention) reaches 86.7%. These gaps are much larger than on real datasets, confirming that expressiveness differences are often masked by informative features.
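A toy calculation hints at why averaging-based models struggle here. When node features are uninformative (effectively constant), mean aggregation, which is roughly what GCN's normalized averaging does, collapses neighborhoods of different sizes to the same value, while sum aggregation (GIN-style) preserves them:

```python
import numpy as np

# Two neighborhoods with identical, uninformative (all-ones) features:
# one node has 2 neighbors, the other has 4.
neigh_a = np.ones(2)
neigh_b = np.ones(4)

# Mean aggregation (roughly GCN-style averaging) collapses them:
print(neigh_a.mean(), neigh_b.mean())  # 1.0 1.0 -- indistinguishable

# Sum aggregation (GIN-style) keeps them apart:
print(neigh_a.sum(), neigh_b.sum())    # 2.0 4.0 -- degree information survives
```

This is a one-step caricature of the Weisfeiler-Leman argument behind GIN's design, but it captures why the gap is so large precisely when features carry no signal.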

Loading PATTERN in PyG

load_pattern.py
from torch_geometric.datasets import GNNBenchmarkDataset

train_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='train')
val_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='val')
test_dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN', split='test')

print(f"Train: {len(train_dataset)}")   # 10000
print(f"Val: {len(val_dataset)}")       # 2000
print(f"Test: {len(test_dataset)}")     # 2000

Standard train/val/test splits. Evaluate per-node binary classification with weighted accuracy.
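Since background nodes heavily outnumber pattern nodes, plain accuracy is misleading. A minimal sketch of a class-weighted (balanced) accuracy, in the spirit of the metric used for PATTERN, averages the per-class accuracies:

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Average of per-class accuracies (balanced accuracy).

    Sketch of a class-weighted metric; the exact weighting in the
    benchmarking-gnns code may differ in detail.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(per_class))

# A predictor that always says "background" scores 50%, not 90%,
# on a 90/10 imbalanced split -- the metric is robust to imbalance.
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)
print(weighted_accuracy(y_true, y_pred))  # 0.5
```

The key property: a trivial majority-class predictor cannot score above 50% on a binary task, so the leaderboard numbers reflect real structural detection.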

Common tasks and benchmarks

Per-node binary classification with weighted accuracy (accounting for class imbalance). GCN: 68.9%, GIN: 85.6%, GAT: 78.3%, GraphSAGE: 50.5%, PNA: 85.5%, GPS: 86.7%. The near-random performance of GraphSAGE (~50%) and the strong performance of GIN and GPS demonstrate that the choice of aggregation function matters enormously for structural tasks.

Example: anomaly detection in networks

PATTERN's task -- finding planted structures in random backgrounds -- maps directly to network anomaly detection. In cybersecurity, planted patterns are botnet command-and-control structures hidden in normal traffic. In financial networks, planted patterns are fraud rings embedded in legitimate transaction flows. Detecting these structural anomalies requires the same pattern recognition PATTERN benchmarks.
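The analogy can be made concrete with a toy baseline. This hypothetical sketch plants a dense 15-node "ring" inside a sparse random network and flags high-degree outliers with a z-score; it is a deliberately naive stand-in for the learned structural detectors a GNN provides, and all numbers here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy network: 200 "normal" nodes with sparse random links, plus a
# 15-node densely wired group (e.g. a fraud ring).
n_normal, n_ring = 200, 15
n = n_normal + n_ring
adj = np.triu(rng.random((n, n)) < 0.03, k=1)
adj[n_normal:, n_normal:] |= np.triu(rng.random((n_ring, n_ring)) < 0.6, k=1)
adj = adj | adj.T

# Flag nodes whose degree is unusually high for the population.
deg = adj.sum(axis=1)
z = (deg - deg.mean()) / deg.std()
flagged = np.where(z > 2.0)[0]

# Most flagged nodes fall in the planted ring (indices >= n_normal).
print(np.mean(flagged >= n_normal))
```

A degree threshold only works because this toy pattern is dense; real anomalous structures (and PATTERN's harder instances) require models that reason about connectivity beyond first-order statistics.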

Published benchmark results

Per-node binary classification with weighted accuracy on PATTERN. ~500K parameter budget. Higher is better.

Method       Weighted Acc (%)   Year   Paper
GCN          68.9               2020   Dwivedi et al.
GAT          78.3               2020   Dwivedi et al.
GraphSAGE    50.5               2020   Dwivedi et al.
GIN          85.6               2020   Dwivedi et al.
GatedGCN     85.6               2020   Dwivedi et al.
PNA          85.5               2020   Corso et al.
SAN          86.6               2022   Kreuzer et al.
GPS          86.7               2022   Rampasek et al.

Original Paper

Benchmarking Graph Neural Networks

V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson (2023). Journal of Machine Learning Research


Original data source

PATTERN is generated synthetically using stochastic block models. The generation code and pre-generated splits are available from the benchmarking-gnns repository.

cite_pattern.bib
@article{dwivedi2023benchmarking,
  title={Benchmarking Graph Neural Networks},
  author={Dwivedi, Vijay Prakash and Joshi, Chaitanya K and Luu, Anh Tuan and Laurent, Thomas and Bengio, Yoshua and Bresson, Xavier},
  journal={Journal of Machine Learning Research},
  volume={24},
  number={43},
  pages={1--48},
  year={2023}
}

BibTeX citation for the Benchmarking GNNs paper (PATTERN dataset).

Which dataset should I use?

PATTERN vs CLUSTER: Both are from Benchmarking GNNs. PATTERN tests sub-pattern detection (binary). CLUSTER tests community detection (6-class). Different structural tasks with different model rankings (GIN excels on PATTERN, GAT excels on CLUSTER).

PATTERN vs ZINC: PATTERN is node-level classification on synthetic graphs. ZINC is graph-level regression on molecules. Both test expressiveness but in different settings.

PATTERN vs Cora: PATTERN isolates structural learning (uninformative features). Cora mixes feature and structural signals. Use PATTERN to diagnose whether your GNN learns from structure.

From benchmark to production

Production structural anomaly detection operates on graphs with millions of nodes, dynamic connectivity (patterns form and dissolve), and adversarial actors (fraudsters deliberately disguise their network patterns). PATTERN's static, synthetic setting is a starting point. Production systems need temporal modeling, robustness to adversarial perturbation, and real-time processing.

Frequently asked questions

What is the PATTERN dataset?

PATTERN is a synthetic benchmark from the 'Benchmarking GNNs' paper. It contains 14,000 graphs averaging 119 nodes and 6,099 edges each. Nodes have 3 features. The binary task is to identify nodes that belong to a planted sub-pattern within a larger random graph. It tests structural pattern recognition.

How was PATTERN generated?

Each graph is a stochastic block model with a planted sub-pattern. Some nodes belong to the pattern (label 1) and others are background (label 0). The model must distinguish pattern nodes from background using structural cues, not just features. This isolates the GNN's structural learning ability.

How do I load PATTERN in PyTorch Geometric?

Use `from torch_geometric.datasets import GNNBenchmarkDataset; dataset = GNNBenchmarkDataset(root='/tmp/PATTERN', name='PATTERN')`. Separate train/val/test splits are available.

What accuracy should I expect on PATTERN?

GCN achieves ~68.9% weighted accuracy, GIN ~85.6%, GAT ~78.3%, GPS (graph transformer) ~86.7%. The large gap between GCN and more expressive methods shows that structural pattern detection requires expressiveness beyond simple averaging.

Why use synthetic benchmarks instead of real datasets?

Synthetic benchmarks like PATTERN provide controlled experiments where the ground truth generative process is known. This enables precise diagnosis of what GNN architectures can and cannot detect. Real datasets conflate multiple challenges; PATTERN isolates structural pattern recognition.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.