600
Graphs
~33
Avg Nodes
3
Node Features
6
Classes
What ENZYMES contains
ENZYMES contains 600 protein tertiary structure graphs from the BRENDA enzyme database, 100 per class. Each graph represents one enzyme. Nodes are secondary structure elements (helices, sheets, turns) with 3 node features by default (a one-hot encoding of the element type); use_node_attr=True adds 18 continuous physical and chemical attributes, for 21 features total. Edges connect spatially or sequentially adjacent elements, averaging ~62 edges (~124 directed edges in PyG's representation) per graph. The 6 classes correspond to the top-level EC (Enzyme Commission) numbers: Oxidoreductases, Transferases, Hydrolases, Lyases, Isomerases, and Ligases.
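The ~62 edges vs. ~124 directed edges distinction comes from PyG storing each undirected edge as two directed entries in a COO-format edge_index. A minimal plain-Python sketch (toy graph, no PyG required) of that expansion:

```python
# Sketch: PyG stores each undirected edge as two directed edges in a
# COO-format edge_index, which is why ~62 undirected edges per ENZYMES
# graph show up as ~124 directed edges. Toy edge list for illustration.

def to_directed_edge_index(undirected_edges):
    """Expand undirected (u, v) pairs into both directions, COO style."""
    src, dst = [], []
    for u, v in undirected_edges:
        src += [u, v]
        dst += [v, u]
    return [src, dst]

# Toy graph: 3 secondary-structure elements connected in a path
edges = [(0, 1), (1, 2)]
edge_index = to_directed_edge_index(edges)
print(edge_index)          # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(len(edge_index[0]))  # 4 directed edges from 2 undirected ones
```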
Why ENZYMES matters
ENZYMES tests GNN expressiveness. On MUTAG, even a simple GCN achieves 85%+; on ENZYMES, the same architecture drops to ~45%. The difficulty comes from the task itself: different enzyme types can have similar overall structures but differ in specific local motifs (active sites, binding pockets). Detecting these fine-grained structural patterns requires more expressive GNN layers.
This makes ENZYMES a valuable diagnostic: if your GNN improvement helps on MUTAG but not on ENZYMES, it may be capturing easy patterns while missing harder structural features. GIN (Graph Isomorphism Network) outperforms GCN on ENZYMES by a wider margin than on MUTAG, confirming that expressiveness matters more on harder tasks.
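One concrete reason GIN is more expressive: it aggregates neighbor features by sum, while mean-style aggregation can map different neighborhoods to the same result. A plain-Python sketch of the distinction (scalar features for simplicity):

```python
# Sketch: why sum aggregation (used by GIN) distinguishes neighbor
# multisets that mean aggregation (GCN-style) collapses together.

def mean_agg(neighbors):
    return sum(neighbors) / len(neighbors)

def sum_agg(neighbors):
    return sum(neighbors)

a = [1.0, 1.0]  # two neighbors, each with feature 1
b = [1.0]       # one neighbor with feature 1

print(mean_agg(a) == mean_agg(b))  # True: mean cannot tell them apart
print(sum_agg(a) == sum_agg(b))    # False: sum separates the multisets
```

The same collapse happens for any neighborhoods that differ only in multiplicity, which is exactly the kind of local-motif detail ENZYMES rewards.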
Loading ENZYMES in PyG
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')
print(f"Graphs: {len(dataset)}") # 600
print(f"Features: {dataset.num_features}") # 3
print(f"Classes: {dataset.num_classes}") # 6
loader = DataLoader(dataset, batch_size=32, shuffle=True)
Standard TUDataset API. Use 10-fold cross-validation for evaluation.
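The DataLoader above batches graphs by taking their disjoint union and recording which graph each node belongs to in a batch vector; graph-level readouts like the global mean pooling used by GCN baselines reduce node features per graph id. A plain-Python sketch of that readout (toy scalar features, no PyG required):

```python
# Sketch: global mean pooling over a batched disjoint graph, the readout
# behind the "GCN + mean pool" baseline. batch[i] is the graph id of
# node i, mirroring how PyG's DataLoader batches graphs.

def global_mean_pool(x, batch, num_graphs):
    sums = [0.0] * num_graphs
    counts = [0] * num_graphs
    for feat, g in zip(x, batch):
        sums[g] += feat
        counts[g] += 1
    return [s / c for s, c in zip(sums, counts)]

# Two graphs batched together: graph 0 has 2 nodes, graph 1 has 3
x = [1.0, 3.0, 2.0, 4.0, 6.0]
batch = [0, 0, 1, 1, 1]
print(global_mean_pool(x, batch, 2))  # [2.0, 4.0]
```

In PyG itself, `torch_geometric.nn.global_mean_pool(x, batch)` performs this reduction on tensors.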
Original Paper
Protein Function Prediction via Graph Kernels
Karsten M. Borgwardt, Cheng Soon Ong, Stefan Schönauer, S.V.N. Vishwanathan, Alex J. Smola, Hans-Peter Kriegel (2005). Bioinformatics, 21(Suppl 1), i47-i56
Benchmark comparison (10-fold cross-validation)
| Method | Accuracy | Year | Paper |
|---|---|---|---|
| WL kernel | ~53.2% | 2011 | Shervashidze et al. |
| GCN + mean pool | ~44.8% | 2017 | Kipf & Welling |
| GIN | ~59.6% | 2019 | Xu et al. |
| PNA | ~62.5% | 2020 | Corso et al. |
| CIN | ~66.0% | 2021 | Bodnar et al. |
Which graph classification dataset should I use?
MUTAG (188 graphs, 2 classes) is the quickest sanity check -- if your model fails here, the code is broken. ENZYMES (600 graphs, 6 classes) is the hardest of the three TUDataset classics (~50% accuracy) and tests GNN expressiveness. PROTEINS (1,113 graphs, 2 classes) has the most graphs and gives more reliable statistical estimates. Run all three: if your method helps on MUTAG but not ENZYMES, it may only capture easy patterns.
Common tasks and benchmarks
6-class graph classification with 10-fold cross-validation. GCN with global mean pooling: ~40-50%. GIN: ~50-60%. PNA (Principal Neighbourhood Aggregation): ~55-65%. The variance is high even with 10-fold CV due to the small dataset size. Methods that achieve above 65% typically use data augmentation or pretraining strategies.
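The standard protocol splits the 600 graphs into 10 stratified folds. A minimal plain-Python sketch of that split (indices only; a real run would index into the PyG dataset with these folds):

```python
# Sketch: a minimal stratified 10-fold split for a 600-graph, 6-class
# dataset like ENZYMES. Stratification keeps the class balance in each
# fold, which matters given the small dataset and high fold variance.
import random

def stratified_kfold(labels, k=10, seed=0):
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)  # deal indices round-robin per class
    return folds

labels = [i % 6 for i in range(600)]  # 100 graphs per class, like ENZYMES
folds = stratified_kfold(labels)
print([len(f) for f in folds])  # ten folds of 60 graphs each
```

Each fold in turn serves as the test set; report the mean and standard deviation of accuracy over the 10 runs.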
Data source
ENZYMES is part of the TUDataset collection and originally from the BRENDA enzyme database. The graph version is available from the TUDataset benchmark suite. PyG downloads it automatically.
BibTeX citation
@article{borgwardt2005protein,
title={Protein Function Prediction via Graph Kernels},
author={Borgwardt, Karsten M. and Ong, Cheng Soon and Sch{\"o}nauer, Stefan and Vishwanathan, S.V.N. and Smola, Alex J. and Kriegel, Hans-Peter},
journal={Bioinformatics},
volume={21},
number={Suppl 1},
pages={i47--i56},
year={2005}
}
@article{morris2020tudataset,
title={TUDataset: A collection of benchmark datasets for learning with graphs},
author={Morris, Christopher and Kriege, Nils M. and Bause, Franka and Kersting, Kristian and Mutzel, Petra and Neumann, Marion},
journal={arXiv preprint arXiv:2007.08663},
year={2020}
}
Cite Borgwardt et al. for the ENZYMES dataset, Morris et al. for the TUDataset collection.
Example: protein function prediction
Biotech companies need to predict enzyme function from structure to design better catalysts for industrial processes. An enzyme that efficiently breaks down plastic waste could be worth billions. GNN-based function prediction accelerates this search by screening candidate protein structures computationally before expensive laboratory synthesis. ENZYMES benchmarks the core classification capability underlying these industrial applications.
From benchmark to production
Production protein function prediction uses 3D coordinates, amino acid sequences, evolutionary information (multiple sequence alignments), and datasets thousands of times larger than ENZYMES. AlphaFold and ESMFold have transformed the field by predicting structure from sequence. The next frontier is predicting function from predicted structure -- where graph neural networks play a central role.