QM9 at a glance:

| Molecules | Avg Atoms | Node Features | Targets |
|---|---|---|---|
| 130,831 | ~18 | 11 | 19 (regression) |
What QM9 contains
QM9 contains 130,831 small organic molecules composed of hydrogen, carbon, nitrogen, oxygen, and fluorine, with up to 9 heavy (non-hydrogen) atoms. Each molecule is represented as a graph where atoms are nodes (11-dimensional features encoding atom type, hybridization, etc.) and bonds are edges. Crucially, each atom also has 3D coordinates, enabling geometric deep learning models.
The 19 regression targets are quantum chemical properties computed by density functional theory (DFT): dipole moment, polarizability, HOMO energy, LUMO energy, HOMO-LUMO gap, electronic spatial extent, zero-point vibrational energy, internal energies, enthalpy, free energy, and heat capacity. These properties determine a molecule's chemical behavior and are expensive to compute from first principles.
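For reference, the 19 target columns can be kept as a named list. The ordering below follows PyTorch Geometric's QM9 documentation as I understand it (indices 0-11 are the physical properties listed above, 12-15 are atomization energies, 16-18 are rotational constants); verify against the PyG docs for your version before relying on it.

```python
# Target index -> property name for PyG's QM9 (ordering assumed from
# the PyG documentation; double-check for your torch_geometric version).
QM9_TARGETS = [
    "mu",       # 0:  dipole moment
    "alpha",    # 1:  isotropic polarizability
    "homo",     # 2:  HOMO energy
    "lumo",     # 3:  LUMO energy
    "gap",      # 4:  HOMO-LUMO gap
    "r2",       # 5:  electronic spatial extent
    "zpve",     # 6:  zero-point vibrational energy
    "U0",       # 7:  internal energy at 0 K
    "U",        # 8:  internal energy at 298.15 K
    "H",        # 9:  enthalpy at 298.15 K
    "G",        # 10: free energy at 298.15 K
    "Cv",       # 11: heat capacity at 298.15 K
    "U0_atom",  # 12: atomization energy at 0 K
    "U_atom",   # 13: atomization energy at 298.15 K
    "H_atom",   # 14: atomization enthalpy
    "G_atom",   # 15: atomization free energy
    "A",        # 16: rotational constant A
    "B",        # 17: rotational constant B
    "C",        # 18: rotational constant C
]
assert len(QM9_TARGETS) == 19
```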
Why QM9 matters
QM9 is where geometric deep learning proved its value for science. Computing one molecule's quantum properties via DFT takes minutes to hours of CPU time. A trained GNN predicts the same properties in milliseconds. On QM9's 130K molecules, this speedup is convenient. On the billions of possible drug candidates that pharmaceutical companies screen, it is transformative: weeks of computation become seconds.
QM9 also demonstrated that 3D geometry matters. Standard GNN layers (GCN, GIN) process only the 2D molecular graph (which atoms are bonded to which). 3D-aware models (SchNet, DimeNet, EGNN) also use inter-atomic distances and angles, achieving dramatically better results. This insight -- that spatial structure contains critical information -- has influenced GNN design far beyond chemistry.
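The geometric quantities these 3D-aware models consume are derived from the atomic coordinates rather than the bond graph. A minimal sketch in plain PyTorch (toy coordinates standing in for `mol.pos`, and an illustrative cutoff value) of the interatomic distances and radius-graph edges a SchNet-style model would build:

```python
import torch

# Toy 3D coordinates for 4 atoms (stand-in for mol.pos from QM9).
pos = torch.tensor([
    [0.0, 0.0, 0.0],
    [1.1, 0.0, 0.0],
    [0.0, 1.1, 0.0],
    [0.0, 0.0, 1.1],
])

# All pairwise interatomic distances: dist[i, j] = ||pos_i - pos_j||.
dist = torch.cdist(pos, pos)

# 3D models typically pass messages over a radius graph rather than
# the bonded 2D graph: connect atom pairs closer than a cutoff,
# excluding self-loops. (Cutoff value here is illustrative.)
cutoff = 1.5
mask = (dist < cutoff) & ~torch.eye(len(pos), dtype=torch.bool)
edge_index = mask.nonzero().t()  # shape [2, num_edges]
```

With these coordinates, atom 0 is within the cutoff of atoms 1-3 but atoms 1-3 are too far from each other, so the radius graph is star-shaped. In PyG itself, `torch_geometric.nn.radius_graph` does this construction batched and efficiently.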
Loading QM9 in PyG
```python
from torch_geometric.datasets import QM9

dataset = QM9(root='/tmp/QM9')
print(f"Molecules: {len(dataset)}")    # 130831

mol = dataset[0]
print(f"Atoms: {mol.num_nodes}")       # varies per molecule
print(f"Bonds: {mol.num_edges}")       # varies per molecule
print(f"Features: {mol.x.shape[1]}")   # 11
print(f"3D coords: {mol.pos.shape}")   # [N, 3]
print(f"Targets: {mol.y.shape}")       # [1, 19]
```

mol.pos gives the 3D coordinates. mol.y holds all 19 targets; select one target for training.
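Selecting a single target usually comes with per-target standardization, since the 19 properties have very different units and scales. A sketch of that step, using a random tensor as a stand-in for the stacked `mol.y` rows so it runs without downloading QM9:

```python
import torch

# Stand-in for the stacked targets of a QM9-like dataset:
# one row per molecule, 19 properties per row (random values here).
y_all = torch.randn(1000, 19) * 3.0 + 7.0

target = 4  # e.g. the HOMO-LUMO gap column
y = y_all[:, target]

# Standardize with training-set statistics only, so validation and
# test targets are transformed with the same mean/std.
mean, std = y.mean(), y.std()
y_norm = (y - mean) / std

# At evaluation time, un-normalize predictions before computing MAE,
# so the error is reported in the target's physical units.
y_back = y_norm * std + mean
```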
Common tasks and benchmarks
Per-target regression evaluated by mean absolute error (MAE). Each of the 19 targets is typically trained and evaluated independently. The standard split uses 110K/10K/10K for train/val/test. 2D models (GCN, GIN) achieve moderate MAE. 3D models (SchNet, DimeNet++, PaiNN, EGNN) achieve chemical accuracy on many targets, meaning their predictions are within the uncertainty of experimental measurements. The gap between 2D and 3D models is often 2-5x in MAE.
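The split-and-evaluate protocol above can be sketched in a few lines; the sizes here are scaled down from the real 110K/10K/10K for illustration, and the shuffling seed is arbitrary:

```python
import torch

def mae(pred, target):
    # Mean absolute error, the standard QM9 metric.
    return (pred - target).abs().mean()

# Common protocol: shuffle the molecule indices once, then take
# fixed-size train/val/test chunks (110K/10K/10K on the full dataset;
# scaled-down sizes here so the sketch is self-contained).
num_mols, n_train, n_val = 1000, 800, 100
perm = torch.randperm(num_mols, generator=torch.Generator().manual_seed(0))
train_idx = perm[:n_train]
val_idx = perm[n_train:n_train + n_val]
test_idx = perm[n_train + n_val:]
```

Each of the 19 targets then gets its own model (or its own output head), and the reported number is per-target test MAE.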
Example: drug candidate screening
A pharmaceutical company needs to screen 10 million candidate molecules for drug-likeness. Key properties include solubility (related to free energy), reactivity (related to HOMO-LUMO gap), and stability (related to heat of formation). Running DFT on 10 million molecules would take thousands of CPU-years. A GNN trained on QM9-scale data predicts these properties in hours, enabling rapid filtering of candidates before expensive wet-lab testing.
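The filtering step in such a pipeline is a cheap vectorized operation once predictions exist. A sketch with random tensors standing in for GNN predictions; the two properties, their scales, and the thresholds are purely illustrative, not real screening criteria:

```python
import torch

torch.manual_seed(0)

# Stand-in predictions from a trained GNN for 10,000 candidates:
# column 0 = predicted HOMO-LUMO gap, column 1 = predicted free energy.
# (Random values; thresholds below are illustrative, not chemistry.)
preds = torch.rand(10_000, 2) * torch.tensor([10.0, 5.0])

gap, free_energy = preds[:, 0], preds[:, 1]

# Keep molecules whose predicted gap suggests low reactivity and
# whose predicted free energy suggests acceptable solubility.
keep = (gap > 4.0) & (free_energy < 3.0)
candidates = keep.nonzero().squeeze(1)
print(f"{len(candidates)} of {len(preds)} candidates pass the filter")
```

Only the molecules that survive this cheap in-silico filter move on to expensive wet-lab testing.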
Published benchmark results
Selected state-of-the-art results on QM9 for the HOMO-LUMO gap target (target index 4), reported as mean absolute error (MAE) in meV. Lower is better.
| Method | MAE (meV) | Year | Paper |
|---|---|---|---|
| SchNet | 63 | 2017 | Schütt et al. |
| DimeNet | 32 | 2020 | Gasteiger et al. |
| DimeNet++ | 24.6 | 2020 | Gasteiger et al. |
| PaiNN | 45.7 | 2021 | Schütt et al. |
| EGNN | 48 | 2021 | Satorras et al. |
| SphereNet | 23.6 | 2022 | Liu et al. |
| Equiformer | ~22 | 2023 | Liao & Smidt |
Original Paper
Quantum-Chemical Insights from Deep Tensor Neural Networks
K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, A. Tkatchenko (2017). Nature Communications
Original data source
The QM9 dataset was originally compiled by Ramakrishnan et al. (2014) and is available for download from Figshare. The original molecules are from the GDB-17 chemical universe database.
BibTeX citation for the QM9 dataset:

```bibtex
@article{ramakrishnan2014quantum,
  title={Quantum chemistry structures and properties of 134 kilo molecules},
  author={Ramakrishnan, Raghunathan and Dral, Pavlo O and Rupp, Matthias and von Lilienfeld, O Anatole},
  journal={Scientific Data},
  volume={1},
  pages={140022},
  year={2014},
  publisher={Nature Publishing Group}
}
```
Which dataset should I use?
QM9 vs ZINC: QM9 is for quantum property regression with rich 3D features (11-dimensional node features plus coordinates). ZINC tests GNN expressiveness with minimal features (a single categorical atom type per node). Use QM9 for molecular property prediction research; use ZINC to benchmark architecture expressiveness.
QM9 vs Peptides-func: QM9 has small molecules (~18 atoms), Peptides-func has larger peptides (~151 atoms) requiring long-range reasoning. QM9 tests local 3D geometry; Peptides-func tests global structure capture.
QM9 vs MUTAG: QM9 is 700x larger and is a regression task with 19 targets. MUTAG is a tiny (188 graphs) binary classification dataset. Use MUTAG only for hello-world tutorials.
From benchmark to production
Production molecular property prediction extends beyond QM9 in several ways: larger molecules (QM9 is limited to 9 heavy atoms; drugs typically have 20-50), additional property types (binding affinity, toxicity, metabolic stability), and integration with molecular generation (design molecules with desired properties, then predict their properties). But the core pipeline -- graph representation of molecules, GNN message passing, property prediction -- remains the same.