
QM9: 130K Molecules and the Gold Standard for Molecular Property Prediction

QM9 is a dataset of 130,831 small organic molecules with 19 quantum chemical properties computed by density functional theory. It is the benchmark that launched geometric deep learning for chemistry: predicting physical properties from molecular structure without running expensive quantum simulations.


TL;DR

  • QM9 has 130,831 molecules with up to 9 heavy atoms. Each molecule has 11 node features and 19 regression targets (quantum chemical properties such as the HOMO-LUMO gap and dipole moment).
  • This is a regression benchmark, not classification. Models predict continuous physical properties from molecular graphs. Mean absolute error (MAE) is the standard metric.
  • 3D-aware models (SchNet, DimeNet, EGNN) significantly outperform 2D graph models (GCN, GIN). Physical symmetries and spatial coordinates are essential for accurate property prediction.
  • QM9 drives drug discovery and materials science: predicting molecular properties from structure eliminates expensive quantum chemistry computations.
  • KumoRFM applies the same property-from-structure reasoning to enterprise data: predicting business outcomes from relational structure.

At a glance: 130,831 molecules · ~18 atoms per molecule on average · 11 node features · 19 regression targets

What QM9 contains

QM9 contains 130,831 small organic molecules composed of hydrogen, carbon, nitrogen, oxygen, and fluorine with up to 9 heavy atoms. Each molecule is represented as a graph where atoms are nodes (11-dimensional features encoding atom type, hybridization, etc.) and bonds are edges. Crucially, each atom also has 3D coordinates, enabling geometric deep learning models.

The 19 regression targets are quantum chemical properties computed by density functional theory (DFT): dipole moment, polarizability, HOMO energy, LUMO energy, HOMO-LUMO gap, electronic spatial extent, zero-point vibrational energy, internal energies, enthalpy, free energy, and heat capacity. These properties determine a molecule's chemical behavior and are expensive to compute from first principles.
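The target ordering below follows the table in PyG's QM9 loader documentation; the short string labels are informal names chosen here for illustration, not identifiers from the library.

```python
# Informal labels for the 19 QM9 regression targets, in the order used
# by PyG's QM9 loader. Indices 12-15 are atomization energies and
# 16-18 are rotational constants, also stored in data.y.
QM9_TARGETS = [
    "mu",      # 0: dipole moment
    "alpha",   # 1: isotropic polarizability
    "homo",    # 2: HOMO energy
    "lumo",    # 3: LUMO energy
    "gap",     # 4: HOMO-LUMO gap
    "r2",      # 5: electronic spatial extent
    "zpve",    # 6: zero-point vibrational energy
    "u0",      # 7: internal energy at 0 K
    "u298",    # 8: internal energy at 298.15 K
    "h298",    # 9: enthalpy at 298.15 K
    "g298",    # 10: free energy at 298.15 K
    "cv",      # 11: heat capacity at 298.15 K
    "u0_atom", "u298_atom", "h298_atom", "g298_atom",  # 12-15: atomization energies
    "a", "b", "c",                                     # 16-18: rotational constants
]

GAP_INDEX = QM9_TARGETS.index("gap")   # index 4
```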

Why QM9 matters

QM9 is where geometric deep learning proved its value for science. Computing one molecule's quantum properties via DFT takes minutes to hours of CPU time. A trained GNN predicts the same properties in milliseconds. On QM9's 130K molecules, this speedup is nice. On the billions of possible drug candidates that pharmaceutical companies screen, it is transformative: weeks of computation become seconds.

QM9 also demonstrated that 3D geometry matters. Standard GNN layers (GCN, GIN) process only the 2D molecular graph (which atoms are bonded to which). 3D-aware models (SchNet, DimeNet, EGNN) also use inter-atomic distances and angles, achieving dramatically better results. This insight -- that spatial structure contains critical information -- has influenced GNN design far beyond chemistry.

Loading QM9 in PyG

load_qm9.py
from torch_geometric.datasets import QM9

dataset = QM9(root='/tmp/QM9')
print(f"Molecules: {len(dataset)}")       # 130831

mol = dataset[0]
print(f"Atoms: {mol.num_nodes}")          # varies per molecule
print(f"Bonds: {mol.num_edges}")          # varies per molecule
print(f"Features: {mol.x.shape[1]}")     # 11
print(f"3D coords: {mol.pos.shape}")     # [N, 3]
print(f"Targets: {mol.y.shape}")         # [1, 19]

mol.pos holds the 3D coordinates and mol.y holds all 19 targets; for training, select a single target column.
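A common preprocessing step is to pick one target column and standardize it with training-set statistics. A minimal sketch, using a random placeholder tensor in place of the real stacked dataset targets:

```python
import torch

# Placeholder for the stacked targets ([num_molecules, 19]); with the
# real dataset this would come from concatenating each molecule's y.
y = torch.randn(1000, 19)
TARGET = 4                              # HOMO-LUMO gap in PyG's ordering

target = y[:, TARGET]
mean, std = target.mean(), target.std()
target_norm = (target - mean) / std     # train on this scale
# At evaluation time, un-normalize predictions: pred * std + mean
```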

Common tasks and benchmarks

Per-target regression evaluated by mean absolute error (MAE). Each of the 19 targets is typically trained and evaluated independently. The standard split uses 110K/10K/10K for train/val/test. 2D models (GCN, GIN) achieve moderate MAE. 3D models (SchNet, DimeNet++, PaiNN, EGNN) achieve chemical accuracy on many targets, meaning their predictions are within the uncertainty of experimental measurements. The gap between 2D and 3D models is often 2-5x in MAE.
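The split and metric above can be sketched as follows; random tensors stand in for a trained model's predictions and the ground truth.

```python
import torch

# Standard QM9 protocol: random 110K train / 10K val / remainder test,
# evaluated by mean absolute error.
N = 130_831
perm = torch.randperm(N)
train_idx = perm[:110_000]
val_idx = perm[110_000:120_000]
test_idx = perm[120_000:]               # the remaining ~10K molecules

pred = torch.randn(len(test_idx))       # placeholder predictions
true = torch.randn(len(test_idx))       # placeholder ground truth
mae = (pred - true).abs().mean()        # energy targets usually reported in meV
```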

Example: drug candidate screening

A pharmaceutical company needs to screen 10 million candidate molecules for drug-likeness. Key properties include solubility (related to free energy), reactivity (related to HOMO-LUMO gap), and stability (related to heat of formation). Running DFT on 10 million molecules would take thousands of CPU-years. A GNN trained on QM9-scale data predicts these properties in hours, enabling rapid filtering of candidates before expensive wet-lab testing.
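The filtering step in this scenario reduces to a threshold over model predictions. A sketch with made-up values; the acceptance window and the gap values are purely illustrative, not real drug-likeness criteria.

```python
import torch

# Hypothetical screen: keep candidates whose predicted HOMO-LUMO gap
# (in eV) falls inside an assumed acceptance window.
predicted_gap = torch.tensor([1.2, 3.5, 6.8, 4.1, 0.9])
LO, HI = 2.0, 6.0                       # assumed window, for illustration
mask = (predicted_gap > LO) & (predicted_gap < HI)
shortlist = predicted_gap[mask]         # these go on to wet-lab testing
```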

Published benchmark results

Selected state-of-the-art results on QM9 for the HOMO-LUMO gap target (target index 4), reported as mean absolute error (MAE) in meV. Lower is better.

Method     | MAE (meV) | Year | Paper
-----------|-----------|------|------------------
SchNet     | 63        | 2017 | Schütt et al.
DimeNet    | 32        | 2020 | Gasteiger et al.
DimeNet++  | 24.6      | 2020 | Gasteiger et al.
PaiNN      | 45.7      | 2021 | Schütt et al.
EGNN       | 48        | 2021 | Satorras et al.
SphereNet  | 23.6      | 2022 | Liu et al.
Equiformer | ~22       | 2023 | Liao & Smidt

Original Paper

Quantum-Chemical Insights from Deep Tensor Neural Networks

K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, A. Tkatchenko (2017). Nature Communications

Read paper →

Original data source

The QM9 dataset was originally compiled by Ramakrishnan et al. (2014) and is available for download from Figshare. The original molecules are from the GDB-17 chemical universe database.

cite_qm9.bib
@article{ramakrishnan2014quantum,
  title={Quantum chemistry structures and properties of 134 kilo molecules},
  author={Ramakrishnan, Raghunathan and Dral, Pavlo O and Rupp, Matthias and von Lilienfeld, O Anatole},
  journal={Scientific Data},
  volume={1},
  pages={140022},
  year={2014},
  publisher={Nature Publishing Group}
}

BibTeX citation for the QM9 dataset.

Which dataset should I use?

QM9 vs ZINC: QM9 is for quantum property regression with rich 3D features (11D node features plus coordinates). ZINC tests GNN expressiveness with minimal features (a single categorical atom-type feature). Use QM9 for molecular property prediction research; use ZINC to benchmark architecture expressiveness.

QM9 vs Peptides-func: QM9 has small molecules (~18 atoms), Peptides-func has larger peptides (~151 atoms) requiring long-range reasoning. QM9 tests local 3D geometry; Peptides-func tests global structure capture.

QM9 vs MUTAG: QM9 is 700x larger and is a regression task with 19 targets. MUTAG is a tiny (188 graphs) binary classification dataset. Use MUTAG only for hello-world tutorials.

From benchmark to production

Production molecular property prediction extends beyond QM9 in several ways: larger molecules (QM9 is limited to 9 heavy atoms; drugs typically have 20-50), additional property types (binding affinity, toxicity, metabolic stability), and integration with molecular generation (design molecules with desired properties, then predict their properties). But the core pipeline -- graph representation of molecules, GNN message passing, property prediction -- remains the same.

Frequently asked questions

What is the QM9 dataset?

QM9 is a dataset of 130,831 small organic molecules with up to 9 heavy atoms (C, N, O, F). Each molecule is a graph with 11-dimensional node features. It provides 19 quantum chemical properties (dipole moment, HOMO-LUMO gap, etc.) computed via DFT, making it a multi-target regression benchmark.

How do I load QM9 in PyTorch Geometric?

Use `from torch_geometric.datasets import QM9; dataset = QM9(root='/tmp/QM9')`. Each molecule has both 2D graph structure and 3D coordinates. The 19 regression targets are stored in `data.y` as a tensor of shape `[1, 19]` per molecule.

What is special about QM9 compared to MUTAG?

QM9 is a regression task (predict continuous quantum properties), not classification. It has 700x more molecules (130K vs 188), richer features (11D with 3D coordinates), and 19 simultaneous prediction targets. QM9 is a serious molecular benchmark; MUTAG is a toy example.

What are the 19 targets in QM9?

The 19 targets include dipole moment (mu), isotropic polarizability (alpha), HOMO energy, LUMO energy, HOMO-LUMO gap, electronic spatial extent, zero-point vibrational energy, internal energy at 0K and 298K, enthalpy, free energy, heat capacity, and several other quantum chemical properties.

What models perform best on QM9?

SchNet, DimeNet, and EGNN (equivariant GNNs) achieve the best results because they use 3D coordinates and respect physical symmetries. Standard GCN/GIN on the 2D graph structure alone perform significantly worse. The key insight: 3D geometry matters for physical property prediction.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.