Why are road networks naturally graphs?

A road network is a graph where intersections (or road segments) are nodes and roads connecting them are edges. Sensor data (speed, flow, occupancy) attaches as node features that change over time. Traffic propagates along this graph: congestion at one intersection affects downstream intersections through the graph structure.

What is a spatio-temporal graph neural network?

A spatio-temporal GNN combines graph convolutions (for spatial dependencies: how congestion spreads across connected roads) with temporal convolutions or recurrence (for temporal dependencies: how traffic evolves over time at each location). The most common architecture alternates graph convolution layers with temporal layers (1D convolutions, or a GRU/LSTM, applied over the time dimension).

How does traffic prediction with GNNs compare to ARIMA or LSTM?

ARIMA and LSTM, applied per sensor, treat each sensor as an independent time series. They can learn temporal patterns (rush hour, weekday/weekend) but cannot learn spatial propagation (for example, congestion on Highway A causing slowdowns on connected Highway B 15 minutes later). GNN-based models (DCRNN, STGCN, Graph WaveNet) consistently outperform univariate time-series methods by 15-25% on MAE for 30-60 minute forecasting horizons.

What datasets are used for traffic prediction with GNNs?

The standard benchmarks are METR-LA (207 sensors on Los Angeles highways, 4 months of data) and PEMS-BAY (325 sensors in the San Francisco Bay Area, 6 months). Both provide speed readings at 5-minute intervals. The graph is constructed from the road network, with edges weighted by road distance between sensors.

Traffic Prediction with Graph Neural Networks: Forecasting Congestion | Kumo.ai

Traffic congestion propagates through road networks as a spatial phenomenon, and graph neural networks are the natural architecture for modeling this propagation. A slowdown on a highway entrance ramp causes congestion that spreads upstream over minutes. An accident on a major artery diverts traffic to parallel routes, causing secondary congestion. These are graph-structured patterns: the congestion signal travels along edges (roads) between nodes (intersections or sensors).

Traditional traffic forecasting treats each sensor as an independent time series. An LSTM at sensor A learns rush hour patterns for sensor A. But it cannot learn that congestion at sensor B (3 miles upstream) predicts congestion at sensor A in 12 minutes. GNNs capture this spatial dependency through message passing on the road graph.

The road network as a graph

Building a traffic graph requires two components:

Nodes: each traffic sensor or road segment. Node features are time-varying: speed, flow (vehicles per hour), and occupancy (fraction of time the sensor is occupied).
Edges: road connections between sensors. Edge weights typically decrease with road distance, so closer sensors have stronger connections; common choices are the inverse of distance or a thresholded Gaussian kernel of distance. Some models use directed edges to capture one-way streets and directional traffic flow.
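
As a minimal sketch of the edge-weighting step, the snippet below turns pairwise road distances into a weighted, directed adjacency matrix using a thresholded Gaussian kernel, w = exp(-d^2 / sigma^2), with small weights zeroed out. The function name `gaussian_kernel_adjacency`, the parameters `sigma` and `threshold`, and the distances are all assumptions made for illustration, not values taken from any benchmark.

```python
import math

def gaussian_kernel_adjacency(dist, sigma, threshold):
    """Build a weighted adjacency matrix from pairwise road distances.

    dist[i][j] is the road distance from sensor i to sensor j, or
    math.inf when j cannot be reached from i. Weights below `threshold`
    are set to 0 to keep the graph sparse. Self-loops are omitted here.
    """
    n = len(dist)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or math.isinf(dist[i][j]):
                continue
            w = math.exp(-(dist[i][j] ** 2) / (sigma ** 2))
            if w >= threshold:
                adj[i][j] = w
    return adj

INF = math.inf
# Hypothetical directed distances (miles) along a one-way highway A -> B -> C.
road_dist = [
    [0.0, 1.0, 3.0],
    [INF, 0.0, 2.0],
    [INF, INF, 0.0],
]
adjacency = gaussian_kernel_adjacency(road_dist, sigma=2.0, threshold=0.2)
```

In this toy setup the A-to-C weight falls below the threshold and is dropped, so only adjacent sensors stay connected, and the matrix is asymmetric because traffic only flows one way.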

On METR-LA (207 sensors across Los Angeles highways), the graph has 207 nodes and approximately 1,500 edges. Each node carries a feature vector that updates every 5 minutes. The prediction task: given the past 60 minutes of sensor readings across the network, predict the next 15-60 minutes at all sensors simultaneously.
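
The prediction task above is usually set up as sliding windows over the sensor readings. The sketch below assumes 5-minute readings, so 12 input steps cover the past 60 minutes and 3 output steps cover the next 15 minutes. The helper name `make_windows` and the toy series are illustrative assumptions.

```python
def make_windows(series, n_in=12, n_out=3):
    """Slice a [time][sensor] series into (input, target) pairs.

    With 5-minute readings, n_in=12 covers the past 60 minutes and
    n_out=3 covers the next 15 minutes.
    """
    samples = []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples

# Toy data: 20 time steps, 2 sensors; values just encode the step index.
toy_series = [[float(t), float(t) + 100.0] for t in range(20)]
windows = make_windows(toy_series)
first_x, first_y = windows[0]
```

Each sample pairs readings from every sensor at once, which is what lets a graph model predict all sensors simultaneously. Setting `n_out=12` would give the 60-minute horizon instead.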

Spatio-temporal architecture

The defining innovation in traffic GNNs is combining spatial graph operations with temporal sequence operations:

Spatial component: graph convolution

At each time step, a graph convolution propagates traffic information across the road network. After one layer, each sensor's embedding includes information from directly connected sensors. After two layers, it includes 2-hop information. This captures the spatial propagation of congestion.
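
To make the hop-by-hop reach concrete, here is a stripped-down sketch of graph propagation on a four-sensor path graph. It uses mean aggregation with self-loops and deliberately leaves out the learned weight matrix and nonlinearity that a real graph convolution layer would have. The function names and the toy graph are assumptions for illustration.

```python
def normalize_rows(adj):
    """Add self-loops and scale each row to sum to 1 (mean aggregation)."""
    n = len(adj)
    with_self = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_self]

def propagate(norm_adj, features):
    """One propagation step: each node averages itself and its neighbors."""
    n = len(norm_adj)
    return [sum(norm_adj[i][j] * features[j] for j in range(n))
            for i in range(n)]

# Path graph of four sensors: 0 - 1 - 2 - 3 (undirected, unit weights).
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
norm = normalize_rows(path)

# A signal that is non-zero only at sensor 0.
signal = [1.0, 0.0, 0.0, 0.0]
one_hop = propagate(norm, signal)
two_hop = propagate(norm, one_hop)
```

In this toy run, sensor 2 receives nothing from sensor 0 after one step and a non-zero value after two, while sensor 3 is still untouched. That is the 1-hop versus 2-hop behavior described above.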

Temporal component: sequence modeling

Across time steps, a temporal model (1D convolution, GRU, or dilated causal convolution) captures patterns like rush hour cycles, weekend effects, and holiday impacts. The temporal component operates on the spatially enriched embeddings from the graph convolution.
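
One of the temporal options named above, the dilated causal convolution, can be sketched in a few lines. "Causal" means the output at step t only uses readings at or before t; "dilated" means the kernel skips steps to look further back. The function `causal_conv1d`, the fixed two-tap kernel, and the speed values are illustrative assumptions. In a trained model the kernel weights would be learned.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal 1D convolution over a single sensor's sequence.

    Output at step t uses x[t], x[t - dilation], x[t - 2*dilation], ...
    History before the start of the sequence is treated as 0.
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += w * x[idx]
        out.append(total)
    return out

# Made-up speeds (mph) at 5-minute intervals.
speeds = [60.0, 58.0, 55.0, 40.0, 30.0, 28.0]
# Hypothetical kernel: current value minus the value `dilation` steps back.
diff_1 = causal_conv1d(speeds, kernel=[1.0, -1.0], dilation=1)
diff_2 = causal_conv1d(speeds, kernel=[1.0, -1.0], dilation=2)
```

A dilation of 2 makes the same two-tap kernel compare readings 10 minutes apart instead of 5. Stacking layers with growing dilation widens the look-back window without adding many parameters.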

Foundational models

Three architectures established the field:

DCRNN (Diffusion Convolutional Recurrent Neural Network): models traffic as a diffusion process on the graph. Uses diffusion convolution for spatial dependencies and a GRU for temporal dependencies. Introduced the encoder-decoder framework for multi-step forecasting.
STGCN (Spatio-Temporal Graph Convolutional Network): uses spectral graph convolutions with gated 1D temporal convolutions. Faster than DCRNN due to non-recurrent temporal processing.
Graph WaveNet: combines adaptive graph learning (the model learns the graph structure rather than relying solely on road connectivity) with dilated causal convolutions for long-range temporal dependencies.
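
The adaptive graph learning idea can be illustrated with a small sketch: two sets of node embeddings are multiplied, passed through a ReLU, and normalized with a row-wise softmax to give a dense adjacency matrix. This follows the general form used for Graph WaveNet's adaptive adjacency, but the function name, the embedding size, and the embedding values are assumptions. In a real model the embeddings are parameters trained end to end.

```python
import math

def adaptive_adjacency(source_emb, target_emb):
    """Row-wise softmax(relu(E1 @ E2^T)) for node embeddings E1, E2."""
    adj = []
    for e1 in source_emb:
        scores = [max(0.0, sum(a * b for a, b in zip(e1, e2)))
                  for e2 in target_emb]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        adj.append([v / total for v in exps])
    return adj

# Hypothetical 2-dimensional embeddings for three sensors (made-up values).
e_source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e_target = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
learned_adj = adaptive_adjacency(e_source, e_target)
```

Because the weights come from embeddings rather than road distances, two sensors with no direct road link can still end up strongly connected if their traffic patterns are related.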

Why spatial modeling matters: a concrete example

Consider a 3-sensor stretch of highway: sensor B is between sensors A (upstream) and C (downstream). At 5:05 PM, sensor A detects a speed drop to 20 mph. An LSTM at sensor B does not see this. A GNN does:

At 5:05 PM, sensor A's speed feature drops. Graph convolution propagates this to sensor B's embedding.
The temporal component recognizes this pattern from training: upstream slowdown predicts local slowdown with a characteristic delay.
The model forecasts that sensor B will drop to 25 mph by 5:10 PM and that sensor C will drop to 30 mph by 5:15 PM.

The LSTM at sensor B would not predict the slowdown until sensor B itself starts slowing down. The GNN predicts it 5-10 minutes earlier by learning the spatial propagation.

Performance on benchmarks

On METR-LA (15-minute ahead prediction), representative results:

Historical Average: 7.80 MAE (mph)
ARIMA: 5.55 MAE
LSTM: 4.19 MAE
DCRNN: 3.17 MAE
Graph WaveNet: 2.99 MAE

The gap widens at longer horizons (60 minutes), where spatial propagation patterns become even more important. GNN models lose accuracy more slowly than univariate models, whose errors grow rapidly.
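
For readers who want to check the metric, the sketch below shows how MAE is computed and how a relative improvement follows from two MAE values. The toy speeds are made up; the three MAE figures are the 15-minute METR-LA numbers quoted above. The helper names are assumptions.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between forecasts and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def relative_improvement(baseline_mae, model_mae):
    """Fractional reduction in MAE relative to a baseline."""
    return (baseline_mae - model_mae) / baseline_mae

# MAE on made-up speeds (mph), only to show the metric itself.
toy_mae = mean_absolute_error([50.0, 42.0, 30.0], [52.0, 40.0, 33.0])

# 15-minute METR-LA figures quoted above.
lstm_mae, dcrnn_mae, gwnet_mae = 4.19, 3.17, 2.99
dcrnn_gain = relative_improvement(lstm_mae, dcrnn_mae)
gwnet_gain = relative_improvement(lstm_mae, gwnet_mae)
```

With the figures listed, DCRNN's MAE comes out roughly 24% below the LSTM's and Graph WaveNet's roughly 29% below.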

Beyond road traffic

The same spatio-temporal graph architecture applies to any flow on a network:

Public transit: predict passenger flow at stations using the transit network graph
Ride-sharing demand: predict ride requests across city zones connected by travel patterns
Air traffic: predict flight delays propagating through the airport network
Logistics: predict delivery times across warehouse and route networks

Key Takeaways

1. Traffic congestion is a spatial propagation phenomenon. Treating each sensor independently (ARIMA, LSTM) misses the key signal: upstream congestion predicts downstream congestion with a time delay.
2. Spatio-temporal GNNs alternate graph convolution (spatial propagation) with temporal sequence modeling (rush hour patterns, weekly cycles). This captures the interaction: congestion at B at time t predicts congestion at A at t+12min.
3. GNN-based models (DCRNN, STGCN, Graph WaveNet) outperform univariate time-series models by 15-25% MAE, with the gap widening at longer forecast horizons where spatial propagation dominates.
4. The road graph is built from sensor connectivity with distance-weighted edges. Some models (Graph WaveNet) also learn adaptive graph structure from data, discovering latent spatial dependencies.
5. The spatio-temporal graph framework generalizes beyond roads to any flow-on-network problem: transit, ride-sharing, air traffic, logistics.
