Handling Time in Graph Neural Networks

Real-world graphs evolve. Customers join, transactions happen, relationships form and dissolve. A GNN that ignores time will memorize the past instead of predicting the future.


TL;DR

  • Standard PyG graphs are static: all edges exist simultaneously. Production data has timestamps on every interaction. Ignoring them causes temporal leakage.
  • Temporal leakage inflates offline AUROC by 10-30 points. Your model looks amazing in evaluation and fails in production because it trained on future information.
  • Three approaches: static snapshots (simplest), discrete-time dynamic graphs (balanced), and continuous-time dynamic graphs (most accurate, hardest to implement).
  • Encode time as relative differences (seconds since last interaction) and cyclical features (hour, day-of-week). Raw Unix timestamps are meaningless to neural networks.

Why time matters in graphs

Consider a fraud detection graph. A customer makes 100 transactions over 6 months. In a static graph, all 100 transactions exist as edges simultaneously. The GNN sees the customer’s entire history, including transactions that happened after the one you are trying to classify.

This is temporal leakage: the model uses future information to predict the past. It achieves 98% AUROC in offline evaluation (because it can see the answer) and 68% in production (where the future does not exist yet).

Three temporal graph approaches

1. Static snapshots

The simplest approach: build a new graph at each prediction time, including only edges that existed before that timestamp.

static_snapshot.py
import torch
from torch_geometric.data import Data

def build_snapshot(edges_df, features_df, cutoff_time):
    """Build a graph using only edges before cutoff_time."""
    mask = edges_df["timestamp"] < cutoff_time
    filtered = edges_df[mask]

    edge_index = torch.stack([
        torch.as_tensor(filtered["src"].to_numpy(), dtype=torch.long),
        torch.as_tensor(filtered["dst"].to_numpy(), dtype=torch.long),
    ])
    x = torch.tensor(features_df.values, dtype=torch.float32)
    return Data(x=x, edge_index=edge_index)

# Build monthly snapshots for training
for month in training_months:
    snapshot = build_snapshot(edges, features, month)
    train_on_snapshot(model, snapshot)

Static snapshots are correct but wasteful: you rebuild the entire graph for each prediction timestamp. For batch predictions, this means N graph constructions.
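One way to cut that cost: if you sort all edges by timestamp once up front, every snapshot becomes a prefix slice found by binary search, so per-snapshot work drops from a full scan to O(log E). A minimal sketch (the `sorted_src`/`sorted_dst`/`sorted_ts` tensors are assumed to be built once before the loop):

```python
import torch

def snapshot_edge_index(sorted_src, sorted_dst, sorted_ts, cutoff_time):
    """Slice a time-sorted edge list at cutoff_time via binary search."""
    # Number of edges with timestamp strictly before cutoff_time
    k = int(torch.searchsorted(sorted_ts, torch.tensor(cutoff_time)))
    return torch.stack([sorted_src[:k], sorted_dst[:k]])

# Sort once, slice many times
ts = torch.tensor([5.0, 1.0, 3.0, 9.0])
src = torch.tensor([0, 1, 2, 3])
dst = torch.tensor([1, 2, 3, 0])
order = ts.argsort()
sorted_ts, sorted_src, sorted_dst = ts[order], src[order], dst[order]

ei = snapshot_edge_index(sorted_src, sorted_dst, sorted_ts, 4.0)
# Keeps the edges with t in {1.0, 3.0}: sources [1, 2], targets [2, 3]
```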

2. Discrete-time dynamic graphs

Build a sequence of graph snapshots at fixed intervals (hourly, daily, weekly) and model the temporal dynamics between them. This captures graph evolution without the overhead of continuous time.

3. Continuous-time dynamic graphs

Store every event with its exact timestamp and use temporal encodings to represent when edges were created. This is the most expressive but requires specialized architectures (TGN, TGAT) and temporal neighbor sampling.

Encoding time as features

Raw timestamps (Unix epoch seconds) are meaningless to neural networks. Convert them into relative, cyclical, and log-scaled features the network can actually use:

time_encoding.py
import torch
import numpy as np

def encode_time_features(timestamps, reference_time):
    """Convert timestamps to useful GNN features."""
    # Relative time (most important)
    dt = reference_time - timestamps  # seconds since event
    dt_hours = dt / 3600.0
    dt_days = dt / 86400.0

    # Cyclical encodings (capture periodicity)
    hour = (timestamps % 86400) / 3600.0
    day_of_week = ((timestamps // 86400) % 7).float()

    hour_sin = torch.sin(2 * np.pi * hour / 24)
    hour_cos = torch.cos(2 * np.pi * hour / 24)
    dow_sin = torch.sin(2 * np.pi * day_of_week / 7)
    dow_cos = torch.cos(2 * np.pi * day_of_week / 7)

    # Log-scaled recency (compresses long tails)
    log_recency = torch.log1p(dt_hours)

    return torch.stack([
        dt_hours, dt_days, log_recency,
        hour_sin, hour_cos, dow_sin, dow_cos,
    ], dim=-1)

Log-scaled recency is the single most predictive temporal feature in most production models. Recent events matter exponentially more than old ones.
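Why the sin/cos pair rather than the raw hour: a raw hour feature puts 23:00 and 01:00 at opposite ends of the range, while the circular encoding keeps them adjacent. A quick standalone check (helper names are illustrative):

```python
import math

def hour_to_cycle(hour):
    """Map hour-of-day onto the unit circle so 23:00 and 01:00 are neighbors."""
    angle = 2 * math.pi * hour / 24
    return (math.sin(angle), math.cos(angle))

def cyc_dist(a, b):
    """Euclidean distance between two hours in the cyclical encoding."""
    return math.dist(hour_to_cycle(a), hour_to_cycle(b))

wrap_gap = cyc_dist(23, 1)    # 2 hours apart, across midnight
plain_gap = cyc_dist(11, 13)  # 2 hours apart, at midday
far_gap = cyc_dist(23, 11)    # 12 hours apart
# wrap_gap equals plain_gap, and both are far smaller than far_gap
```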

Temporal sampling in PyG

PyG’s NeighborLoader supports temporal filtering through the time_attr parameter. This ensures that during training, each seed node only sees edges that existed before its prediction timestamp.

temporal_loader.py
from torch_geometric.loader import NeighborLoader

# Assign timestamps to edges
data.edge_time = edge_timestamps

# Each training node has a prediction time
data.node_time = prediction_timestamps

loader = NeighborLoader(
    data,
    num_neighbors=[15, 10],
    batch_size=512,
    input_nodes=train_mask,
    time_attr="edge_time",
    input_time=data.node_time[train_mask],
)

What breaks in production

  • Clock skew: Different data sources report timestamps in different time zones or with different latencies. A transaction logged 30 seconds late can leak into the wrong snapshot. Normalize all timestamps to UTC and add a safety buffer.
  • Feature staleness: Node features computed from aggregations (e.g., “average order value”) must be recomputed at each prediction timestamp. Using pre-aggregated features that include future data is a subtle form of leakage.
  • Training-serving skew: If training uses daily snapshots but serving uses real-time data, the model sees a different temporal distribution. Align training and serving temporal granularity.
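The clock-skew fix can be as simple as shifting every snapshot cutoff back by a fixed buffer; a minimal sketch (the 5-minute buffer is an illustrative choice, tune it to your pipeline's worst-case logging latency):

```python
from datetime import datetime, timezone

SAFETY_BUFFER_S = 300  # assume events can be logged up to 5 minutes late

def safe_cutoff(prediction_time: datetime, buffer_s: int = SAFETY_BUFFER_S) -> float:
    """Return a UTC epoch cutoff shifted back by the buffer, so that
    late-arriving events near the boundary cannot leak into a snapshot."""
    if prediction_time.tzinfo is None:
        # Treat naive datetimes as UTC rather than guessing a local zone
        prediction_time = prediction_time.replace(tzinfo=timezone.utc)
    return prediction_time.astimezone(timezone.utc).timestamp() - buffer_s

cutoff = safe_cutoff(datetime(2024, 6, 1, tzinfo=timezone.utc))
# Edges with timestamp < cutoff go into the snapshot; the last 5 minutes
# before the prediction time are deliberately excluded.
```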

Frequently asked questions

How do I add timestamps to a PyG graph?

Store timestamps as edge attributes using data.edge_time (a tensor of Unix timestamps or relative times). For node-level timestamps, use data.node_time. PyG's NeighborLoader supports a time_attr parameter that filters edges to respect temporal ordering during sampling.

What is temporal leakage in graph ML?

Temporal leakage occurs when your training graph includes edges or features from the future relative to the prediction timestamp. For example, if you're predicting whether a customer will churn next month but your graph includes their transactions from next month, the model learns from information it wouldn't have in production.

Should I use a static or dynamic graph representation?

Use static snapshots if your graph changes slowly (monthly) and you only need predictions at fixed intervals. Use a continuous-time dynamic graph if events happen at arbitrary times and you need real-time predictions. Most production systems start with static snapshots and upgrade to dynamic when latency requirements demand it.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.