Temporal sampling is a neighborhood sampling strategy that restricts a graph neural network to edges and events that occurred strictly before the prediction timestamp. When predicting whether customer Alice will churn on March 1, the GNN can only aggregate information from Alice's transactions, interactions, and relationships that existed before March 1. Any edge created on or after March 1 is invisible.
The leakage problem
Standard GNN training on static graphs treats the graph as a fixed snapshot. Every edge is visible regardless of when it was created. For time-dependent tasks, this is catastrophically wrong:
- Fraud detection: The model sees that an account was frozen (an edge to the “frozen” status node) before predicting whether the account is fraudulent. The freeze happened because of the fraud.
- Churn prediction: The model sees that the customer made no purchases in the month after the prediction date. That absence is the churn.
- Default prediction: The model sees collection actions that were triggered by the default, not before it.
In all cases, the model achieves artificially high accuracy during training but fails completely in production, where the future information does not yet exist.
How temporal sampling works
Every edge in the graph has a timestamp. During neighborhood sampling for a training example at time T:
- Filter edges: Keep only edges where t_edge < T.
- Sample neighbors: From the filtered edge set, sample K neighbors per node (same as standard neighbor sampling).
- Apply at every hop: For a 2-layer GNN, the filter must be applied at both the 1-hop and 2-hop expansion. A 2-hop neighbor reached through a future edge is just as leaky as a 1-hop future edge.
```python
import torch

def temporal_neighbor_sample(
    edge_index, edge_time, target_nodes, target_time,
    num_neighbors=10, num_hops=2,
):
    """Sample neighbors respecting temporal constraints."""
    sampled_nodes = target_nodes
    for hop in range(num_hops):
        # Find all edges pointing TO the current frontier nodes
        mask = torch.isin(edge_index[1], sampled_nodes)
        candidate_edges = edge_index[:, mask]
        candidate_times = edge_time[mask]
        # Filter: keep only edges created BEFORE the target time
        time_mask = candidate_times < target_time
        valid_edges = candidate_edges[:, time_mask]
        # Keep up to num_neighbors source nodes
        # (simplified to a global cap; production samplers cap
        # per node using efficient CSR-based sampling)
        new_neighbors = valid_edges[0].unique()[:num_neighbors]
        sampled_nodes = torch.cat([sampled_nodes, new_neighbors])
    return sampled_nodes.unique()
```

The key line is candidate_times < target_time. This single filter blocks temporal leakage through the graph structure during neighborhood sampling.
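A quick check of the sampler on a toy graph (the function is re-defined in condensed form so the snippet runs standalone; node IDs and timestamps are made up):

```python
import torch

def temporal_neighbor_sample(edge_index, edge_time, target_nodes,
                             target_time, num_neighbors=10, num_hops=2):
    # Condensed version of the sampler above: the frontier mask and
    # the time mask are combined into one boolean filter per hop.
    sampled_nodes = target_nodes
    for _ in range(num_hops):
        keep = torch.isin(edge_index[1], sampled_nodes) & (edge_time < target_time)
        new_neighbors = edge_index[0, keep].unique()[:num_neighbors]
        sampled_nodes = torch.cat([sampled_nodes, new_neighbors])
    return sampled_nodes.unique()

# Toy graph: three directed edges (src -> dst) with creation times.
# The edge 2 -> 0 is created AT t=5.0, so for a prediction at T=5.0
# it must be invisible (the filter is strict: t_edge < T).
edge_index = torch.tensor([[1, 2, 3],
                           [0, 0, 1]])
edge_time = torch.tensor([2.0, 5.0, 3.0])

nodes = temporal_neighbor_sample(edge_index, edge_time,
                                 target_nodes=torch.tensor([0]),
                                 target_time=5.0)
print(sorted(nodes.tolist()))  # → [0, 1, 3]; node 2 is excluded
```

Node 3 appears because it is reached at hop 2 through the valid edge 3 → 1, while node 2 is dropped at hop 1 by the strict time filter.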
Temporal sampling vs temporal splitting
Both are necessary but serve different purposes:
- Temporal split: Divides the dataset by time. Training on January-March, validation on April, test on May. This ensures the test set evaluates future generalization.
- Temporal sampling: Within the training set, ensures each example's GNN computation only uses edges before that example's timestamp. Even training examples from January should not see February edges.
Using temporal splits without temporal sampling still leaks information. A January training example might aggregate information from a March edge within the training set, learning a pattern that is unavailable at prediction time.
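How the two mechanisms compose can be sketched with made-up timestamps (all numbers below are hypothetical):

```python
import torch

# Hypothetical per-example prediction timestamps and edge creation times
# (days since epoch).
example_time = torch.tensor([10., 40., 70., 100., 130.])
edge_time = torch.tensor([5., 50., 95.])

# Temporal split: partition the EXAMPLES by time.
train = example_time[example_time < 90.]   # "January-March" examples
# (validation and test splits would cover later windows)

# Temporal sampling: within the training split, each example still
# applies its OWN edge filter. An early example must not aggregate
# over a later edge, even one inside the training window.
visible_counts = [(edge_time < t).sum().item() for t in train]
print(visible_counts)  # → [1, 1, 2]
```

Note that the first two training examples see only one edge even though two edges fall inside the training window: the split alone would not have hidden the t=50 edge from the t=10 and t=40 examples.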
Performance considerations
Temporal sampling is 2-5x slower than static sampling because:
- Each example requires a unique edge filter (no shared precomputation)
- The filter operation itself adds overhead per hop
- Batch construction is more complex (different subgraphs per example)
Optimizations include: sorting edges by time for binary search filtering, caching subgraph snapshots at fixed time intervals, and using temporal CSR (Compressed Sparse Row) data structures that enable O(log n) time filtering per node.
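The first optimization can be sketched as follows: sort edges by timestamp once, and each example's filter becomes a single binary search (torch.searchsorted) instead of a full scan over all edge times. Toy data below.

```python
import torch

# Pre-sort edges by time ONCE; this work is shared across all examples.
edge_time = torch.tensor([3., 1., 5., 2., 4.])
edge_index = torch.tensor([[0, 1, 2, 3, 4],
                           [1, 2, 3, 4, 0]])
order = torch.argsort(edge_time)
edge_time_sorted = edge_time[order]
edge_index_sorted = edge_index[:, order]

# Per example: binary search for the first edge at or after T.
# All edges strictly before T then occupy the prefix [0, cutoff).
T = 4.0
cutoff = torch.searchsorted(edge_time_sorted, torch.tensor([T]))[0].item()
visible_edges = edge_index_sorted[:, :cutoff]
print(cutoff)  # → 3 edges with t < 4.0
```

Because searchsorted with the default left side returns the count of elements strictly less than T, the prefix slice implements exactly the t_edge < T filter.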
Common mistakes
- Filtering only at hop 1: Future information can still reach the target node through a valid 1-hop neighbor that itself received a future 2-hop message. Filter at every hop.
- Using node features from the future: If node features are time-varying (e.g., account balance), use the feature values from before T, not the latest values.
- Ignoring edge creation time: Structural edges (customer → account) may seem timeless, but they were created at account opening. A customer who opened an account after T should not be visible.
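The second mistake above, using node features from the future, is avoided with an as-of lookup over the feature history. A minimal sketch for a single node's time-varying feature (field names and values are hypothetical):

```python
import torch

# Hypothetical feature history for one node: account balance snapshots.
feat_time = torch.tensor([10., 40., 70.])     # when each snapshot was taken
feat_value = torch.tensor([100., 250., 80.])  # balance at that time

def feature_as_of(T):
    """Return the last feature value recorded strictly before time T."""
    idx = torch.searchsorted(feat_time, torch.tensor([T]))[0].item() - 1
    assert idx >= 0, "no feature snapshot exists before T"
    return feat_value[idx].item()

print(feature_as_of(50.))  # → 250.0: the t=40 snapshot, not the latest (t=70)
```

Using the latest value (80.0) here would leak a post-prediction balance drop into training, exactly the kind of signal that will not exist in production.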