Temporal sampling is a neighborhood sampling strategy that restricts a graph neural network to edges and events that occurred strictly before the prediction timestamp. When predicting whether customer Alice will churn on March 1, the GNN can only aggregate information from Alice's transactions, interactions, and relationships that existed before March 1. Any edge created on or after March 1 is invisible.
The leakage problem
Standard GNN training on static graphs treats the graph as a fixed snapshot. Every edge is visible regardless of when it was created. For time-dependent tasks, this is catastrophically wrong:
- Fraud detection: The model sees that an account was frozen (an edge to the “frozen” status node) before predicting whether the account is fraudulent. The freeze happened because of the fraud.
- Churn prediction: The model sees that the customer made no purchases in the month after the prediction date. That absence is the churn.
- Default prediction: The model sees collection actions that were triggered by the default, not before it.
In all cases, the model achieves artificially high accuracy during training but fails completely in production, where the future information does not yet exist.
How temporal sampling works
Every edge in the graph has a timestamp. During neighborhood sampling for a training example at time T:
- Filter edges: Keep only edges where t_edge < T.
- Sample neighbors: From the filtered edge set, sample K neighbors per node (same as standard neighbor sampling).
- Apply at every hop: For a 2-layer GNN, the filter must be applied at both the 1-hop and 2-hop expansion. A 2-hop neighbor reached through a future edge is just as leaky as a 1-hop future edge.
```python
import torch

def temporal_neighbor_sample(
    edge_index, edge_time, target_nodes, target_time,
    num_neighbors=10, num_hops=2,
):
    """Sample neighbors respecting temporal constraints."""
    sampled_nodes = target_nodes
    for hop in range(num_hops):
        # Find all edges pointing TO the current frontier nodes
        mask = torch.isin(edge_index[1], sampled_nodes)
        candidate_edges = edge_index[:, mask]
        candidate_times = edge_time[mask]
        # Filter: keep only edges created BEFORE the target time
        time_mask = candidate_times < target_time
        valid_edges = candidate_edges[:, time_mask]
        # Keep up to num_neighbors source nodes
        # (simplified to a global cap; production samplers cap
        # per node using efficient CSR-based sampling)
        new_neighbors = valid_edges[0].unique()[:num_neighbors]
        sampled_nodes = torch.cat([sampled_nodes, new_neighbors])
    return sampled_nodes.unique()
```

The key line is candidate_times < target_time. This single filter blocks temporal leakage through the graph structure during neighborhood sampling.
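A quick check of the sampler on a toy graph (the function is re-defined in condensed form so the snippet runs standalone; node IDs and timestamps are made up):

```python
import torch

def temporal_neighbor_sample(edge_index, edge_time, target_nodes,
                             target_time, num_neighbors=10, num_hops=2):
    # Condensed version of the sampler above: the frontier mask and
    # the time mask are combined into one boolean filter per hop.
    sampled_nodes = target_nodes
    for _ in range(num_hops):
        keep = torch.isin(edge_index[1], sampled_nodes) & (edge_time < target_time)
        new_neighbors = edge_index[0, keep].unique()[:num_neighbors]
        sampled_nodes = torch.cat([sampled_nodes, new_neighbors])
    return sampled_nodes.unique()

# Toy graph: three directed edges (src -> dst) with creation times.
# The edge 2 -> 0 is created AT t=5.0, so for a prediction at T=5.0
# it must be invisible (the filter is strict: t_edge < T).
edge_index = torch.tensor([[1, 2, 3],
                           [0, 0, 1]])
edge_time = torch.tensor([2.0, 5.0, 3.0])

nodes = temporal_neighbor_sample(edge_index, edge_time,
                                 target_nodes=torch.tensor([0]),
                                 target_time=5.0)
print(sorted(nodes.tolist()))  # → [0, 1, 3]; node 2 is excluded
```

Node 3 appears because it is reached at hop 2 through the valid edge 3 → 1, while node 2 is dropped at hop 1 by the strict time filter.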
Temporal sampling vs temporal splitting
Both are necessary but serve different purposes:
- Temporal split: Divides the dataset by time. Training on January-March, validation on April, test on May. This ensures the test set evaluates future generalization.
- Temporal sampling: Within the training set, ensures each example's GNN computation only uses edges before that example's timestamp. Even training examples from January should not see February edges.
Using temporal splits without temporal sampling still leaks information. A January training example might aggregate information from a March edge within the training set, learning a pattern that is unavailable at prediction time.
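How the two mechanisms compose can be sketched with made-up timestamps (all numbers below are hypothetical):

```python
import torch

# Hypothetical per-example prediction timestamps and edge creation times
# (days since epoch).
example_time = torch.tensor([10., 40., 70., 100., 130.])
edge_time = torch.tensor([5., 50., 95.])

# Temporal split: partition the EXAMPLES by time.
train = example_time[example_time < 90.]   # "January-March" examples
# (validation and test splits would cover later windows)

# Temporal sampling: within the training split, each example still
# applies its OWN edge filter. An early example must not aggregate
# over a later edge, even one inside the training window.
visible_counts = [(edge_time < t).sum().item() for t in train]
print(visible_counts)  # → [1, 1, 2]
```

Note that the first two training examples see only one edge even though two edges fall inside the training window: the split alone would not have hidden the t=50 edge from the t=10 and t=40 examples.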
Performance considerations
Temporal sampling is 2-5x slower than static sampling because:
- Each example requires a unique edge filter (no shared precomputation)
- The filter operation itself adds overhead per hop
- Batch construction is more complex (different subgraphs per example)
Optimizations include: sorting edges by time for binary search filtering, caching subgraph snapshots at fixed time intervals, and using temporal CSR (Compressed Sparse Row) data structures that enable O(log n) time filtering per node.
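The first optimization can be sketched as follows: sort edges by timestamp once, and each example's filter becomes a single binary search (torch.searchsorted) instead of a full scan over all edge times. Toy data below.

```python
import torch

# Pre-sort edges by time ONCE; this work is shared across all examples.
edge_time = torch.tensor([3., 1., 5., 2., 4.])
edge_index = torch.tensor([[0, 1, 2, 3, 4],
                           [1, 2, 3, 4, 0]])
order = torch.argsort(edge_time)
edge_time_sorted = edge_time[order]
edge_index_sorted = edge_index[:, order]

# Per example: binary search for the first edge at or after T.
# All edges strictly before T then occupy the prefix [0, cutoff).
T = 4.0
cutoff = torch.searchsorted(edge_time_sorted, torch.tensor([T]))[0].item()
visible_edges = edge_index_sorted[:, :cutoff]
print(cutoff)  # → 3 edges with t < 4.0
```

Because searchsorted with the default left side returns the count of elements strictly less than T, the prefix slice implements exactly the t_edge < T filter.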
Common mistakes
- Filtering only at hop 1: Future information can still reach the target node through a valid 1-hop neighbor that itself received a future 2-hop message. Filter at every hop.
- Using node features from the future: If node features are time-varying (e.g., account balance), use the feature values from before T, not the latest values.
- Ignoring edge creation time: Structural edges (customer → account) may seem timeless, but they were created at account opening. A customer who opened an account after T should not be visible.
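The second mistake above, using node features from the future, is avoided with an as-of lookup over the feature history. A minimal sketch for a single node's time-varying feature (field names and values are hypothetical):

```python
import torch

# Hypothetical feature history for one node: account balance snapshots.
feat_time = torch.tensor([10., 40., 70.])     # when each snapshot was taken
feat_value = torch.tensor([100., 250., 80.])  # balance at that time

def feature_as_of(T):
    """Return the last feature value recorded strictly before time T."""
    idx = torch.searchsorted(feat_time, torch.tensor([T]))[0].item() - 1
    assert idx >= 0, "no feature snapshot exists before T"
    return feat_value[idx].item()

print(feature_as_of(50.))  # → 250.0: the t=40 snapshot, not the latest (t=70)
```

Using the latest value (80.0) here would leak a post-prediction balance drop into training, exactly the kind of signal that will not exist in production.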