A temporal split divides the dataset by time instead of randomly. Training data consists of events that occurred before a cutoff date. Test data consists of events that occurred after the cutoff. The model learns from the past and is evaluated on the future. This is the only evaluation protocol that produces realistic performance estimates for time-dependent prediction tasks.
Why random splits fail on graphs
Consider a churn prediction task with a random 80/20 split. A customer in the test set has orders spanning January to June. Some of their January orders land in the training set, some April orders in the test set. During training, the GNN aggregates messages from the customer's January orders, which share product nodes with their April orders. Future purchase patterns leak into the training representation.
The result: development AUROC is 0.88, production AUROC is 0.74. That 14-point gap is entirely due to temporal leakage through the graph structure.
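The mechanics of that leak can be demonstrated on a toy orders table (synthetic data and column names, for illustration only): a random split happily places one customer's orders on both sides of the boundary, so their future activity is reachable from training-time nodes.

```python
import numpy as np
import pandas as pd

# Hypothetical orders table: one customer with orders spanning Jan-Jun
orders = pd.DataFrame({
    "customer_id": [42] * 6,
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-14",
         "2024-04-02", "2024-04-18", "2024-06-09"]
    ),
})

# A random 80/20 split ignores time entirely
rng = np.random.default_rng(0)
is_test = rng.random(len(orders)) < 0.2

# The same customer lands on both sides of the split, so orders that
# happened AFTER some test-set orders are visible during training
train_dates = orders.loc[~is_test, "order_date"]
test_dates = orders.loc[is_test, "order_date"]
leaks = len(test_dates) > 0 and train_dates.max() > test_dates.min()
```

With this seed, `leaks` is true: the training side contains orders dated after the earliest test-set order, which is exactly the temporal leakage a cutoff-based split prevents.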
How to implement temporal splits
```python
import pandas as pd

def temporal_split(timestamps, train_ratio=0.7, val_ratio=0.1):
    """Split data by time into train/val/test masks."""
    sorted_times = sorted(timestamps.unique())
    n = len(sorted_times)
    train_cutoff = sorted_times[int(n * train_ratio)]
    val_cutoff = sorted_times[int(n * (train_ratio + val_ratio))]
    train_mask = timestamps < train_cutoff
    val_mask = (timestamps >= train_cutoff) & (timestamps < val_cutoff)
    test_mask = timestamps >= val_cutoff
    return train_mask, val_mask, test_mask
```
```python
# For graph data, apply the split to BOTH nodes and edges.

# 1. Node split: which entities to predict on
node_train, node_val, node_test = temporal_split(node_timestamps)

# 2. Edge filter: which edges are visible in each phase
#    Training:   only edges before train_cutoff
#    Validation: only edges before val_cutoff
#    Test:       only edges before test_cutoff
```

The split applies to both the target nodes and the edges visible during GNN computation. Both must respect the temporal boundary.
Graph-specific considerations
Edge visibility
In a temporal graph split, the edge set changes per split:
- Training: Only edges with t < train_cutoff
- Validation: Only edges with t < val_cutoff
- Testing: Only edges with t < test_cutoff
This means the test set GNN computation uses a strictly larger graph than the training computation (it includes all training edges plus validation-period edges). This is correct: at test time, all past information is available.
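The growing per-split edge sets can be sketched with plain NumPy boolean masks (the `edge_index`, `edge_times`, and cutoff values here are illustrative, not a specific library's API):

```python
import numpy as np

# Hypothetical edge list: edge_index is (2, E), edge_times is (E,)
edge_index = np.array([[0, 1, 2, 3],
                       [1, 2, 3, 0]])
edge_times = np.array([10, 20, 30, 40])

train_cutoff, val_cutoff, test_cutoff = 25, 35, 45

def edges_before(cutoff):
    """Keep only edges created strictly before the cutoff."""
    mask = edge_times < cutoff
    return edge_index[:, mask]

train_edges = edges_before(train_cutoff)  # the t=10 and t=20 edges
val_edges = edges_before(val_cutoff)      # adds the t=30 edge
test_edges = edges_before(test_cutoff)    # adds the t=40 edge
```

Each later split is a strict superset of the earlier one, which matches the "all past information is available" rule above.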
New nodes
Some entities only appear after the cutoff (new customers, new products). These nodes have no training history. A robust model should handle both:
- Existing entities: Have rich historical context from before the cutoff
- New entities: Must be predicted from limited context (cold start)
Report metrics separately for existing and new entities. Performance on new entities is often 10-20% lower and matters for growth-stage businesses.
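A minimal sketch of the separate reporting, assuming scikit-learn is available; the test-set arrays are made up, and `is_new` flags entities first seen after the train cutoff:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical test-set arrays: labels, model scores, and a cold-start flag
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.6, 0.5, 0.3, 0.4])
is_new = np.array([False, False, False, False, True, True, True, True])

# Report AUROC separately for entities with and without training history
auroc_existing = roc_auc_score(y_true[~is_new], y_score[~is_new])
auroc_new = roc_auc_score(y_true[is_new], y_score[is_new])
```

A single pooled AUROC would average these together and hide a cold-start weakness that dominates for a fast-growing user base.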
Common mistakes
- Splitting nodes but not edges: Temporal node split with all edges visible during training still leaks through future edges to training nodes.
- Using event time, not prediction time: The split should be on the prediction timestamp, not the event timestamp. Predicting March 1 churn should use the March 1 graph snapshot.
- Leaking validation into training: Using validation performance to select features or tune hyperparameters, then reporting test performance. The validation set must be strictly between train and test in time.
- Single split on seasonal data: If your single test set falls on Black Friday, performance looks great. Use rolling splits to average across periods.
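One way to sketch rolling splits, assuming pandas timestamps; the fold count and window lengths here are arbitrary choices, not a recommendation:

```python
import pandas as pd

def rolling_splits(timestamps, n_folds=4, train_months=6, test_months=1):
    """Yield (train_mask, test_mask) pairs that roll forward in time.

    Each fold trains on a fixed-length history and tests on the
    window immediately after it, so metrics averaged over folds
    are not dominated by one seasonal period.
    """
    start = timestamps.min()
    for fold in range(n_folds):
        train_start = start + pd.DateOffset(months=fold * test_months)
        train_end = train_start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)
        train_mask = (timestamps >= train_start) & (timestamps < train_end)
        test_mask = (timestamps >= train_end) & (timestamps < test_end)
        yield train_mask, test_mask
```

Evaluate the model once per fold and report the mean and spread across folds rather than a single number.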