In-context learning adapts a model to new tasks at inference time by providing examples in the input, without updating any model weights. You give the model 3 examples of “this customer churned” and 3 examples of “this customer stayed,” and it classifies the rest. Change the examples to fraud/not-fraud, and it classifies for fraud. No retraining, no new model, no deployment pipeline. The same model, different examples, different task.
This capability, made famous by GPT-style language models, is now emerging for graph neural networks. Graph transformers, with their global attention mechanism, are naturally suited to in-context learning because they can attend directly to the provided examples when processing query nodes.
How in-context learning works on graphs
```python
# Conceptual in-context learning on a graph

# 1. Provide labeled examples as context
context_nodes = [
    (node_42, "churn"),      # example: this customer churned
    (node_87, "churn"),      # example: this customer churned
    (node_15, "no_churn"),   # example: this customer stayed
    (node_93, "no_churn"),   # example: this customer stayed
]

# 2. Provide query nodes
query_nodes = [node_201, node_305, node_412, ...]  # classify these

# 3. The graph transformer processes everything together:
#    - context nodes carry label information visible to attention
#    - query nodes attend to context nodes AND their graph neighbors
#    - the model infers the task (churn prediction) from the examples
predictions = graph_transformer(
    graph=customer_graph,
    context=context_nodes,
    queries=query_nodes,
)

# predictions[node_201] = "churn" (probability: 0.87)
```

In-context learning: provide examples, get predictions. The model infers the task from the demonstrated pattern.
Why graph transformers enable ICL
In-context learning requires the model to:
- See the examples: global attention lets query nodes attend directly to context nodes, regardless of graph distance
- Extract the pattern: multi-head attention identifies which features of the examples correlate with the labels
- Apply the pattern: the same attention mechanism applies the extracted pattern to classify query nodes
Standard message-passing GNNs cannot do this because they are local: with k message-passing layers, a query node only sees its k-hop neighborhood, so the context examples are invisible unless they happen to sit nearby in the graph. Graph transformers' global attention bypasses this limitation.
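The mechanism can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual model: the shapes, the random features, and the trick of injecting labels as an additive embedding into context tokens are all assumptions made for the example. It shows the key point that one global-attention step lets a query node draw on every context node, regardless of graph distance.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension (illustrative)

# Node features: 4 context nodes + 1 query node, located anywhere in the graph.
features = rng.normal(size=(5, d))

# Context nodes carry their label as an extra embedded channel;
# the query node's label channel stays zero (its label is unknown).
label_emb = {"churn": rng.normal(size=d), "no_churn": rng.normal(size=d)}
labels = ["churn", "churn", "no_churn", "no_churn"]
tokens = features.copy()
for i, lab in enumerate(labels):
    tokens[i] += label_emb[lab]  # inject label info into each context token

def attention(query, keys, values):
    """Scaled dot-product attention: global, ignores graph distance."""
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights, weights @ values

# The query node (index 4) attends to ALL context tokens directly;
# softmax weights are strictly positive, so every example contributes.
weights, mixed = attention(tokens[4], tokens[:4], tokens[:4])
```

A message-passing layer, by contrast, would compute `mixed` only from the query node's immediate neighbors, so the context nodes would never enter the sum unless they were adjacent.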
Enterprise example: rapid task switching
A financial institution deploys a single graph foundation model on their customer-transaction graph. Without retraining:
- Monday: provide 5 confirmed fraud cases as context. Model scores all transactions for fraud likelihood.
- Tuesday: provide 5 churned customers as context. Model predicts churn probability for all customers.
- Wednesday: provide 5 high-LTV customers as context. Model identifies other likely high-LTV customers.
Same model, same deployment, same infrastructure. The task changes by changing the context examples. This eliminates the need to build, train, and deploy separate models for each prediction task.
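The task-switching pattern above can be sketched as follows. The "model" here is a deliberately trivial nearest-neighbor stand-in, not a real graph foundation model; its only purpose is to show that swapping the context list swaps the task while the code, weights, and deployment stay fixed. All names (`icl_predict`, the node indices) are hypothetical.

```python
import numpy as np

def icl_predict(features, context, queries):
    """Stand-in 'model': label each query with its nearest context example's label."""
    ctx_ids = [i for i, _ in context]
    ctx_labels = [lab for _, lab in context]
    out = {}
    for q in queries:
        dists = [np.linalg.norm(features[q] - features[c]) for c in ctx_ids]
        out[q] = ctx_labels[int(np.argmin(dists))]
    return out

rng = np.random.default_rng(1)
features = rng.normal(size=(10, 4))  # toy node features

# Monday: fraud scoring -- the context alone defines the task
fraud = icl_predict(features, [(0, "fraud"), (1, "ok")], queries=[5, 6])

# Tuesday: churn prediction -- same function, same features, new context
churn = icl_predict(features, [(2, "churn"), (3, "no_churn")], queries=[5, 6])
```

The design point is that `icl_predict` never changes between tasks; only the small labeled context does, which is what removes the per-task build/train/deploy cycle.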
Current limitations
- Accuracy gap: ICL typically underperforms fine-tuning by 5-10 points when sufficient labels are available for fine-tuning
- Example sensitivity: the choice of context examples significantly affects predictions. Bad examples lead to bad predictions.
- Task complexity: ICL works best for classification tasks. Complex regression or generation tasks are harder to demonstrate with examples.
- Maturity: graph ICL is an active research area. Production implementations are emerging but not yet as robust as language model ICL.
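One practical hedge against the example-sensitivity limitation, sketched below, is to resample several context sets from a labeled pool and measure how often each query's prediction agrees with its majority label. This workflow is an assumption, not something prescribed by current graph-ICL systems, and `toy_predict` is a placeholder for whatever model is actually deployed.

```python
import random
from collections import Counter

def stability(predict, pool, queries, k=4, trials=5, seed=0):
    """Per-query fraction of trials that agree with the majority prediction."""
    rng = random.Random(seed)
    votes = {q: Counter() for q in queries}
    for _ in range(trials):
        context = rng.sample(pool, k)  # draw a fresh context set each trial
        for q, label in predict(context, queries).items():
            votes[q][label] += 1
    return {q: votes[q].most_common(1)[0][1] / trials for q in queries}

# Placeholder predictor: assigns every query the majority label of the context.
def toy_predict(context, queries):
    top = Counter(lab for _, lab in context).most_common(1)[0][0]
    return {q: top for q in queries}

pool = [(i, "churn") for i in range(6)] + [(i, "no_churn") for i in range(6, 12)]
scores = stability(toy_predict, pool, queries=["q1", "q2"])
```

Queries with low stability scores are the ones whose labels hinge on which examples were chosen, and are good candidates for human review or for fine-tuning once enough labels accumulate.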