In-context learning adapts a model to new tasks at inference time by providing examples in the input, without updating any model weights. You give the model 3 examples of “this customer churned” and 3 examples of “this customer stayed,” and it classifies the rest. Change the examples to fraud/not-fraud, and it classifies for fraud. No retraining, no new model, no deployment pipeline. The same model, different examples, different task.
This capability, made famous by GPT-style language models, is now emerging for graph neural networks. Graph transformers, with their global attention mechanism, are naturally suited to in-context learning because they can attend directly to the provided examples when processing query nodes.
How in-context learning works on graphs
```python
# Conceptual in-context learning on a graph

# 1. Provide labeled examples as context
context_nodes = [
    (node_42, "churn"),      # example: this customer churned
    (node_87, "churn"),      # example: this customer churned
    (node_15, "no_churn"),   # example: this customer stayed
    (node_93, "no_churn"),   # example: this customer stayed
]

# 2. Provide query nodes
query_nodes = [node_201, node_305, node_412, ...]  # classify these

# 3. The graph transformer processes everything together:
#    - context nodes carry label information visible to attention
#    - query nodes attend to context nodes AND their graph neighbors
#    - the model infers the task (churn prediction) from the examples
predictions = graph_transformer(
    graph=customer_graph,
    context=context_nodes,
    queries=query_nodes,
)

# predictions[node_201] = "churn" (probability: 0.87)
```

In-context learning: provide examples, get predictions. The model infers the task from the demonstrated pattern.
Why graph transformers enable ICL
In-context learning requires the model to:
- See the examples: global attention lets query nodes attend directly to context nodes, regardless of graph distance
- Extract the pattern: multi-head attention identifies which features of the examples correlate with the labels
- Apply the pattern: the same attention mechanism applies the extracted pattern to classify query nodes
Standard message-passing GNNs cannot do this because they are local: with k message-passing layers, a query node only sees its k-hop neighborhood, so the context examples are invisible unless they happen to sit nearby in the graph. Graph transformers' global attention bypasses this limitation.
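The mechanism can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual model: the shapes, the random features, and the trick of injecting labels as an additive embedding into context tokens are all assumptions made for the example. It shows the key point that one global-attention step lets a query node draw on every context node, regardless of graph distance.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension (illustrative)

# Node features: 4 context nodes + 1 query node, located anywhere in the graph.
features = rng.normal(size=(5, d))

# Context nodes carry their label as an extra embedded channel;
# the query node's label channel stays zero (its label is unknown).
label_emb = {"churn": rng.normal(size=d), "no_churn": rng.normal(size=d)}
labels = ["churn", "churn", "no_churn", "no_churn"]
tokens = features.copy()
for i, lab in enumerate(labels):
    tokens[i] += label_emb[lab]  # inject label info into each context token

def attention(query, keys, values):
    """Scaled dot-product attention: global, ignores graph distance."""
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights, weights @ values

# The query node (index 4) attends to ALL context tokens directly;
# softmax weights are strictly positive, so every example contributes.
weights, mixed = attention(tokens[4], tokens[:4], tokens[:4])
```

A message-passing layer, by contrast, would compute `mixed` only from the query node's immediate neighbors, so the context nodes would never enter the sum unless they were adjacent.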
Enterprise example: rapid task switching
A financial institution deploys a single graph foundation model on their customer-transaction graph. Without retraining:
- Monday: provide 5 confirmed fraud cases as context. Model scores all transactions for fraud likelihood.
- Tuesday: provide 5 churned customers as context. Model predicts churn probability for all customers.
- Wednesday: provide 5 high-LTV customers as context. Model identifies other likely high-LTV customers.
Same model, same deployment, same infrastructure. The task changes by changing the context examples. This eliminates the need to build, train, and deploy separate models for each prediction task.
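The task-switching pattern above can be sketched as follows. The "model" here is a deliberately trivial nearest-neighbor stand-in, not a real graph foundation model; its only purpose is to show that swapping the context list swaps the task while the code, weights, and deployment stay fixed. All names (`icl_predict`, the node indices) are hypothetical.

```python
import numpy as np

def icl_predict(features, context, queries):
    """Stand-in 'model': label each query with its nearest context example's label."""
    ctx_ids = [i for i, _ in context]
    ctx_labels = [lab for _, lab in context]
    out = {}
    for q in queries:
        dists = [np.linalg.norm(features[q] - features[c]) for c in ctx_ids]
        out[q] = ctx_labels[int(np.argmin(dists))]
    return out

rng = np.random.default_rng(1)
features = rng.normal(size=(10, 4))  # toy node features

# Monday: fraud scoring -- the context alone defines the task
fraud = icl_predict(features, [(0, "fraud"), (1, "ok")], queries=[5, 6])

# Tuesday: churn prediction -- same function, same features, new context
churn = icl_predict(features, [(2, "churn"), (3, "no_churn")], queries=[5, 6])
```

The design point is that `icl_predict` never changes between tasks; only the small labeled context does, which is what removes the per-task build/train/deploy cycle.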
Current limitations
- Accuracy gap: ICL typically underperforms fine-tuning by 5-10 points when sufficient labels are available for fine-tuning
- Example sensitivity: the choice of context examples significantly affects predictions. Bad examples lead to bad predictions.
- Task complexity: ICL works best for classification tasks. Complex regression or generation tasks are harder to demonstrate with examples.
- Maturity: graph ICL is an active research area. Production implementations are emerging but not yet as robust as language model ICL.
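One practical hedge against the example-sensitivity limitation, sketched below, is to resample several context sets from a labeled pool and measure how often each query's prediction agrees with its majority label. This workflow is an assumption, not something prescribed by current graph-ICL systems, and `toy_predict` is a placeholder for whatever model is actually deployed.

```python
import random
from collections import Counter

def stability(predict, pool, queries, k=4, trials=5, seed=0):
    """Per-query fraction of trials that agree with the majority prediction."""
    rng = random.Random(seed)
    votes = {q: Counter() for q in queries}
    for _ in range(trials):
        context = rng.sample(pool, k)  # draw a fresh context set each trial
        for q, label in predict(context, queries).items():
            votes[q][label] += 1
    return {q: votes[q].most_common(1)[0][1] / trials for q in queries}

# Placeholder predictor: assigns every query the majority label of the context.
def toy_predict(context, queries):
    top = Counter(lab for _, lab in context).most_common(1)[0][0]
    return {q: top for q in queries}

pool = [(i, "churn") for i in range(6)] + [(i, "no_churn") for i in range(6, 12)]
scores = stability(toy_predict, pool, queries=["q1", "q2"])
```

Queries with low stability scores are the ones whose labels hinge on which examples were chosen, and are good candidates for human review or for fine-tuning once enough labels accumulate.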