
In-Context Learning: Adapting to New Tasks at Inference Time

In-context learning adapts a graph foundation model to new tasks by providing labeled examples alongside the query. No weight updates, no retraining, no deployment pipeline. Just examples in, predictions out.

PyTorch Geometric

TL;DR

  • In-context learning (ICL) adapts to new tasks at inference time by providing labeled examples in the input. No weight updates, no training loop. The model dynamically adjusts its computation.
  • Graph transformers enable ICL through their attention mechanism: query nodes attend to labeled example nodes, extracting the task pattern from the demonstrated input-output mapping.
  • ICL vs. fine-tuning: ICL is instant (no training), flexible (change task by changing examples), and works with very few examples (1-5). Fine-tuning gives higher accuracy with more labels.
  • Emerging research area for graphs. Language model ICL is mature; graph ICL is newer but shows promise, especially with graph transformers pre-trained on diverse tasks.
  • Enterprise value: instant task switching without retraining. A deployed model can predict churn today, fraud tomorrow, and LTV next week, just by providing different examples.

In-context learning adapts a model to new tasks at inference time by providing examples in the input, without updating any model weights. You give the model 3 examples of “this customer churned” and 3 examples of “this customer stayed,” and it classifies the rest. Change the examples to fraud/not-fraud, and it classifies for fraud. No retraining, no new model, no deployment pipeline. The same model, different examples, different task.

This capability, made famous by GPT-style language models, is now emerging for graph neural networks. Graph transformers, with their global attention mechanism, are naturally suited to in-context learning because they can attend directly to the provided examples when processing query nodes.

How in-context learning works on graphs

graph_icl.py
# Conceptual in-context learning on a graph

# 1. Provide labeled examples as context
context_nodes = [
    (node_42, label="churn"),      # Example: this customer churned
    (node_87, label="churn"),      # Example: this customer churned
    (node_15, label="no_churn"),   # Example: this customer stayed
    (node_93, label="no_churn"),   # Example: this customer stayed
]

# 2. Provide query nodes
query_nodes = [node_201, node_305, node_412, ...]  # classify these

# 3. The graph transformer processes everything together
# Context nodes have label information visible to attention
# Query nodes attend to context nodes AND graph neighbors
# The model infers the task (churn prediction) from examples

predictions = graph_transformer(
    graph=customer_graph,
    context=context_nodes,
    queries=query_nodes,
)
# predictions[node_201] = "churn" (probability: 0.87)

In-context learning: provide examples, get predictions. The model infers the task from the demonstrated pattern.
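The conceptual API above can be approximated with a tiny self-contained sketch: embed each node, let every query attend to the labeled context nodes with scaled dot-product attention, and aggregate the context labels by attention weight. The function name, the toy embeddings, and the single attention head are all simplified stand-ins for illustration, not PyG's actual API.

```python
import numpy as np

def icl_predict(context_emb, context_labels, query_emb):
    """Classify query nodes by attending to labeled context nodes.

    context_emb:    (C, d) embeddings of labeled example nodes
    context_labels: (C,)   integer class labels for those nodes
    query_emb:      (Q, d) embeddings of the nodes to classify
    """
    # Scaled dot-product attention scores from queries to context examples
    scores = query_emb @ context_emb.T / np.sqrt(context_emb.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Aggregate one-hot context labels by attention weight -> class probabilities
    n_classes = context_labels.max() + 1
    one_hot = np.eye(n_classes)[context_labels]
    return weights @ one_hot

# Toy demo: two "churn" examples near (1, 0), two "no_churn" near (0, 1)
context_emb = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
context_labels = np.array([0, 0, 1, 1])   # 0 = churn, 1 = no_churn
queries = np.array([[0.95, 0.05], [0.05, 0.95]])

probs = icl_predict(context_emb, context_labels, queries)
print(probs.argmax(axis=1))  # -> [0 1]: first query churns, second stays
```

Swapping `context_labels` for a different task (fraud/not-fraud, high-LTV/not) changes what the same function predicts, which is the essence of the ICL workflow.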

Why graph transformers enable ICL

In-context learning requires the model to:

  1. See the examples: global attention lets query nodes attend directly to context nodes, regardless of graph distance
  2. Extract the pattern: multi-head attention identifies which features of the examples correlate with the labels
  3. Apply the pattern: the same attention mechanism applies the extracted pattern to classify query nodes

Standard message-passing GNNs cannot do this because they are local: after one layer, a query node only sees its direct graph neighbors (after k layers, its k-hop neighborhood), not the context examples, unless those happen to be nearby in the graph. Graph transformers' global attention bypasses this limitation.
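The locality argument can be made concrete with a hypothetical 4-node path graph: after one round of neighbor averaging, a query node at the far end receives nothing from the context node, while a (here uniform) global attention pass reaches it directly. This is a didactic sketch, not a real GNN layer.

```python
import numpy as np

# Path graph 0 - 1 - 2 - 3; node 0 is a labeled context node, node 3 a query
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

x = np.array([[1.0], [0.0], [0.0], [0.0]])  # signal lives only on the context node

# One message-passing hop: each node averages its neighbors' features
deg = adj.sum(axis=1, keepdims=True)
mp_out = (adj / deg) @ x
print(mp_out[3, 0])    # 0.0 -> the query never saw the context node

# Global attention: every node can attend to every other node (uniform weights here)
attn = np.full((4, 4), 0.25)
attn_out = attn @ x
print(attn_out[3, 0])  # 0.25 -> the context signal reaches the query directly
```

Three message-passing hops would eventually carry the signal across this path, but in a real graph the context examples may be arbitrarily far from (or disconnected from) the queries; global attention removes that dependence on graph distance.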

Enterprise example: rapid task switching

A financial institution deploys a single graph foundation model on their customer-transaction graph. Without retraining:

  • Monday: provide 5 confirmed fraud cases as context. Model scores all transactions for fraud likelihood.
  • Tuesday: provide 5 churned customers as context. Model predicts churn probability for all customers.
  • Wednesday: provide 5 high-LTV customers as context. Model identifies other likely high-LTV customers.

Same model, same deployment, same infrastructure. The task changes by changing the context examples. This eliminates the need to build, train, and deploy separate models for each prediction task.
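The task-switching pattern can be sketched by scoring one frozen set of customer embeddings against two different context sets. Every name and number below is an invented toy, standing in for real foundation-model embeddings, not a KumoRFM or PyG API.

```python
import numpy as np

def icl_predict(context_emb, context_labels, query_emb):
    """Attention-style label aggregation over labeled context examples."""
    scores = query_emb @ context_emb.T / np.sqrt(context_emb.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    one_hot = np.eye(context_labels.max() + 1)[context_labels]
    return (w @ one_hot).argmax(axis=1)

# One fixed set of customer embeddings ("frozen foundation model" output)
customers = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]])

# Monday: fraud task — context examples mark the first direction as "fraud"
fraud_ctx = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]])
fraud_labels = np.array([1, 0])   # 1 = fraud, 0 = legit
fraud_preds = icl_predict(fraud_ctx, fraud_labels, customers)
print(fraud_preds)                # -> [1 0 0]

# Tuesday: churn task — same model, same customers, new context examples
churn_ctx = np.array([[0.0, 0.0, 1.0], [1.0, 0.5, 0.0]])
churn_labels = np.array([1, 0])   # 1 = churn, 0 = retained
churn_preds = icl_predict(churn_ctx, churn_labels, customers)
print(churn_preds)                # -> [0 0 1]
```

Nothing about the model or the customer embeddings changed between the two calls; only the context examples did, which is exactly the deployment property the scenario above describes.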

Current limitations

  • Accuracy gap: ICL typically underperforms fine-tuning by 5-10 points when sufficient labels are available for fine-tuning
  • Example sensitivity: the choice of context examples significantly affects predictions. Bad examples lead to bad predictions.
  • Task complexity: ICL works best for classification tasks. Complex regression or generation tasks are harder to demonstrate with examples.
  • Maturity: graph ICL is an active research area. Production implementations are emerging but not yet as robust as language model ICL.

Frequently asked questions

What is in-context learning on graphs?

In-context learning (ICL) adapts a graph model to new tasks at inference time by providing a few labeled examples alongside the query, without updating any model weights. Just as GPT-4 can answer new question types when given examples in the prompt, a graph foundation model can make new types of predictions when given example input-output pairs in the graph context.

How does in-context learning differ from fine-tuning?

Fine-tuning updates model weights using gradient descent on labeled data. In-context learning does not update weights at all. Examples are provided as part of the input, and the model's attention mechanism dynamically adapts its computation to the demonstrated task. ICL is instantaneous; fine-tuning requires training time.

How does in-context learning work on graphs?

Labeled example nodes are added to the input graph with their labels visible. The model attends to these examples while processing query nodes, learning the task pattern from the examples. Graph transformers are particularly suited to ICL because their attention mechanism can dynamically route information from examples to queries.

When is in-context learning better than fine-tuning?

ICL is better when: (1) you need instant adaptation (no training time), (2) the task changes frequently (reconfigure by changing examples), (3) you have very few examples (1-5), or (4) you cannot modify the model (API-only access). Fine-tuning is better when you have more labels and need maximum accuracy.

Is in-context learning on graphs mature?

It is an emerging research area. Language model ICL is well-established, but graph ICL is newer. Early results are promising: graph transformers pre-trained on diverse tasks can adapt to new task types from examples. KumoRFM's PQL interface supports a form of ICL by accepting task descriptions that guide prediction.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.