If you have used ChatGPT or Claude, you have already experienced in-context learning. You paste a passage of text the model has never seen, ask a question about it, and get a correct answer instantly. The model did not train on that passage. It did not fine-tune. It recognized patterns from its pre-training and applied them to new content in a single forward pass.
Now apply that same idea to a database table. You give a pre-trained model a dataset it has never seen (say, your customer churn data), and it returns predictions without any training. No feature engineering. No hyperparameter tuning. No waiting for a training loop to converge. Just predictions, in seconds.
That is in-context learning for structured data. It is real, it works today, and it changes the economics of enterprise ML.
How in-context learning works for structured data
The mechanics are straightforward, even if the engineering behind them is not. An ICL model for structured data is pre-trained on a large corpus of datasets (thousands to millions of them). During pre-training, the model learns general patterns about how structured data behaves: what features predict outcomes, how tables relate to each other, what temporal patterns look like across different domains.
When you give it a new dataset at inference time, the model does not update its weights. It processes your data in a single forward pass, matches patterns from pre-training, and outputs predictions. This differs from traditional ML in a critical way: in traditional ML, every new dataset requires a fresh training loop with gradient updates; with in-context learning, none is needed.
- Pre-training phase (done once): The model trains on thousands or millions of diverse datasets. For TabPFN, these are synthetic tabular datasets. For KumoRFM, these are real-world relational databases with multiple connected tables. The model learns general prediction patterns that transfer across domains.
- Inference phase (per new dataset): You pass your new dataset as input. The model recognizes which pre-trained patterns apply and generates predictions. No gradient updates. No training loop. One forward pass.
- Optional fine-tuning: For maximum accuracy on a specific dataset, you can fine-tune the model with a small number of gradient updates. This is faster than training from scratch and typically improves accuracy by 3-5 AUROC points over zero-shot.
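The inference contract described above can be sketched in a few lines. This is a toy stand-in, not the real TabPFN or KumoRFM interface: the "model" below uses a nearest-neighbor vote in place of a transformer forward pass, purely to show that the labeled rows act as context input and no weights change at prediction time.

```python
# Illustrative sketch of the ICL inference contract (hypothetical API):
# weights are frozen after pre-training; a new dataset is consumed as
# *input* to one forward pass, never trained on.

class PretrainedTabularICL:
    def __init__(self, weights):
        self.weights = weights  # fixed once pre-training is done

    def predict(self, context_rows, context_labels, test_rows):
        """One 'forward pass': labeled rows are context, no gradient updates.

        A majority vote over the 3 nearest context rows stands in for the
        real model's learned pattern matching.
        """
        preds = []
        for x in test_rows:
            ranked = sorted(
                zip(context_rows, context_labels),
                key=lambda pair: sum(abs(a - b) for a, b in zip(pair[0], x)),
            )
            nearest = [label for _, label in ranked[:3]]
            preds.append(max(set(nearest), key=nearest.count))
        return preds

model = PretrainedTabularICL(weights="frozen")
context_X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
context_y = ["stay", "churn", "stay", "churn"]
print(model.predict(context_X, context_y, [[0.15, 0.15], [0.85, 0.85]]))
# → ['stay', 'churn']
```

The key property to notice: `predict` reads the labeled rows but never touches `self.weights`, which is exactly the zero-shot phase above; optional fine-tuning would be the one place weights change.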
The three models that do ICL for structured data
As of 2026, three models can perform genuine in-context learning on structured data. They differ significantly in what kind of structured data they handle and how they were built.
| model | developer | data_type | pre-training_data | max_input_size | handles_relational_data |
|---|---|---|---|---|---|
| TabPFN | University of Freiburg | Single flat tables | Millions of synthetic tabular datasets | ~10,000 rows, ~100 features | No (single table only) |
| KumoRFM | Kumo.ai | Relational databases (multiple connected tables) | Tens of thousands of real-world relational databases | Enterprise-scale relational graphs | Yes (multiple tables with foreign keys) |
| NICL (Neuralk) | Neuralk | Single flat tables | Commerce and marketing datasets | Moderate (single table) | No (single table only) |
Three ICL models for structured data. TabPFN and NICL handle single flat tables. KumoRFM is the only model that handles relational databases with multiple connected tables.
TabPFN: in-context learning on flat tables
TabPFN was the first model to demonstrate that in-context learning works for tabular data. Developed at the University of Freiburg, it was trained on millions of synthetic classification datasets generated by sampling from a prior over data-generating processes.
The result: you feed TabPFN a new flat table (features and labels for training rows, features only for test rows), and it returns class predictions in a single forward pass. No hyperparameter tuning. No model selection. On small to medium datasets (under 10,000 rows), TabPFN matches or beats tuned XGBoost and random forest. That is a genuine achievement.
The limitations are clear: TabPFN works on single flat tables only. Enterprise data does not live in single flat tables. If your churn prediction requires joining customer, order, support ticket, and usage tables, you must flatten them yourself before TabPFN can use them. That flattening step is exactly the feature engineering bottleneck that ICL was supposed to eliminate.
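That flattening step looks like this in miniature. The tables and aggregate choices below are illustrative, but they show the core problem: every aggregate (count, sum, recency, and so on) is a hand-picked feature-engineering decision, and any cross-table signal you do not aggregate is simply gone before the model ever sees the data.

```python
# Minimal sketch of the manual flattening a single-table model requires:
# collapse customer/order/ticket tables into one row per customer.
# Table and column names are illustrative.

from collections import defaultdict

customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]
orders = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C2", "amount": 35.0},
]
tickets = [{"customer_id": "C1"}, {"customer_id": "C1"}, {"customer_id": "C1"}]

def flatten(customers, orders, tickets):
    order_totals = defaultdict(float)
    order_counts = defaultdict(int)
    for o in orders:
        order_totals[o["customer_id"]] += o["amount"]
        order_counts[o["customer_id"]] += 1
    ticket_counts = defaultdict(int)
    for t in tickets:
        ticket_counts[t["customer_id"]] += 1
    # Each aggregate below is a feature-engineering choice; anything
    # not explicitly aggregated is lost to the downstream model.
    return [
        {
            "customer_id": c["customer_id"],
            "n_orders": order_counts[c["customer_id"]],
            "total_spend": order_totals[c["customer_id"]],
            "n_tickets": ticket_counts[c["customer_id"]],
        }
        for c in customers
    ]

flat = flatten(customers, orders, tickets)
print(flat[0])
# → {'customer_id': 'C1', 'n_orders': 2, 'total_spend': 200.0, 'n_tickets': 3}
```

Multiply this by dozens of tables and hundreds of candidate aggregates and you get the hundreds of lines of per-task pipeline code cited in the benchmarks below.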
KumoRFM: in-context learning on relational data
KumoRFM extends in-context learning to relational databases. Instead of requiring a pre-flattened table, it takes multiple connected tables as input, understands their schema and foreign key relationships, and makes predictions that incorporate cross-table patterns.
This is a harder problem than flat-table ICL by a significant margin. A flat table is a fixed-size matrix. A relational database is a variable-structure graph: different numbers of tables, different schemas per table, different cardinalities in relationships (one customer has 3 orders, another has 300), temporal ordering across tables. KumoRFM handles this by representing relational data as a heterogeneous graph and using graph neural network architectures designed for variable-structure inputs.
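A heterogeneous graph view of relational data can be sketched directly: each row becomes a node typed by its table, and each foreign key value becomes an edge. This is an illustrative data structure, not KumoRFM's internal representation.

```python
# Sketch: a relational schema viewed as a heterogeneous graph.
# Rows become typed nodes; foreign keys become edges.
# (Illustrative structure only, not KumoRFM's internals.)

rows = {
    "customers": [{"id": "C1"}, {"id": "C2"}],
    "orders": [
        {"id": "O1", "customer_id": "C1"},
        {"id": "O2", "customer_id": "C1"},
        {"id": "O3", "customer_id": "C2"},
    ],
}
# (child_table, fk_column, parent_table)
foreign_keys = [("orders", "customer_id", "customers")]

def build_graph(rows, foreign_keys):
    nodes = {(table, r["id"]) for table, rs in rows.items() for r in rs}
    edges = []
    for child_table, fk_col, parent_table in foreign_keys:
        for r in rows[child_table]:
            edges.append(((child_table, r["id"]), (parent_table, r[fk_col])))
    return nodes, edges

nodes, edges = build_graph(rows, foreign_keys)
# C1 has two order neighbors, C2 has one: neighborhood size varies per
# node, which is exactly why a fixed-size matrix cannot represent this.
print(len(nodes), len(edges))
# → 5 3
```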
The practical impact: you point KumoRFM at your data warehouse (Snowflake, for example), write a PQL query like `PREDICT churned_30d FOR EACH customers.customer_id`, and get predictions in seconds. No joins. No feature engineering. No training. The model reads your relational structure directly.
NICL (Neuralk): in-context learning for commerce
NICL takes a more specialized approach. Rather than targeting general tabular or relational data, it focuses on commerce and marketing prediction tasks: purchase propensity, customer segmentation, campaign response prediction. It operates on single flat tables, similar to TabPFN, but is optimized for the feature distributions and patterns common in commerce data.
The tradeoff is scope. NICL may outperform TabPFN on commerce tasks specifically, but it does not generalize to arbitrary tabular problems or handle relational data.
Why relational data is the hard frontier
The jump from flat-table ICL to relational ICL is not incremental. It is a structurally different problem. Here is why:
- Variable structure. A flat table has a fixed schema: N rows by M columns. Every dataset has the same shape (a matrix). A relational database has a variable number of tables, each with different schemas, connected by different foreign key patterns. The model must handle any relational structure, not just matrices.
- Variable cardinality. In a relational database, one customer might have 3 orders and another might have 3,000. One product might have 10 reviews and another might have 10,000. The model must aggregate variable-length relationships without losing signal.
- Multi-hop patterns. The most predictive patterns in relational data often span multiple tables. A customer who churns might show declining order frequency (orders table), increasing support tickets (tickets table), and decreasing product diversity (order items table). These cross-table signals require the model to reason across 3-4+ table hops.
- Temporal ordering. Relational data has timestamps scattered across multiple tables. The model must respect temporal causality: a churn prediction at time T can only use data from before time T, even when that data is spread across 5 different tables with different temporal granularities.
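The temporal-causality constraint in the last point can be made concrete: for a prediction at time T, only events strictly before T, from any table, may enter the model's context. The tables and timestamps below are illustrative.

```python
# Sketch of the temporal-causality filter: a prediction at `cutoff`
# may only see cross-table events strictly before that time.
# Table names and timestamps are illustrative.

from datetime import datetime

events = {
    "orders":  [("C1", datetime(2025, 1, 5)), ("C1", datetime(2025, 3, 9))],
    "tickets": [("C1", datetime(2025, 2, 1)), ("C1", datetime(2025, 4, 2))],
}

def context_before(events, customer_id, cutoff):
    """Collect the events usable for a prediction made at `cutoff`."""
    usable = []
    for table, rows in events.items():
        for cid, ts in rows:
            if cid == customer_id and ts < cutoff:  # strict: no leakage
                usable.append((table, ts))
    return sorted(usable, key=lambda e: e[1])

ctx = context_before(events, "C1", datetime(2025, 3, 1))
print([table for table, _ in ctx])
# → ['orders', 'tickets']  (the March order and April ticket are excluded)
```

A model that ever reads the March 9 order when predicting for March 1 is leaking the future, and this filter must hold simultaneously across every connected table, each with its own temporal granularity.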
This is why TabPFN and NICL stop at flat tables. Handling relational structure requires a different architecture (graph neural networks vs transformers on matrices) and a different pre-training strategy (real relational databases vs synthetic tables).
| dimension | flat-table ICL (TabPFN, NICL) | relational ICL (KumoRFM) |
|---|---|---|
| Input format | Single table (N rows x M columns) | Multiple connected tables with foreign keys |
| Pre-training data | Synthetic tables (TabPFN) or domain-specific tables (NICL) | Tens of thousands of real-world relational databases |
| Feature engineering required | Must flatten relational data into one table first | None. Reads relational structure directly. |
| Cross-table patterns | Cannot discover (only sees one table) | Discovers automatically across all connected tables |
| Max input size | ~10K rows, ~100 features (TabPFN) | Enterprise-scale relational databases |
| Enterprise readiness | Research stage. Works on small datasets. | Production. Deployed at Fortune 500 companies. |
| Task types | Classification (TabPFN). Commerce classification (NICL). | Classification, regression, ranking, recommendation across any relational domain. |
Flat-table ICL handles the simple case well. Relational ICL handles the case that actually matters for enterprises, where data lives across multiple connected tables.
The benchmark evidence
In-context learning is not just faster. On relational data, it is actually more accurate than traditional approaches. This is counterintuitive until you understand why: manual feature engineering on relational data typically captures only a fraction of the available cross-table signal. A foundation model pre-trained on thousands of relational databases discovers patterns that human engineers miss.
| approach | AUROC | time_to_prediction | feature_engineering |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 lines of code per task |
| KumoRFM zero-shot (ICL) | 76.71 | ~1 second | None |
| KumoRFM fine-tuned | 81.14 | Minutes | None |
RelBench benchmark across 7 databases and 30 prediction tasks. KumoRFM zero-shot (pure ICL, no fine-tuning) outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning adds another 4.4 points.
On the SAP SALT enterprise benchmark:
| approach | accuracy | uses_ICL |
|---|---|---|
| LLM + AutoML | 63% | No (trains from scratch) |
| PhD Data Scientist + XGBoost | 75% | No (trains from scratch with manual features) |
| KumoRFM (zero-shot) | 91% | Yes (in-context learning on relational data) |
SAP SALT benchmark. KumoRFM's in-context learning on relational data outperforms both traditional ML and LLM-assisted approaches by wide margins.
Why this matters for enterprise ML teams
The traditional enterprise ML pipeline looks like this: a business team identifies a prediction need (churn, fraud, demand forecast). A data science team spends 2-4 weeks on feature engineering. They spend another 2-4 weeks on model training, tuning, and validation. Deployment takes another 2-4 weeks. Total: 6-12 weeks from request to production prediction.
With in-context learning on relational data, that timeline collapses. You write a PQL query. The model returns predictions in seconds. If you want to fine-tune for maximum accuracy, that takes minutes to hours, not weeks. A prediction that used to require a quarter of data science time now takes an afternoon.
This is not just a speed improvement. It changes what is economically viable. Use cases that were too small to justify a 6-week pipeline (predicting churn for a specific product line, scoring fraud risk for a new market, forecasting demand for a seasonal category) become feasible when the cost drops from weeks to seconds.
Traditional ML pipeline (per prediction task)
- Identify prediction target and relevant tables (1-2 weeks)
- Join tables, engineer features, handle temporal windowing (2-4 weeks, 878 lines of code)
- Select model, tune hyperparameters, train (1-2 weeks)
- Validate, test, deploy to production (2-4 weeks)
- Total: 6-12 weeks, 3-4 data scientists
- Repeat from scratch for every new prediction task
In-context learning with KumoRFM
- Connect to data warehouse (one-time setup)
- Write PQL: `PREDICT churned_30d FOR EACH customers.customer_id`
- Model reads relational tables, returns predictions in seconds
- Optional: fine-tune for maximum accuracy (minutes to hours)
- Total: minutes to hours, 1 ML engineer or analyst
- New prediction tasks take the same amount of time
PQL Query
PREDICT churned_30d FOR EACH customers.customer_id
One PQL query triggers in-context learning on your full relational database. KumoRFM reads customers, orders, support tickets, usage logs, and any other connected tables. It discovers cross-table churn patterns and returns predictions without any training, feature engineering, or pipeline code.
Output
| customer_id | churn_probability | key_signals |
|---|---|---|
| CUST-2201 | 0.89 | Order frequency down 70% (orders table), 4 support tickets in 14 days (tickets table), usage down 55% (usage table) |
| CUST-2202 | 0.14 | Stable order cadence, recent product expansion, no support escalations |
| CUST-2203 | 0.76 | Payment failures (billing table), reduced login frequency (sessions table), competitor product in tech stack (integrations table) |
| CUST-2204 | 0.08 | Increasing usage, active API calls, recent feature adoption |
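To make the anatomy of that query explicit, here is a toy parser for the single statement shown above. This is an illustrative regex sketch of one PQL form, not Kumo's actual PQL grammar: the statement names a target column to predict and an entity column to predict it for.

```python
# Toy parser for the PQL statement shown above, to make its parts
# explicit. Illustrative only -- not Kumo's actual PQL grammar.

import re

def parse_pql(query):
    m = re.fullmatch(
        r"PREDICT\s+(?P<target>\w+)\s+FOR\s+EACH\s+(?P<table>\w+)\.(?P<key>\w+)",
        query.strip(),
    )
    if not m:
        raise ValueError("unsupported PQL form")
    return m.groupdict()

parsed = parse_pql("PREDICT churned_30d FOR EACH customers.customer_id")
print(parsed)
# → {'target': 'churned_30d', 'table': 'customers', 'key': 'customer_id'}
```

Everything else in the pipeline (joins, aggregation, temporal windowing) is implied by the relational schema rather than written by hand, which is the point of the comparison above.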
When to use ICL vs fine-tuning vs traditional ML
In-context learning is not always the right answer. Here is a practical decision framework:
when_to_use_icl_vs_finetuning_vs_traditional
| scenario | best_approach | why |
|---|---|---|
| Need predictions fast (hours, not weeks) | ICL (KumoRFM zero-shot) | Predictions in seconds. No pipeline to build. |
| Exploring whether a prediction task is feasible | ICL (KumoRFM zero-shot or TabPFN) | Get a baseline prediction in minutes to validate the use case before investing in a full pipeline. |
| Production use case, maximum accuracy needed | Fine-tuned foundation model (KumoRFM fine-tuned) | Fine-tuning adds 3-5 AUROC points over zero-shot. Still faster than traditional ML. |
| Small, flat dataset with clean labels | TabPFN or tuned XGBoost | Both work well on small flat tables. TabPFN is faster. XGBoost may be slightly more accurate with tuning. |
| Relational data (multiple connected tables) | KumoRFM (zero-shot or fine-tuned) | Only option that reads relational structure natively. Flat-table models require manual flattening that loses signal. |
| Highly regulated domain requiring full explainability | Traditional ML (XGBoost + SHAP) | ICL models have improving but still limited explainability. If regulators require feature-level explanations, traditional models have an edge. |
Decision framework for ICL vs alternatives. ICL wins on speed and relational data. Traditional ML wins when you have a simple flat dataset and weeks to tune.
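The decision table above can be collapsed into a small lookup helper. The branch order and recommendation strings simply mirror the table; the function and flag names are illustrative.

```python
# The decision framework above, sketched as a lookup helper.
# Flag names and returned strings are illustrative, mirroring the table.

def recommend_approach(relational, need_speed, max_accuracy, regulated):
    if regulated:
        # Regulators wanting feature-level explanations: traditional ML edge
        return "traditional ML (XGBoost + SHAP)"
    if relational:
        # Only relational ICL reads multi-table structure natively
        return "KumoRFM fine-tuned" if max_accuracy else "KumoRFM zero-shot"
    if need_speed:
        return "TabPFN"  # instant baseline on a small flat table
    return "TabPFN or tuned XGBoost"

print(recommend_approach(relational=True, need_speed=True,
                         max_accuracy=False, regulated=False))
# → KumoRFM zero-shot
```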