If you have used ChatGPT or Claude, you have already experienced in-context learning. You paste a passage of text the model has never seen, ask a question about it, and get a correct answer instantly. The model did not train on that passage. It did not fine-tune. It recognized patterns from its pre-training and applied them to new content in a single forward pass.
Now apply that same idea to a database table. You give a pre-trained model a dataset it has never seen (say, your customer churn data), and it returns predictions without any training. No feature engineering. No hyperparameter tuning. No waiting for a training loop to converge. Just predictions, in seconds.
That is in-context learning for structured data. It is real, it works today, and it changes the economics of enterprise ML.
How in-context learning works for structured data
The mechanics are straightforward, even if the engineering behind them is not. An ICL model for structured data is pre-trained on a large corpus of datasets (thousands to millions of them). During pre-training, the model learns general patterns about how structured data behaves: what features predict outcomes, how tables relate to each other, what temporal patterns look like across different domains.
When you give it a new dataset at inference time, the model does not update its weights. It processes your data in a single forward pass, matches patterns from pre-training, and outputs predictions. This differs from traditional ML in a critical way: in traditional ML, every new dataset requires a fresh training loop with gradient updates; with in-context learning, none is needed.
- Pre-training phase (done once): The model trains on thousands or millions of diverse datasets. For TabPFN, these are synthetic tabular datasets. For KumoRFM, these are real-world relational databases with multiple connected tables. The model learns general prediction patterns that transfer across domains.
- Inference phase (per new dataset): You pass your new dataset as input. The model recognizes which pre-trained patterns apply and generates predictions. No gradient updates. No training loop. One forward pass.
- Optional fine-tuning: For maximum accuracy on a specific dataset, you can fine-tune the model with a small number of gradient updates. This is faster than training from scratch and typically improves accuracy by 3-5 AUROC points over zero-shot.
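The inference contract described above can be sketched in a few lines. This is a toy stand-in, not the real TabPFN or KumoRFM interface: the "model" below uses a nearest-neighbor vote in place of a transformer forward pass, purely to show that the labeled rows act as context input and no weights change at prediction time.

```python
# Illustrative sketch of the ICL inference contract (hypothetical API):
# weights are frozen after pre-training; a new dataset is consumed as
# *input* to one forward pass, never trained on.

class PretrainedTabularICL:
    def __init__(self, weights):
        self.weights = weights  # fixed once pre-training is done

    def predict(self, context_rows, context_labels, test_rows):
        """One 'forward pass': labeled rows are context, no gradient updates.

        A majority vote over the 3 nearest context rows stands in for the
        real model's learned pattern matching.
        """
        preds = []
        for x in test_rows:
            ranked = sorted(
                zip(context_rows, context_labels),
                key=lambda pair: sum(abs(a - b) for a, b in zip(pair[0], x)),
            )
            nearest = [label for _, label in ranked[:3]]
            preds.append(max(set(nearest), key=nearest.count))
        return preds

model = PretrainedTabularICL(weights="frozen")
context_X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
context_y = ["stay", "churn", "stay", "churn"]
print(model.predict(context_X, context_y, [[0.15, 0.15], [0.85, 0.85]]))
# → ['stay', 'churn']
```

The key property to notice: `predict` reads the labeled rows but never touches `self.weights`, which is exactly the zero-shot phase above; optional fine-tuning would be the one place weights change.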
The three models that do ICL for structured data
As of 2026, three models can perform genuine in-context learning on structured data. They differ significantly in what kind of structured data they handle and how they were built.
| model | developer | data_type | pre-training_data | max_input_size | handles_relational_data |
|---|---|---|---|---|---|
| TabPFN | University of Freiburg | Single flat tables | Millions of synthetic tabular datasets | ~10,000 rows, ~100 features | No (single table only) |
| KumoRFM | Kumo.ai | Relational databases (multiple connected tables) | Tens of thousands of real-world relational databases | Enterprise-scale relational graphs | Yes (multiple tables with foreign keys) |
| NICL (Neuralk) | Neuralk | Single flat tables | Commerce and marketing datasets | Moderate (single table) | No (single table only) |
Three ICL models for structured data. TabPFN and NICL handle single flat tables. KumoRFM is the only model that handles relational databases with multiple connected tables.
TabPFN: in-context learning on flat tables
TabPFN was the first model to demonstrate that in-context learning works for tabular data. Developed at the University of Freiburg, it was trained on millions of synthetic classification datasets generated by sampling from a prior over data-generating processes.
The result: you feed TabPFN a new flat table (features and labels for training rows, features only for test rows), and it returns class predictions in a single forward pass. No hyperparameter tuning. No model selection. On small to medium datasets (under 10,000 rows), TabPFN matches or beats tuned XGBoost and random forest. That is a genuine achievement.
The limitations are clear: TabPFN works on single flat tables only. Enterprise data does not live in single flat tables. If your churn prediction requires joining customer, order, support ticket, and usage tables, you must flatten them yourself before TabPFN can use them. That flattening step is exactly the feature engineering bottleneck that ICL was supposed to eliminate.
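That flattening step looks like this in miniature. The tables and aggregate choices below are illustrative, but they show the core problem: every aggregate (count, sum, recency, and so on) is a hand-picked feature-engineering decision, and any cross-table signal you do not aggregate is simply gone before the model ever sees the data.

```python
# Minimal sketch of the manual flattening a single-table model requires:
# collapse customer/order/ticket tables into one row per customer.
# Table and column names are illustrative.

from collections import defaultdict

customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]
orders = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C2", "amount": 35.0},
]
tickets = [{"customer_id": "C1"}, {"customer_id": "C1"}, {"customer_id": "C1"}]

def flatten(customers, orders, tickets):
    order_totals = defaultdict(float)
    order_counts = defaultdict(int)
    for o in orders:
        order_totals[o["customer_id"]] += o["amount"]
        order_counts[o["customer_id"]] += 1
    ticket_counts = defaultdict(int)
    for t in tickets:
        ticket_counts[t["customer_id"]] += 1
    # Each aggregate below is a feature-engineering choice; anything
    # not explicitly aggregated is lost to the downstream model.
    return [
        {
            "customer_id": c["customer_id"],
            "n_orders": order_counts[c["customer_id"]],
            "total_spend": order_totals[c["customer_id"]],
            "n_tickets": ticket_counts[c["customer_id"]],
        }
        for c in customers
    ]

flat = flatten(customers, orders, tickets)
print(flat[0])
# → {'customer_id': 'C1', 'n_orders': 2, 'total_spend': 200.0, 'n_tickets': 3}
```

Multiply this by dozens of tables and hundreds of candidate aggregates and you get the hundreds of lines of per-task pipeline code cited in the benchmarks below.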
KumoRFM: in-context learning on relational data
KumoRFM extends in-context learning to relational databases. Instead of requiring a pre-flattened table, it takes multiple connected tables as input, understands their schema and foreign key relationships, and makes predictions that incorporate cross-table patterns.
This is a harder problem than flat-table ICL by a significant margin. A flat table is a fixed-size matrix. A relational database is a variable-structure graph: different numbers of tables, different schemas per table, different cardinalities in relationships (one customer has 3 orders, another has 300), temporal ordering across tables. KumoRFM handles this by representing relational data as a heterogeneous graph and using graph neural network architectures designed for variable-structure inputs.
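A heterogeneous graph view of relational data can be sketched directly: each row becomes a node typed by its table, and each foreign key value becomes an edge. This is an illustrative data structure, not KumoRFM's internal representation.

```python
# Sketch: a relational schema viewed as a heterogeneous graph.
# Rows become typed nodes; foreign keys become edges.
# (Illustrative structure only, not KumoRFM's internals.)

rows = {
    "customers": [{"id": "C1"}, {"id": "C2"}],
    "orders": [
        {"id": "O1", "customer_id": "C1"},
        {"id": "O2", "customer_id": "C1"},
        {"id": "O3", "customer_id": "C2"},
    ],
}
# (child_table, fk_column, parent_table)
foreign_keys = [("orders", "customer_id", "customers")]

def build_graph(rows, foreign_keys):
    nodes = {(table, r["id"]) for table, rs in rows.items() for r in rs}
    edges = []
    for child_table, fk_col, parent_table in foreign_keys:
        for r in rows[child_table]:
            edges.append(((child_table, r["id"]), (parent_table, r[fk_col])))
    return nodes, edges

nodes, edges = build_graph(rows, foreign_keys)
# C1 has two order neighbors, C2 has one: neighborhood size varies per
# node, which is exactly why a fixed-size matrix cannot represent this.
print(len(nodes), len(edges))
# → 5 3
```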
The practical impact: you point KumoRFM at your data warehouse (Snowflake, for example), write a PQL query like `PREDICT churned_30d FOR EACH customers.customer_id`, and get predictions in seconds. No joins. No feature engineering. No training. The model reads your relational structure directly.
NICL (Neuralk): in-context learning for commerce
NICL takes a more specialized approach. Rather than targeting general tabular or relational data, it focuses on commerce and marketing prediction tasks: purchase propensity, customer segmentation, campaign response prediction. It operates on single flat tables, similar to TabPFN, but is optimized for the feature distributions and patterns common in commerce data.
The tradeoff is scope. NICL may outperform TabPFN on commerce tasks specifically, but it does not generalize to arbitrary tabular problems or handle relational data.
Why relational data is the hard frontier
The jump from flat-table ICL to relational ICL is not incremental. It is a structurally different problem. Here is why:
- Variable structure. A flat table has a fixed schema: N rows by M columns. Every dataset has the same shape (a matrix). A relational database has a variable number of tables, each with different schemas, connected by different foreign key patterns. The model must handle any relational structure, not just matrices.
- Variable cardinality. In a relational database, one customer might have 3 orders and another might have 3,000. One product might have 10 reviews and another might have 10,000. The model must aggregate variable-length relationships without losing signal.
- Multi-hop patterns. The most predictive patterns in relational data often span multiple tables. A customer who churns might show declining order frequency (orders table), increasing support tickets (tickets table), and decreasing product diversity (order items table). These cross-table signals require the model to reason across 3-4+ table hops.
- Temporal ordering. Relational data has timestamps scattered across multiple tables. The model must respect temporal causality: a churn prediction at time T can only use data from before time T, even when that data is spread across 5 different tables with different temporal granularities.
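The temporal-causality constraint in the last point can be made concrete: for a prediction at time T, only events strictly before T, from any table, may enter the model's context. The tables and timestamps below are illustrative.

```python
# Sketch of the temporal-causality filter: a prediction at `cutoff`
# may only see cross-table events strictly before that time.
# Table names and timestamps are illustrative.

from datetime import datetime

events = {
    "orders":  [("C1", datetime(2025, 1, 5)), ("C1", datetime(2025, 3, 9))],
    "tickets": [("C1", datetime(2025, 2, 1)), ("C1", datetime(2025, 4, 2))],
}

def context_before(events, customer_id, cutoff):
    """Collect the events usable for a prediction made at `cutoff`."""
    usable = []
    for table, rows in events.items():
        for cid, ts in rows:
            if cid == customer_id and ts < cutoff:  # strict: no leakage
                usable.append((table, ts))
    return sorted(usable, key=lambda e: e[1])

ctx = context_before(events, "C1", datetime(2025, 3, 1))
print([table for table, _ in ctx])
# → ['orders', 'tickets']  (the March order and April ticket are excluded)
```

A model that ever reads the March 9 order when predicting for March 1 is leaking the future, and this filter must hold simultaneously across every connected table, each with its own temporal granularity.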
This is why TabPFN and NICL stop at flat tables. Handling relational structure requires a different architecture (graph neural networks vs transformers on matrices) and a different pre-training strategy (real relational databases vs synthetic tables).
| dimension | flat-table ICL (TabPFN, NICL) | relational ICL (KumoRFM) |
|---|---|---|
| Input format | Single table (N rows x M columns) | Multiple connected tables with foreign keys |
| Pre-training data | Synthetic tables (TabPFN) or domain-specific tables (NICL) | Tens of thousands of real-world relational databases |
| Feature engineering required | Must flatten relational data into one table first | None. Reads relational structure directly. |
| Cross-table patterns | Cannot discover (only sees one table) | Discovers automatically across all connected tables |
| Max input size | ~10K rows, ~100 features (TabPFN) | Enterprise-scale relational databases |
| Enterprise readiness | Research stage. Works on small datasets. | Production. Deployed at Fortune 500 companies. |
| Task types | Classification (TabPFN). Commerce classification (NICL). | Classification, regression, ranking, recommendation across any relational domain. |
Flat-table ICL handles the simple case well. Relational ICL handles the case that actually matters for enterprises, where data lives across multiple connected tables.
The benchmark evidence
In-context learning is not just faster. On relational data, it is actually more accurate than traditional approaches. This is counterintuitive until you understand why: manual feature engineering on relational data typically captures only a fraction of the available cross-table signal. A foundation model pre-trained on thousands of relational databases discovers patterns that human engineers miss.
| approach | AUROC | time_to_prediction | feature_engineering |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 lines of code per task |
| KumoRFM zero-shot (ICL) | 76.71 | ~1 second | None |
| KumoRFM fine-tuned | 81.14 | Minutes | None |
RelBench benchmark across 7 databases and 30 prediction tasks. KumoRFM zero-shot (pure ICL, no fine-tuning) outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning adds another 4.4 points.
On the SAP SALT enterprise benchmark:
| approach | accuracy | uses_ICL |
|---|---|---|
| LLM + AutoML | 63% | No (trains from scratch) |
| PhD Data Scientist + XGBoost | 75% | No (trains from scratch with manual features) |
| KumoRFM (zero-shot) | 91% | Yes (in-context learning on relational data) |
SAP SALT benchmark. KumoRFM's in-context learning on relational data outperforms both traditional ML and LLM-assisted approaches by wide margins.
Why this matters for enterprise ML teams
The traditional enterprise ML pipeline looks like this: a business team identifies a prediction need (churn, fraud, demand forecast). A data science team spends 2-4 weeks on feature engineering. They spend another 2-4 weeks on model training, tuning, and validation. Deployment takes another 2-4 weeks. Total: 6-12 weeks from request to production prediction.
With in-context learning on relational data, that timeline collapses. You write a PQL query. The model returns predictions in seconds. If you want to fine-tune for maximum accuracy, that takes minutes to hours, not weeks. A prediction that used to require a quarter of data science time now takes an afternoon.
This is not just a speed improvement. It changes what is economically viable. Use cases that were too small to justify a 6-week pipeline (predicting churn for a specific product line, scoring fraud risk for a new market, forecasting demand for a seasonal category) become feasible when the cost drops from weeks to seconds.
Traditional ML pipeline (per prediction task)
- Identify prediction target and relevant tables (1-2 weeks)
- Join tables, engineer features, handle temporal windowing (2-4 weeks, 878 lines of code)
- Select model, tune hyperparameters, train (1-2 weeks)
- Validate, test, deploy to production (2-4 weeks)
- Total: 6-12 weeks, 3-4 data scientists
- Repeat from scratch for every new prediction task
In-context learning with KumoRFM
- Connect to data warehouse (one-time setup)
- Write PQL: `PREDICT churned_30d FOR EACH customers.customer_id`
- Model reads relational tables, returns predictions in seconds
- Optional: fine-tune for maximum accuracy (minutes to hours)
- Total: minutes to hours, 1 ML engineer or analyst
- New prediction tasks take the same amount of time
PQL Query
PREDICT churned_30d FOR EACH customers.customer_id
One PQL query triggers in-context learning on your full relational database. KumoRFM reads customers, orders, support tickets, usage logs, and any other connected tables. It discovers cross-table churn patterns and returns predictions without any training, feature engineering, or pipeline code.
Output
| customer_id | churn_probability | key_signals |
|---|---|---|
| CUST-2201 | 0.89 | Order frequency down 70% (orders table), 4 support tickets in 14 days (tickets table), usage down 55% (usage table) |
| CUST-2202 | 0.14 | Stable order cadence, recent product expansion, no support escalations |
| CUST-2203 | 0.76 | Payment failures (billing table), reduced login frequency (sessions table), competitor product in tech stack (integrations table) |
| CUST-2204 | 0.08 | Increasing usage, active API calls, recent feature adoption |
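To make the anatomy of that query explicit, here is a toy parser for the single statement shown above. This is an illustrative regex sketch of one PQL form, not Kumo's actual PQL grammar: the statement names a target column to predict and an entity column to predict it for.

```python
# Toy parser for the PQL statement shown above, to make its parts
# explicit. Illustrative only -- not Kumo's actual PQL grammar.

import re

def parse_pql(query):
    m = re.fullmatch(
        r"PREDICT\s+(?P<target>\w+)\s+FOR\s+EACH\s+(?P<table>\w+)\.(?P<key>\w+)",
        query.strip(),
    )
    if not m:
        raise ValueError("unsupported PQL form")
    return m.groupdict()

parsed = parse_pql("PREDICT churned_30d FOR EACH customers.customer_id")
print(parsed)
# → {'target': 'churned_30d', 'table': 'customers', 'key': 'customer_id'}
```

Everything else in the pipeline (joins, aggregation, temporal windowing) is implied by the relational schema rather than written by hand, which is the point of the comparison above.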
When to use ICL vs fine-tuning vs traditional ML
In-context learning is not always the right answer. Here is a practical decision framework:
when_to_use_icl_vs_finetuning_vs_traditional
| scenario | best_approach | why |
|---|---|---|
| Need predictions fast (hours, not weeks) | ICL (KumoRFM zero-shot) | Predictions in seconds. No pipeline to build. |
| Exploring whether a prediction task is feasible | ICL (KumoRFM zero-shot or TabPFN) | Get a baseline prediction in minutes to validate the use case before investing in a full pipeline. |
| Production use case, maximum accuracy needed | Fine-tuned foundation model (KumoRFM fine-tuned) | Fine-tuning adds 3-5 AUROC points over zero-shot. Still faster than traditional ML. |
| Small, flat dataset with clean labels | TabPFN or tuned XGBoost | Both work well on small flat tables. TabPFN is faster. XGBoost may be slightly more accurate with tuning. |
| Relational data (multiple connected tables) | KumoRFM (zero-shot or fine-tuned) | Only option that reads relational structure natively. Flat-table models require manual flattening that loses signal. |
| Highly regulated domain requiring full explainability | Traditional ML (XGBoost + SHAP) | ICL models have improving but still limited explainability. If regulators require feature-level explanations, traditional models have an edge. |
Decision framework for ICL vs alternatives. ICL wins on speed and relational data. Traditional ML wins when you have a simple flat dataset and weeks to tune.
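The decision table above can be collapsed into a small lookup helper. The branch order and recommendation strings simply mirror the table; the function and flag names are illustrative.

```python
# The decision framework above, sketched as a lookup helper.
# Flag names and returned strings are illustrative, mirroring the table.

def recommend_approach(relational, need_speed, max_accuracy, regulated):
    if regulated:
        # Regulators wanting feature-level explanations: traditional ML edge
        return "traditional ML (XGBoost + SHAP)"
    if relational:
        # Only relational ICL reads multi-table structure natively
        return "KumoRFM fine-tuned" if max_accuracy else "KumoRFM zero-shot"
    if need_speed:
        return "TabPFN"  # instant baseline on a small flat table
    return "TabPFN or tuned XGBoost"

print(recommend_approach(relational=True, need_speed=True,
                         max_accuracy=False, regulated=False))
# → KumoRFM zero-shot
```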