
AutoML vs Foundation Models: Why AutoML Can't Fix the Real Bottleneck

AutoML automates model selection. Foundation models automate feature discovery. These are different problems. Here's why solving the wrong one keeps your data science team stuck.

TL;DR

  • AutoML automates model selection (the last 20% of the ML pipeline). Foundation models automate feature discovery (the first 80%). Feature discovery is the harder problem and the bigger bottleneck.
  • On RelBench, AutoML + manual features scores ~64-66 AUROC. KumoRFM zero-shot scores 76.71. The 10+ point gap is the value of automating feature discovery across multi-table relational data, not model selection.
  • AutoML still requires a flat feature table as input. It cannot read relational databases, discover multi-hop patterns, or preserve temporal sequences. The 12.3 hours and 878 lines of feature engineering remain fully manual.
  • At scale (20 prediction tasks), AutoML + manual features costs $650K-$900K annually in data science time. A foundation model approach costs $80K-$120K. The 85% savings come from eliminating feature engineering.
  • AutoML has a role for single-table problems and mature feature stores. But for multi-table relational data, foundation models do not make AutoML better. They make it unnecessary.

AutoML was supposed to democratize machine learning. The pitch was compelling: upload your data, click a button, get a model. DataRobot, H2O, Google AutoML, and Amazon SageMaker Autopilot all promised to replace the ML expert with software.

The tools work. They do a genuinely good job of selecting the right model architecture, tuning hyperparameters, and building ensembles. On Kaggle-style benchmarks with clean, pre-engineered feature tables, AutoML platforms often match or beat what a mid-level data scientist produces.

But enterprise adoption has not matched the hype. Gartner reported in 2024 that while 75% of enterprises have evaluated AutoML, fewer than 20% use it as their primary ML workflow. The reason is simple: AutoML solves the wrong bottleneck.

ml_pipeline_time_breakdown

| pipeline_stage | time_spent | % of total | automated_by_AutoML | automated_by_FM |
| --- | --- | --- | --- | --- |
| Data extraction & joining | 2.8 hours | 18% | No | Yes |
| Feature computation | 5.1 hours | 33% | No | Yes |
| Feature selection & iteration | 4.4 hours | 29% | No | Yes |
| Model selection & tuning | 1.8 hours | 12% | Yes | Yes |
| Evaluation & validation | 1.2 hours | 8% | Partial | Partial |
| Total | 15.3 hours | 100% | 12-20% | 80-92% |

Highlighted: the first three stages (feature engineering) consume 80% of time. AutoML automates none of them. Foundation models automate all of them.

automl_vs_foundation_model_accuracy

| approach | AUROC | what_it_automates | human_hours_per_task |
| --- | --- | --- | --- |
| LightGBM + manual features | 62.44 | Nothing | 12.3 |
| AutoML + manual features | ~64-66 | Model selection only | 10.5 |
| AutoML + Featuretools | ~66-68 | Model selection + basic features | 4.2 |
| KumoRFM zero-shot | 76.71 | Everything | 0.001 |
| KumoRFM fine-tuned | 81.14 | Features + model + adaptation | 0.1 |

Highlighted: the 10+ AUROC point gap between AutoML approaches and KumoRFM is the difference between automating model selection and automating feature discovery. The harder problem yields the bigger improvement.

The ML pipeline has two bottlenecks

A standard enterprise ML pipeline has two labor-intensive stages:

  1. Feature engineering (joining tables, computing aggregations, encoding variables, building a flat feature table)
  2. Model selection and tuning (choosing an algorithm, tuning hyperparameters, building ensembles, evaluating results)

The Stanford RelBench study measured how data scientists spend their time: 80% on feature engineering (12.3 hours, 878 lines of code) and 20% on modeling. AutoML automates the 20%. Foundation models automate the 80%.

What AutoML actually does

To understand the gap, you need to be precise about what AutoML automates and what it leaves manual.

What AutoML automates

  • Algorithm selection. AutoML tries multiple model types (XGBoost, LightGBM, random forest, logistic regression, neural networks) and picks the best performer. A human would typically try 2-3 algorithms. AutoML tries 10-20.
  • Hyperparameter tuning. AutoML uses Bayesian optimization or grid search to find optimal hyperparameters (learning rate, tree depth, regularization). This saves a few hours of manual work.
  • Ensemble building. AutoML builds stacked ensembles that combine multiple models. This often yields a 1-3% accuracy improvement over any single model.
  • Basic preprocessing. Some AutoML tools handle missing values, one-hot encoding, and normalization automatically.
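The search that AutoML performs can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: an exhaustive sweep over a hypothetical grid of algorithms and hyperparameters, keeping the best validation score. Real platforms use Bayesian optimization rather than a grid, but the search target is the same — model configuration, never features.

```python
from itertools import product

# Hypothetical search space; real AutoML grids are larger but still finite.
SEARCH_SPACE = {
    "algorithm": ["xgboost", "lightgbm", "random_forest", "logistic_regression"],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 6, 9],
}

def automl_search(score_fn):
    """Try every configuration and keep the best one.

    score_fn stands in for train-and-validate on the pre-built
    feature table -- the table itself is taken as given.
    """
    keys = list(SEARCH_SPACE)
    best_config, best_score = None, float("-inf")
    for values in product(*(SEARCH_SPACE[k] for k in keys)):
        config = dict(zip(keys, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

The grid above has 4 × 3 × 3 = 36 configurations — a finite, enumerable space, which is exactly why this part of the pipeline was automated first.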

What AutoML does not automate

  • Table joins. AutoML cannot read a relational database with multiple tables. It needs a single flat table as input. Someone has to write the SQL to join customers, orders, products, and support tickets into one row per entity.
  • Feature computation. AutoML does not compute avg_order_value_last_90d or days_since_last_login. Those aggregations must already exist as columns in the input table.
  • Multi-hop pattern discovery. AutoML cannot discover that a customer's churn risk depends on the return rates of products they bought, because it never sees the products table.
  • Temporal sequence preservation. AutoML consumes a static feature table. The temporal dynamics (accelerating purchase frequency, declining engagement over weeks) are only present if someone pre-computed them as features.
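Everything in the list above still has to be written by hand before AutoML runs. As a concrete sketch, here is the kind of aggregation code (against a hypothetical orders table) that produces a single column like avg_order_value_last_90d:

```python
from datetime import date, timedelta

def avg_order_value_last_90d(orders, customer_id, as_of):
    """Manual feature computation AutoML cannot do for you.

    orders: list of dicts with customer_id, order_date, amount --
    an illustrative stand-in for a joined orders table.
    """
    cutoff = as_of - timedelta(days=90)
    amounts = [
        o["amount"]
        for o in orders
        if o["customer_id"] == customer_id
        and cutoff <= o["order_date"] <= as_of
    ]
    return sum(amounts) / len(amounts) if amounts else 0.0
```

Multiply this by hundreds of candidate features across joins, aggregations, and time windows and you arrive at the 878 lines of feature code the RelBench study measured.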

What foundation models actually do

A relational foundation model like KumoRFM solves the problem that AutoML skips. It reads raw relational tables directly, without any feature engineering.

How it works

KumoRFM represents your database as a temporal heterogeneous graph. Each row in each table becomes a node. Each foreign key relationship becomes an edge. Timestamps are preserved as temporal attributes on nodes and edges.
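The row-to-node, foreign-key-to-edge mapping can be sketched in plain Python. This is a toy illustration of the graph construction described above, with illustrative table names, not KumoRFM's internal representation:

```python
def build_graph(tables, foreign_keys):
    """Turn relational tables into a heterogeneous graph.

    tables: {table_name: {primary_key: row_dict}}
    foreign_keys: [(child_table, fk_column, parent_table)]
    Each row becomes a node; each FK reference becomes an edge.
    Timestamps stay in the row dicts as node attributes.
    """
    nodes = {
        (table, pk): row
        for table, rows in tables.items()
        for pk, row in rows.items()
    }
    edges = [
        ((child, pk), (parent, row[fk_col]))
        for child, fk_col, parent in foreign_keys
        for pk, row in tables[child].items()
        if row.get(fk_col) is not None
    ]
    return nodes, edges
```

A two-table example — one customer with two orders — yields three nodes and two edges, each edge pointing from an order row to the customer row it references.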

what_automl_receives (flat feature table)

| lead_id | emails_opened | pages_viewed | days_since_signup | company_size | title_rank |
| --- | --- | --- | --- | --- | --- |
| L-301 | 12 | 84 | 5 | 500 | 3 (VP) |
| L-302 | 4 | 22 | 30 | 200 | 1 (Engineer) |
| L-303 | 0 | 19 | 0 | 5000 | 5 (CTO) |

AutoML receives this pre-built flat table and searches for the best model to fit it. It tries XGBoost, LightGBM, neural nets, ensembles. It never sees the raw CRM tables underneath.

what_the_foundation_model_reads (raw relational tables)

| table | example_data_for_L-302 | signal_invisible_to_AutoML |
| --- | --- | --- |
| contacts | 4 contacts from 3 departments active | Multi-threaded account engagement |
| activities | Blog > Case study > API docs > Demo (in sequence) | Buying-stage content progression |
| opportunities | Similar account closed $210K last quarter | Account similarity to past wins |
| accounts | Company raised Series B 30 days ago | Firmographic momentum |

The foundation model reads all four tables directly. It discovers that L-302 has a multi-threaded buying committee, a textbook content progression, and account similarity to past closed-won deals. None of these signals exist in the flat table AutoML receives.

A graph transformer processes this structure by passing messages along edges (foreign key relationships), learning which cross-table patterns are predictive. Multi-hop patterns (customer → orders → products → returns) are captured naturally because information propagates through the graph layer by layer.
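The layer-by-layer propagation can be seen in a toy sketch. This is a simplified stand-in (plain neighbor averaging, not a trained graph transformer): a signal two hops away, such as a return rate on a products node, only reaches the customer node after two rounds of message passing.

```python
def propagate(values, edges, rounds):
    """One scalar per node; each round, every node averages its own
    value with its neighbors'. Illustrates multi-hop propagation only --
    a real graph transformer learns these aggregations.
    """
    neighbors = {n: [] for n in values}
    for a, b in edges:          # edges are treated as undirected here
        neighbors[a].append(b)
        neighbors[b].append(a)
    for _ in range(rounds):
        values = {
            n: (values[n] + sum(values[m] for m in neighbors[n]))
               / (1 + len(neighbors[n]))
            for n in values
        }
    return values
```

On a customer → order → product chain where only the product carries a signal, one round leaves the customer untouched; after two rounds the signal has propagated through.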

Because KumoRFM is pre-trained on thousands of diverse databases, it has already learned the universal patterns that recur across relational data: recency effects, frequency dynamics, temporal decay, graph topology signals. At inference time, it applies these learned patterns to your database without any task-specific training.

AutoML

  • Requires flat feature table as input
  • Automates model selection and tuning
  • Cannot discover cross-table patterns
  • Cannot handle temporal sequences
  • Solves 20% of the pipeline

Foundation model (KumoRFM)

  • Reads raw relational tables directly
  • Automates feature discovery and modeling
  • Discovers multi-hop cross-table patterns
  • Preserves temporal dynamics natively
  • Solves 100% of the pipeline

The accuracy gap

The difference between these approaches shows up directly in accuracy. On the RelBench benchmark (7 databases, 30 tasks, 103 million rows):

| Approach | AUROC (classification) | What it automates |
| --- | --- | --- |
| LightGBM + manual features | 62.44 | Nothing (fully manual) |
| AutoML + manual features | ~64-66 (estimated) | Model selection only |
| KumoRFM zero-shot | 76.71 | Features + model + training |
| KumoRFM fine-tuned | 81.14 | Features + model (fine-tuning adds task adaptation) |

AutoML can squeeze 2-4 AUROC points out of the same feature table that LightGBM uses, by trying more algorithms and better hyperparameters. But the gap between a well-tuned model on manual features (~64-66) and a foundation model on raw relational data (76.71) is over 10 points.

That 10-point gap is not about model architecture. It is about data. The foundation model sees the full relational structure. The AutoML model sees whatever features someone decided to build.

PQL Query

PREDICT conversion
FOR EACH leads.lead_id
WHERE leads.status = 'open'

One query to the foundation model replaces the entire AutoML pipeline: data extraction, feature engineering, model selection, hyperparameter tuning, and ensemble building. The model reads raw CRM tables directly.

Output

| lead_id | conversion_prob | approach_comparison | accuracy_delta |
| --- | --- | --- | --- |
| L-2201 | 0.84 (FM) | 0.71 (AutoML) | +13 points |
| L-2202 | 0.23 (FM) | 0.38 (AutoML) | FM correctly lower |
| L-2203 | 0.91 (FM) | 0.62 (AutoML) | +29 points |
| L-2204 | 0.11 (FM) | 0.14 (AutoML) | Both correctly low |

cost_at_scale (20 prediction tasks)

| cost_dimension | AutoML approach | foundation_model | savings |
| --- | --- | --- | --- |
| Feature engineering hours | 210 hours | 0 hours | 210 hours |
| Model selection hours | 0 hours (automated) | 0 hours | 0 hours |
| Pipeline maintenance (annual) | 520 hours | 20 hours | 500 hours |
| Data scientist headcount needed | 3-4 FTEs | 0.5 FTE | 2.5-3.5 FTEs |
| Time to new prediction task | 2-4 weeks | Minutes | 99%+ reduction |
| Total annual cost | $650K-$900K | $80K-$120K | $570K-$780K |

Highlighted: at 20 prediction tasks, the foundation model approach costs 85% less than AutoML + manual features. The savings come entirely from eliminating the feature engineering that AutoML leaves manual.

Why the difference matters at scale

For a single, well-defined prediction task with a dedicated data science team and months of time, AutoML provides modest value. The team builds the features, AutoML picks the model, and you save a few days of tuning.

But enterprises do not have one prediction task. They have dozens. Churn, upsell, cross-sell, fraud, credit risk, demand forecasting, personalization, campaign targeting. Each task needs its own feature engineering pipeline.

The cost arithmetic

With AutoML, each task still costs 12.3 hours of feature engineering. For 20 prediction tasks, that is 246 hours of senior data scientist time, roughly 6 person-weeks, on feature engineering alone. AutoML automates only the modeling portion (roughly 20% of the pipeline), bringing the total from roughly 310 hours to perhaps 260.
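The arithmetic is simple enough to spell out. Hours per task come from the RelBench time breakdown quoted earlier; the 20-task workload is this article's running example.

```python
TASKS = 20
FE_HOURS_PER_TASK = 12.3      # manual feature engineering (RelBench)
TOTAL_HOURS_PER_TASK = 15.3   # full pipeline, from the time breakdown table

fe_total = TASKS * FE_HOURS_PER_TASK            # 246 hours AutoML cannot remove
pipeline_total = TASKS * TOTAL_HOURS_PER_TASK   # ~306 hours fully manual
automl_savings = pipeline_total - fe_total      # ~60 hours: the modeling slice
```

AutoML's ceiling is the roughly 60-hour modeling slice; the 246-hour feature engineering floor stays untouched.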

With a foundation model, each task costs seconds. For 20 prediction tasks, you spend less than a minute on predictions and the rest of your time on problem framing, evaluation, and deployment. The total time drops from 260 hours to maybe 20 hours of human work.

Where AutoML still has a role

AutoML is not useless. There are specific situations where it delivers real value:

  • Single-table problems. If your data is already in a flat table (no multi-table joins needed), AutoML skips the feature engineering bottleneck because there is no feature engineering to do. Kaggle-style classification on a single CSV is AutoML's sweet spot.
  • Mature feature stores. If your organization has already invested in a comprehensive feature store with hundreds of curated features, AutoML can efficiently select and tune models on those features. You have already paid the feature engineering cost.
  • Rapid prototyping on flat data. For quick experiments where the data is already flat and the goal is directional (not production accuracy), AutoML gives you an answer in minutes.

The fundamental difference

AutoML and foundation models solve different problems. AutoML asks: "Given this feature table, what is the best model?" Foundation models ask: "Given this database, what are the best predictions?"

The first question assumes that someone has already converted the raw relational data into features. The second question starts from raw data. The first question is a search over model configurations. The second is a search over the full relational pattern space.

If your bottleneck is model selection, AutoML is the right tool. But for most enterprises, the bottleneck has never been model selection. It is the 12.3 hours of feature engineering that come before the model ever sees the data.

Foundation models do not make AutoML better. They make it unnecessary. When the model reads raw relational data directly, there is no feature table to optimize over and no model selection to automate. The entire pipeline collapses into a single step: ask a question, get a prediction.

Frequently asked questions

What does AutoML automate?

AutoML automates model selection, hyperparameter tuning, and sometimes feature selection from a pre-built feature table. Tools like DataRobot, H2O, and Google AutoML take a flat table as input, try many model architectures (XGBoost, LightGBM, neural nets, ensembles), tune their parameters, and return the best-performing model. They automate the last 20% of the ML pipeline.

What does a foundation model automate?

A foundation model for relational data automates feature discovery, the first 80% of the ML pipeline that AutoML leaves manual. Models like KumoRFM read raw relational tables directly, discover predictive patterns across multiple tables, time windows, and relationship hops, and generate predictions without any feature engineering or model training.

Can I use AutoML and foundation models together?

In principle yes, but in practice a foundation model makes AutoML redundant for most use cases. AutoML's value is automating model selection, but a foundation model already includes the model. If you use KumoRFM, you skip both feature engineering and model selection. The main exception is if you want to use foundation model outputs (embeddings or predictions) as features in an ensemble with traditional models.

Does AutoML solve the feature engineering bottleneck?

No. AutoML tools require a pre-engineered flat feature table as input. They cannot read raw relational databases, discover multi-table patterns, or engineer features from joins and aggregations. The 12.3 hours and 878 lines of code that data scientists spend on feature engineering per task remain entirely manual when using AutoML.

Why is feature discovery harder than model selection?

Model selection is a finite search over a known set of architectures and hyperparameters. Feature discovery is a combinatorial search over all possible joins, columns, aggregations, time windows, and interactions in a relational database. For a database with 5 tables and 50 columns, there are 1,200+ possible first-order features and 700,000+ pairwise interactions. Model selection has perhaps 50-100 configurations to try. Feature discovery has an effectively infinite space.
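One way to arrive at figures of this order (the aggregation and window counts below are illustrative assumptions, not a fixed taxonomy):

```python
from math import comb

COLUMNS = 50       # columns across the 5-table database
AGGREGATIONS = 4   # e.g. sum, mean, max, count (assumed)
TIME_WINDOWS = 6   # e.g. 7d, 14d, 30d, 60d, 90d, all-time (assumed)

# Every (column, aggregation, window) triple is a candidate feature.
first_order = COLUMNS * AGGREGATIONS * TIME_WINDOWS   # 1,200 candidates

# Any two first-order features can interact.
pairwise = comb(first_order, 2)                       # 719,400 pairs
```

Under these assumptions the pairwise count alone passes 700,000 before considering multi-hop joins or higher-order interactions, while a model-selection grid tops out around 50-100 configurations.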

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.