Berlin Tech Meetup: The Future of Relational Foundation Models, Systems, and Real-World Applications


Why Feature Engineering Is Obsolete: The Case for Eliminating It Entirely

Feature engineering consumes 80% of data science time because enterprise data is relational. Even perfect feature engineering only explores 4-17% of the possible feature space. The solution is not faster feature engineering. It is skipping it altogether.

TL;DR

  1. Feature engineering is time-consuming because enterprise data lives in 5-50 relational tables, and traditional ML requires flattening it all into one table. The Stanford RelBench study measured the cost: 12.3 hours and 878 lines of code per prediction task.
  2. Even with expert data scientists, manual feature engineering only explores 4-17% of the possible feature space. You are guessing which patterns matter while leaving 83-96% of potential signals undiscovered.
  3. Three approaches exist: (1) manual feature engineering with XGBoost/LightGBM, (2) automated feature engineering with Featuretools or DataRobot, (3) elimination with KumoRFM, which learns directly from raw relational tables.
  4. The critical distinction: AutoML platforms like DataRobot and H2O automate feature engineering. They still flatten your data into a single table. KumoRFM eliminates feature engineering. The model reads the relational structure directly. This is not automation. It is a structurally different approach that skips feature engineering entirely.
  5. On SAP SALT: KumoRFM 91% vs PhD data scientists 75% vs LLM+AutoML 63%. On RelBench: KumoRFM zero-shot 76.71 AUROC vs manual features 62.44. The gap is the value of reading relational data natively.

The real reason feature engineering takes so long

If you have spent time in enterprise ML, you already know the statistic: feature engineering consumes roughly 80% of data science project time. But the usual explanation ("it is tedious") misses the structural reason it is so expensive.

The problem is relational data. A typical enterprise does not store customer behavior in one table. It stores it across 5-50 interconnected tables: customers, orders, products, interactions, support tickets, payments, subscriptions, events. Each table connects to others through foreign keys. The relationships between tables contain the most predictive signals.

But traditional ML models (XGBoost, LightGBM, random forests, neural networks) cannot read relational databases. They require a single flat table with one row per entity. So before you can train any model, you must collapse your entire relational database into that flat structure.

This is where the time goes. Not in model training. Not in hyperparameter tuning. In the flattening.

What flattening actually requires

Here is what a data scientist does for every prediction task on relational data:

  1. Write SQL joins across 5-15 tables with correct temporal constraints (no data leakage). For a churn prediction task, this means joining customers to orders to products to support tickets to payments, all filtered to the correct time windows. Easily 100-300 lines of SQL.
  2. Compute cross-table aggregations like avg_order_value_last_90d, support_tickets_last_30d, product_return_rate_by_category. Each one is a hypothesis about what might matter. Each one requires careful implementation.
  3. Engineer temporal features across table boundaries: purchase frequency trends, support escalation patterns, engagement velocity changes. These require window functions spanning multiple joined tables.
  4. Iterate 3-4 times when the first model underperforms. Go back, hypothesize new features, implement them, retrain. Each cycle takes hours.
  5. Maintain the pipeline in production. When schemas change, when new data sources appear, when business logic shifts, the feature pipeline breaks and must be updated.
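Steps 1 and 2 can be sketched with an in-memory SQLite database. The `customers` and `orders` tables and their values below are hypothetical stand-ins for a much larger schema; note how much of the query exists only to enforce point-in-time correctness.

```python
import sqlite3

# Hypothetical two-table slice of a larger schema (illustrative values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER, customer_id TEXT,
                     order_date TEXT, order_value REAL);
INSERT INTO customers VALUES ('C-1'), ('C-2');
INSERT INTO orders VALUES
  (1, 'C-1', '2024-01-15', 40.0),
  (2, 'C-1', '2024-03-01', 60.0),
  (3, 'C-1', '2024-05-01', 90.0),  -- after the cutoff: must be excluded
  (4, 'C-2', '2024-02-20', 20.0);
""")

# One engineered feature, with the temporal constraints that prevent
# data leakage: only orders strictly before the prediction cutoff,
# and only within the trailing 90-day window.
CUTOFF = "2024-04-01"
rows = conn.execute("""
    SELECT c.customer_id,
           AVG(o.order_value) AS avg_order_value_last_90d
    FROM customers c
    LEFT JOIN orders o
      ON o.customer_id = c.customer_id
     AND o.order_date < ?
     AND o.order_date >= date(?, '-90 days')
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""", (CUTOFF, CUTOFF)).fetchall()
print(rows)  # [('C-1', 50.0), ('C-2', 20.0)]
```

This is one feature over two tables. Multiply it by dozens of candidate features across 5-15 tables, and the 12.3-hour, 878-line figure stops being surprising.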

The deeper problem: you are exploring 4-17% of the feature space

Time is not the only cost. The bigger issue is coverage.

When a data scientist builds features, they start with hypotheses: "recency of last purchase probably matters," "support ticket count probably correlates with churn," "high-value customers probably behave differently." These are educated guesses. Good ones. But guesses.

The number of possible features from a relational database grows combinatorially. Consider just the aggregation options: for each pair of tables, you can compute count, sum, average, min, max, standard deviation, and trend across dozens of columns, over multiple time windows (7 days, 30 days, 90 days, 365 days), with various filters and groupings. Add multi-hop relationships (customer → orders → products → other customers who bought the same products → their churn rates), and the space becomes enormous.

A data scientist working 12.3 hours per task explores a tiny fraction of this space. Research on automated feature generation suggests that manual approaches typically cover only 4-17% of the feasible feature space. That means 83-96% of potentially predictive patterns are never tested.
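A back-of-envelope count makes the scale concrete. The numbers below are deliberately conservative and entirely hypothetical, and they cover only single-hop aggregations:

```python
# Hypothetical, conservative counts for one enterprise schema:
table_pairs = 10   # joinable table pairs
columns     = 20   # numeric columns per pair worth aggregating
aggs        = 7    # count, sum, avg, min, max, stddev, trend
windows     = 4    # 7d, 30d, 90d, 365d
filters     = 5    # segment / category / status variants

single_hop = table_pairs * columns * aggs * windows * filters
print(f"{single_hop:,} single-hop candidates")  # 28,000

# A data scientist who hand-builds and tests ~50 features covers:
coverage = 50 / single_hop
print(f"{coverage:.2%} of even this restricted space")  # 0.18%
```

Multi-hop relationships multiply this count again at every hop, which is why even diligent manual work tops out in the 4-17% range.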

Three approaches to the feature engineering problem

The industry has developed three distinct approaches, and the differences between them matter more than most comparisons acknowledge.

1. Manual feature engineering (XGBoost + hand-crafted features)

This is the traditional approach. A data scientist writes SQL, computes aggregations, builds a flat table, and trains a model (typically XGBoost or LightGBM). It works. It has worked for years. But it costs 12.3 hours and 878 lines of code per task, explores only a fraction of the feature space, and creates brittle pipelines that require ongoing maintenance.

  • Best for: Teams with strong data science talent who need full control over every feature, or regulatory environments that require every feature to be explicitly defined and auditable.
  • Watch out for: Only explores 4-17% of the possible feature space. Costs 12.3 hours per task. Creates brittle pipelines that break when schemas change. Does not scale beyond a handful of prediction tasks without a large team.

2. Automated feature engineering (Featuretools, DataRobot, H2O)

Tools like Featuretools use deep feature synthesis to automatically generate features from relational data. DataRobot and H2O Driverless AI automate single-table feature generation as part of their AutoML pipelines. These tools genuinely reduce the manual effort. Featuretools can generate hundreds of features from multiple tables in minutes instead of hours.

But here is the critical point: they still produce a flat table. They automate the flattening process. The output is still one row per entity with columns representing aggregated features. The model still trains on a single table. The relational structure is still lost.

  • Best for: Teams that want to speed up existing workflows without changing their approach, or organizations already invested in an AutoML platform that need broader feature coverage than manual engineering provides.
  • Watch out for: Still produces a flat table as output - the relational structure is lost. Limited to predefined aggregation primitives. Cannot discover multi-hop relational patterns. Platform licensing adds $150K-$250K per year.

3. Eliminate feature engineering (KumoRFM)

KumoRFM is a relational foundation model. It does not generate features. It does not flatten tables. It reads raw relational tables connected by foreign keys and learns predictive patterns directly from the relational structure. The model ingests the tables as they exist in your data warehouse, preserves every relationship, and discovers patterns that span multiple tables and multiple hops.

This is not a faster version of feature engineering. It is a different approach entirely. No flat table is ever created. No features are ever enumerated. The model learns what matters from the raw data.

  • Best for: Organizations with relational data (5-50 tables) where feature engineering is the bottleneck, teams that need to scale from 1 to 20+ prediction tasks without scaling headcount, and any situation where speed to production is a competitive advantage.
  • Watch out for: Newer paradigm with less industry history than XGBoost-based workflows. If your data is genuinely single-table and already flat, the relational advantage is smaller.

Three approaches to feature engineering

| Dimension | Manual (XGBoost) | Automated (Featuretools/DataRobot) | Eliminated (KumoRFM) |
|---|---|---|---|
| Feature engineering effort | 12.3 hours + 878 lines of code per task | Minutes of configuration, automated generation | Zero. No features are created. |
| Data input | Hand-built flat table (SQL joins) | Relational tables (Featuretools) or flat table (DataRobot/H2O) | Raw relational tables connected by foreign keys |
| Feature space explored | 4-17% (manual, hypothesis-driven) | Broader than manual, but limited to predefined primitives | Full relational structure. No enumeration needed. |
| Multi-hop patterns | Rarely. Too expensive to implement manually. | Limited. Depth restricted by computational cost. | Native. Model traverses full relational graph. |
| Output format | Flat table with one row per entity | Flat table with one row per entity | Predictions directly. No intermediate table. |
| Pipeline maintenance | High. Feature code breaks when schemas change. | Medium. Automated pipelines still need updates. | None. Model reads raw tables as they are. |
| Time to first prediction | Weeks (feature engineering + model training) | Days (setup + automated generation + training) | ~1 second (zero-shot) to minutes (fine-tuned) |
| RelBench AUROC | 62.44 | ~64-66 (AutoML + manual features) | 76.71 zero-shot, 81.14 fine-tuned |

Highlighted: the accuracy gap between automated and eliminated approaches is 10+ AUROC points. This gap comes from relational patterns that flat-table approaches cannot represent, regardless of how features are generated.

Why automation is not enough

The distinction between automating and eliminating feature engineering is the most important point in this article, so let me be direct about it.

Featuretools, DataRobot, and H2O Driverless AI are real improvements over manual feature engineering. They reduce the time from hours to minutes. They generate more features than a human would think to test. They are legitimate tools that solve a real problem.

But they still flatten. And flattening is lossy. When you collapse a customer's order history into avg_order_value = $47.30 and order_count = 12, you lose the sequence. You lose the fact that order values have been declining for three months. You lose the fact that the last two orders were returns. You lose the fact that this customer's purchase pattern matches other customers who churned.

Automated tools generate more aggregations, but they are still aggregations. They describe the relational structure using summary statistics instead of preserving it.
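A toy illustration of the lossiness, using hypothetical order values: two customers whose six most recent orders produce identical flat-table features but opposite trajectories.

```python
# Hypothetical order-value histories, oldest to newest.
declining = [80, 70, 55, 40, 25, 14]   # drifting toward churn
stable    = [47, 48, 46, 48, 47, 48]   # healthy and consistent

# Both collapse to the same flat-table row:
for orders in (declining, stable):
    print(f"order_count={len(orders)}, "
          f"avg_order_value=${sum(orders) / len(orders):.2f}")
# order_count=6, avg_order_value=$47.33  (printed twice)

# A trend feature separates them, but only if someone thinks to build it:
for orders in (declining, stable):
    print(f"trend={orders[-1] - orders[0]:+d}")
# trend=-66, then trend=+1
```

Every signal that is not explicitly engineered as its own column is invisible to the downstream model, no matter how good that model is.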

What flattening loses (churn prediction example)

| Signal | Available in flat table | Available in relational model |
|---|---|---|
| Average order value | Yes (single number: $47.30) | Yes, plus the full trajectory over time |
| Order value trending down | Only if someone engineers a trend feature | Yes, learned automatically from the sequence |
| Support tickets increasing while purchases decrease | Only if cross-table trend is manually computed | Yes, cross-table temporal pattern detected natively |
| Similar customers churned after same pattern | No. Requires cross-entity joins rarely attempted. | Yes. Multi-hop pattern: customer → products → other customers → outcomes |
| Product category engagement shifting | Only if category-level aggregations are built | Yes. Full product interaction history preserved. |
| Account-level multi-user behavior | Aggregated to single row. Individual patterns lost. | Each user's behavior preserved with account relationships. |

Automated feature engineering tools would generate the first two or three signals. The bottom three require multi-hop relational reasoning that flat-table approaches do not attempt.

The benchmark evidence

Two independent benchmarks quantify the difference between these approaches on real enterprise data.

SAP SALT enterprise benchmark

The SAP SALT benchmark tests prediction accuracy on production-quality enterprise databases with multiple related tables. Real business analysts and data scientists attempt the same prediction tasks.

| Approach | Accuracy | Feature engineering required |
|---|---|---|
| LLM + AutoML | 63% | Automated (LLM generates features, AutoML selects model) |
| PhD Data Scientist + XGBoost | 75% | Weeks of manual feature engineering by experts |
| KumoRFM (zero-shot) | 91% | None. Zero feature engineering. Zero training. |

Highlighted: KumoRFM outperforms expert data scientists by 16 percentage points with zero feature engineering and zero training time. The LLM+AutoML approach, which represents automated feature engineering, scores lowest.

The 63% score for LLM + AutoML is particularly telling. This is the automated approach: a language model generates feature engineering code, an AutoML system selects and tunes the model. It should be faster and more consistent than manual work. But it scores 12 points lower than a PhD data scientist doing it by hand, because automation without understanding produces worse features, not better ones.

KumoRFM sidesteps the problem entirely. It does not try to generate better features. It reads the relational data directly. The 91% score represents what happens when you stop summarizing relational structure and start learning from it.

Stanford RelBench benchmark

RelBench provides a standardized evaluation across 7 databases, 30 prediction tasks, and 103 million rows. It was designed specifically to test ML approaches on relational data.

| Approach | AUROC | Feature engineering time | Lines of code |
|---|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task | 878 |
| AutoML + manual features | ~64-66 | Reduced time per task | 878 |
| KumoRFM zero-shot | 76.71 | ~1 second | 0 |
| KumoRFM fine-tuned | 81.14 | Minutes | 0 |

Highlighted: KumoRFM zero-shot outperforms manual + AutoML approaches by 10+ AUROC points. Fine-tuned KumoRFM reaches 81.14. Zero lines of feature engineering code in both cases.

The jump from 62.44 to ~64-66 is what AutoML buys you: better model selection on the same features. The jump from ~64-66 to 76.71 is what elimination buys you: patterns that exist in the relational structure but never made it into any flat table. That second gap is 5x larger than the first.
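Taking the midpoint of the ~64-66 AutoML range (an assumption; the exact ratio depends on which point in the range you pick), the relative size of the two jumps can be checked directly:

```python
manual    = 62.44   # LightGBM + manual features (RelBench AUROC)
automl    = 65.0    # assumed midpoint of the ~64-66 AutoML range
zero_shot = 76.71   # KumoRFM zero-shot

automation_gain  = automl - manual      # better models, same features
elimination_gain = zero_shot - automl   # patterns no flat table holds
print(round(automation_gain, 2), round(elimination_gain, 2))   # 2.56 11.71
print(f"elimination gain is {elimination_gain / automation_gain:.1f}x larger")
```

With this midpoint the second jump is ~4.6x the first, in line with the rough 5x figure above.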

What this looks like in practice

Traditional workflow (manual or automated)

  • Identify prediction task (e.g., 90-day churn for enterprise accounts)
  • Data scientist writes SQL joins across 5-15 tables (2-4 hours)
  • Compute cross-table aggregations and temporal features (4-6 hours)
  • Build flat feature table with one row per customer
  • Train model (XGBoost/LightGBM or AutoML platform)
  • Evaluate. Underperforming? Go back to the SQL joins, hypothesize new features, and retrain. Repeat 3-4 times.
  • Deploy model + maintain feature pipeline ongoing
  • Total: 2-6 weeks to first production prediction

KumoRFM workflow

  • Connect Kumo to your data warehouse (one-time, 30 minutes)
  • Write a PQL query: PREDICT churn_90d FOR EACH customer_id
  • KumoRFM reads raw tables, discovers patterns, returns predictions
  • No SQL joins. No aggregations. No flat table. No feature iteration.
  • Time to first prediction: ~1 second (zero-shot)
  • Fine-tune for task-specific accuracy: minutes, not weeks
  • No feature pipeline to maintain. Ever.
  • Total: minutes to first production prediction

PQL Query

```
PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.segment = 'enterprise'
  AND customers.contract_value > 50000
```

This single PQL query replaces the entire feature engineering pipeline. No SQL joins across tables. No aggregation logic. No feature iteration cycles. KumoRFM reads the raw customers, orders, products, support_tickets, and payments tables directly and discovers the predictive patterns itself.

Output

| customer_id | churn_probability | top_signal |
|---|---|---|
| C-4401 | 0.87 | Declining order frequency + rising support escalations |
| C-4402 | 0.12 | Stable multi-department usage, recent contract expansion |
| C-4403 | 0.93 | Similar accounts churned after same engagement drop pattern |
| C-4404 | 0.08 | Increasing product adoption, 3 new integrations this month |

The cost of continuing to do feature engineering

The time cost is obvious. But the compounding costs are what make feature engineering truly expensive at scale.

Annual cost of feature engineering (20 prediction tasks)

| Cost dimension | Manual approach | Automated approach | Eliminated (KumoRFM) |
|---|---|---|---|
| Feature engineering labor | 246 hours ($61,500) | ~80 hours ($20,000) | 0 hours ($0) |
| Data science team for pipelines | 3-4 FTEs ($450K-$600K) | 2-3 FTEs ($300K-$450K) | 0.5 FTE ($75K) |
| Pipeline maintenance (annual) | 520 hours ($130K) | 260 hours ($65K) | 20 hours ($5K) |
| Platform/tool licensing | $0 (open-source models) | $150K-$250K (DataRobot/H2O) | $80K-$120K (Kumo) |
| Time to new prediction task | 2-6 weeks | 3-7 days | Minutes |
| Total annual cost | $650K-$800K | $535K-$785K | $80K-$120K |

Highlighted: automation reduces cost by 15-25%. Elimination reduces cost by 85%. The difference is that automation still requires data science teams for pipeline maintenance and feature iteration.

Notice that the automated approach is not dramatically cheaper than the manual approach. The tools cost $150K-$250K per year, and you still need 2-3 data scientists for the multi-table work that automation cannot handle. The savings are real but incremental.

Elimination is a step change. When there is no feature pipeline to build, maintain, or debug, the cost structure collapses. One ML engineer can operate 20 prediction tasks because the work is writing PQL queries, not maintaining SQL pipelines.

When each approach makes sense

To be direct about this: not every organization should switch to KumoRFM tomorrow.

  • Manual feature engineering makes sense when your data is already in a single table, when you have a strong data science team that values full control, or when regulatory requirements demand that every feature be explicitly defined and auditable.
  • Automated feature engineering (Featuretools, DataRobot) makes sense when you want to speed up existing workflows without changing your approach, when your team is already invested in an AutoML platform, or when you need the breadth of features that tools like Featuretools generate from relational data.
  • Elimination (KumoRFM) makes sense when your data is relational (5-50 tables), when feature engineering is your bottleneck, when you need maximum accuracy on relational data, when you want to scale from 1 to 20+ prediction tasks without scaling your data science team, or when speed to production is a competitive advantage.

Frequently asked questions

Why is feature engineering so time-consuming?

Feature engineering is time-consuming because enterprise data is relational. A typical business has 5-50 interconnected tables (customers, orders, products, interactions, payments). To train a traditional ML model, you must flatten all of that into a single table with one row per entity. The Stanford RelBench study measured this: 12.3 hours and 878 lines of code per prediction task, on average. You are writing complex SQL joins, computing temporal aggregations across tables, handling point-in-time correctness to avoid data leakage, and then iterating 3-4 times when the first set of features underperforms. This is why feature engineering consumes roughly 80% of total data science project time.

Is there a tool that automates feature engineering?

Yes, several tools automate feature engineering. Featuretools generates features from relational data using deep feature synthesis. DataRobot and H2O Driverless AI automate single-table feature generation as part of their AutoML pipelines. These tools speed up the process, but they still produce a flat feature table as output. They automate the flattening process rather than eliminating it. KumoRFM takes a different approach: it eliminates feature engineering entirely by learning directly from raw relational tables, discovering patterns across the full relational structure without ever creating a flat table.

What is the difference between automating and eliminating feature engineering?

Automating feature engineering means using software to generate the same type of flat feature table that a data scientist would build manually. The output is still a single table with one row per entity. Tools like Featuretools and DataRobot do this. Eliminating feature engineering means the model reads raw relational tables directly and learns predictive patterns from the relational structure itself. No flat table is ever created. KumoRFM does this. The distinction matters because even automated feature engineering only explores a fraction of the possible feature space. Elimination lets the model discover patterns that no flat table can represent.

What ML platform should we use if we do not want to do feature engineering?

If you want to skip feature engineering entirely, the only production platform that eliminates it is Kumo.ai. KumoRFM is a relational foundation model that reads raw relational tables connected by foreign keys and discovers predictive patterns across the full relational structure. You write a PQL (Predictive Query Language) query describing what you want to predict, and the model handles everything else. On the RelBench benchmark, KumoRFM zero-shot achieves 76.71 AUROC with zero feature engineering, compared to 62.44 for manual feature engineering with LightGBM.

How much of the feature space does manual feature engineering actually explore?

Manual feature engineering typically explores only 4-17% of the possible feature space. A data scientist hypothesizes which features matter based on domain knowledge and intuition, then builds and tests those specific features. But the number of possible multi-table aggregations, temporal windows, cross-entity interactions, and multi-hop relationships grows combinatorially. A human working 12.3 hours per task cannot explore more than a small fraction. KumoRFM does not enumerate features at all. It learns directly from the relational structure, effectively exploring the full space of possible patterns.

What benchmarks show that eliminating feature engineering is better than automating it?

Two key benchmarks. First, the SAP SALT enterprise benchmark: KumoRFM zero-shot scores 91% accuracy, compared to 75% for PhD data scientists with hand-tuned XGBoost and 63% for LLM + AutoML approaches. Second, the Stanford RelBench benchmark across 7 databases and 30 prediction tasks: KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM with manual features. KumoRFM fine-tuned reaches 81.14. In both cases, the gap comes from patterns in the relational structure that flat-table approaches never capture, regardless of how the features are generated.

Can Featuretools or DataRobot handle multi-table relational data?

Featuretools can ingest multiple related tables and automatically generate cross-table features using deep feature synthesis. This is a real improvement over fully manual feature engineering. However, it still produces a flat output table, and the feature generation is limited to predefined aggregation primitives. It does not learn which patterns are predictive for a specific task. DataRobot and H2O Driverless AI handle single-table feature engineering well but require a pre-joined flat table as input. Neither discovers multi-hop relational patterns or preserves the full relational structure the way a relational foundation model does.

Who built KumoRFM?

KumoRFM was built by the team behind the ML systems at Pinterest, Airbnb, and LinkedIn. The founders are Vanja Josifovski (CEO, former CTO at Airbnb and Pinterest), Jure Leskovec (Chief Scientist, Stanford professor, co-creator of GraphSAGE and one of the most cited computer scientists in the world), and Hema Raghavan (Head of Engineering, former Sr. Director at LinkedIn). The company is backed by Sequoia Capital. The research behind KumoRFM builds on a decade of work in graph neural networks and relational learning at Stanford.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.