
What is a Data Science Agent? The 2026 Landscape

Data science agents automate parts or all of the data science workflow. The market has split into three categories: code-generating agents, no-code prediction platforms, and foundation model agents. Only one category understands relational data structure natively. Here's how to evaluate them.

TL;DR

  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs. 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML, with zero feature engineering and zero training time.
  • Data science agents are AI systems that automate the data science workflow. The market ($7.1B in 2025, projected $54.8B by 2032) breaks into three categories: code-generating agents, no-code prediction platforms, and foundation model agents.
  • Code-generating agents (Databricks Genie, Sphinx, Google DS-STAR) write notebook code faster but still operate on flat tables. They automate the workflow, not the data science reasoning about multi-table relationships.
  • No-code platforms (Julius AI, Akkio, Obviously AI) let business users build simple models without code. They work for single-table analytics but cannot handle relational databases or complex feature engineering.
  • Foundation model agents (Kumo) understand relational data structure natively. KumoRFM scores 76.71 AUROC zero-shot on RelBench vs. 62-66 for flat-table approaches, because it reads multi-table data directly instead of requiring manual joins.

The term "data science agent" entered mainstream vocabulary in 2025 when Databricks, Google, and a wave of startups all shipped AI systems that could write data science code, build models, or deliver predictions with minimal human intervention. The AI agents market hit $7.1 billion in 2025 and is projected to reach $54.8 billion by 2032 at a 33.9% CAGR. Forty percent of Global 2000 companies are expected to use AI agents by 2026.

But "data science agent" means very different things depending on who is selling it. Some agents write Python code in a notebook. Some build drag-and-drop models from a CSV. Some read your relational database and deliver predictions directly. These are not incremental differences. They represent fundamentally different philosophies about what should be automated.

The headline result: SAP SALT benchmark

The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality enterprise databases with multiple related tables.

| Approach | Accuracy | What it means |
| --- | --- | --- |
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |

SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.

KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.

Three types of data science agents

The data science agent landscape breaks into three distinct categories, each automating a different part of the workflow:

  1. Code-generating agents that write Python, SQL, and notebook code for you (Databricks Genie, Sphinx, Google DS-STAR). These target data scientists and automate the typing, not the thinking.
  2. No-code prediction platforms that let business users build models without writing code (Julius AI, Akkio, Obviously AI). These target analysts and automate simple, single-table predictions.
  3. Foundation model agents that understand relational data structure and deliver predictions directly (Kumo). These target the prediction itself and automate the entire pipeline from raw tables to output.

| Agent | Approach | Data types | Autonomy | Multi-table | Production-ready |
| --- | --- | --- | --- | --- | --- |
| Kumo AI | Foundation model + agent | Relational (multi-table) | Full pipeline | Yes (native) | Yes |
| Databricks Genie | Code generation | Flat tables/notebooks | Assisted | Manual joins | Yes |
| Sphinx | Jupyter copilot | Flat tables/notebooks | Assisted | Manual joins | Early |
| Google DS-STAR | Multi-agent framework | Flat tables | Autonomous | Manual joins | Research |
| Julius AI | Chat-to-analysis | CSV/connections | Assisted | No | Yes (analytics only) |
| Akkio | No-code drag-and-drop | CSV upload | Assisted | No | Yes (simple models) |

Highlighted: Kumo is the only agent that natively understands multi-table relational data. All other agents operate on flat tables, requiring manual joins to combine information across tables.

Code-generating agents

Code-generating agents are the most visible category in 2026. They sit inside notebooks and IDEs, watch what you are doing, and write Python or SQL code to help. The pitch: a senior data scientist's productivity, available to everyone.

Databricks Genie Code

Databricks Genie Code is integrated into Databricks notebooks. It claims a 2x success rate over leading coding agents on data science tasks and a 60-80% reduction in processing time. It generates Python and SQL, executes it in the notebook, and iterates based on results. For teams already on Databricks, it is the most frictionless entry point.

Sphinx

Sphinx raised $9.5 million in seed funding from Lightspeed and Bessemer Venture Partners. It is Jupyter-native, meaning it works inside the existing notebook workflow that most data scientists already use. It generates code cells, explains its reasoning, and can iterate on errors.

Google DS-STAR

Google's DS-STAR is a multi-agent framework where specialized agents plan, code, and verify data science tasks. It represents the most autonomous approach in the code-generating category, with agents that can decompose complex tasks into subtasks and verify their own outputs. It remains a research project for now.

The limitation of code-generating agents

Code-generating agents make data scientists faster at writing code. But the bottleneck in enterprise data science is not typing speed. It is knowing which tables to join, which features to compute, and which temporal patterns matter. A code-generating agent can write a LEFT JOIN in seconds, but it cannot tell you whether that join captures the right signal for your prediction task.

These agents still operate on flat tables. They generate code that processes one table at a time, and they rely on the human to specify the multi-table logic. The feature engineering bottleneck (12.3 hours, 878 lines of code per task on RelBench) becomes faster to type but no less intellectually demanding.
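In miniature, the flat-table pipeline these agents emit looks like the following. This is a minimal pandas sketch with hypothetical table and column names, not output from any specific agent; the point is that a human still has to decide that a 90-day window and these particular aggregates carry signal before the agent can type them:

```python
import pandas as pd

# Hypothetical tables standing in for an enterprise schema.
customers = pd.DataFrame({"customer_id": [1, 2]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [100.0, 50.0, 200.0],
    "order_date": pd.to_datetime(["2025-10-01", "2025-12-15", "2025-06-01"]),
})

cutoff = pd.Timestamp("2026-01-01")

# The agent writes the join and the aggregation quickly, but choosing
# the window and the features is still the human's job.
recent = orders[orders["order_date"] >= cutoff - pd.Timedelta(days=90)]
features = customers.merge(
    recent.groupby("customer_id")
          .agg(orders_90d=("amount", "size"), spend_90d=("amount", "sum"))
          .reset_index(),
    on="customer_id", how="left",
).fillna({"orders_90d": 0, "spend_90d": 0.0})
```

Multiply this by dozens of candidate windows, aggregates, and tables, and the 878-line figure stops looking surprising.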

No-code prediction platforms

No-code platforms take the opposite approach: instead of making data scientists faster, they try to eliminate the need for data scientists entirely by giving business users point-and-click model building.

Julius AI

Julius AI offers a chat interface where users describe what they want to predict, upload a CSV or connect to a data source, and get a model back. Pricing ranges from free to $70/month, and it holds SOC 2 Type II certification. It works well for exploratory analytics and simple predictions on a single dataset.

Akkio

Akkio starts at $49/month and provides a no-code drag-and-drop interface for building predictive models. Users upload a CSV, select a target column, and Akkio trains and deploys a model. It is designed for marketing teams, small businesses, and agencies that need quick predictions without a data science team.

The limitation of no-code platforms

No-code platforms work on a single flat table. They cannot read a relational database with customers, orders, products, and support tickets linked by foreign keys. They cannot discover that a customer's churn risk depends on the return rate of products they purchased, because they never see the products table.

For enterprise use cases where the predictive signal lives in the relationships between tables, not within a single table, no-code platforms miss the most important patterns. They are useful for quick analytics on flat exports, but they cannot replace a production ML pipeline.
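A concrete sketch of the missed signal, using hypothetical customers, orders, and products tables in pandas: the churn-relevant feature sits two foreign-key hops away from the customer row, a traversal a single-CSV tool never performs.

```python
import pandas as pd

# Hypothetical three-table schema; a no-code tool sees only one CSV at a time.
customers = pd.DataFrame({"customer_id": [1, 2]})
orders = pd.DataFrame({"order_id": [10, 11, 12],
                       "customer_id": [1, 1, 2],
                       "product_id": ["A", "B", "B"]})
products = pd.DataFrame({"product_id": ["A", "B"],
                         "return_rate": [0.40, 0.05]})

# The signal lives two foreign-key hops from the customer:
# customers -> orders -> products.
signal = (orders.merge(products, on="product_id")
                .groupby("customer_id")["return_rate"].mean()
                .rename("avg_return_rate_of_purchases")
                .reset_index())
flat = customers.merge(signal, on="customer_id", how="left")
```

Customer 1's average purchased-product return rate (0.225) versus customer 2's (0.05) is exactly the kind of cross-table feature that never exists in a flat export of the customers table alone.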

Foundation model agents

Foundation model agents represent a fundamentally different approach. Instead of generating code or building simple models, they understand relational data structure natively and deliver predictions directly.

Kumo's agent is built on KumoRFM, a foundation model pre-trained on thousands of diverse relational databases. It represents your database as a temporal heterogeneous graph, where each row becomes a node, each foreign key becomes an edge, and timestamps are preserved as temporal attributes. A graph transformer processes this structure, learning which cross-table patterns are predictive.
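The row-to-node, foreign-key-to-edge mapping can be sketched in a few lines of plain Python. This is an illustration of the representation the article describes, with hypothetical table names, not KumoRFM's actual internal data structure:

```python
# Each row becomes a node, each foreign key an edge, and timestamps
# are kept as node attributes for temporal modeling.
customers = [{"customer_id": 1}]
orders = [{"order_id": 10, "customer_id": 1, "order_date": "2025-12-15"}]

nodes, edges = {}, []
for row in customers:
    nodes[("customers", row["customer_id"])] = {}
for row in orders:
    node_id = ("orders", row["order_id"])
    nodes[node_id] = {"time": row["order_date"]}  # temporal attribute
    # foreign key orders.customer_id -> customers.customer_id becomes an edge
    edges.append((node_id, ("customers", row["customer_id"])))
```

Once the database is in this form, a graph transformer can attend across tables directly, which is what makes manual joins and hand-built feature tables unnecessary.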

The result: you describe what you want to predict in a single query, and the model reads your raw relational tables, discovers the relevant features, and returns predictions. No code, no flat tables, no manual feature engineering.

Code-generating agent workflow

  • Agent writes SQL to join tables into flat table
  • Agent writes Python to compute features
  • Agent writes code to train and tune models
  • Human reviews, iterates, and debugs each step
  • Output: a model trained on a manually-defined feature table

Foundation model agent workflow

  • User describes prediction task in one query
  • Agent reads raw relational tables directly
  • Model discovers cross-table features automatically
  • Predictions returned in seconds, no iteration needed
  • Output: predictions from the full relational data structure

PQL Query

PREDICT churn_90d
FOR EACH customers.customer_id
WHERE customers.segment = 'enterprise'

One PQL query replaces the entire code-generating agent workflow. No notebook code, no flat table construction, no model selection. The foundation model reads raw relational tables (customers, orders, support_tickets, product_usage) and delivers predictions in 1 second.

Output

| customer_id | churn_prob | top_signal | time_to_predict |
| --- | --- | --- | --- |
| C-4401 | 0.87 | Support tickets up 3x, product usage down 40% | 1 sec |
| C-4402 | 0.12 | Expanding seats, high feature adoption | 1 sec |
| C-4403 | 0.64 | Contract renewal in 30d, declining engagement | 1 sec |
| C-4404 | 0.03 | Multi-department usage, recent expansion | 1 sec |

AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes between positive and negative outcomes. An AUROC of 50 means random guessing, 100 means perfect prediction. Moving from 65 to 77 AUROC means the model correctly ranks a true positive above a true negative 77% of the time instead of 65%.
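The ranking interpretation can be checked directly: AUROC is the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counting half. The labels and scores below are illustrative:

```python
# AUROC as the probability that a random positive outranks a random negative.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
# Count pairwise wins; a tie contributes half a win.
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auroc = wins / (len(pos) * len(neg))  # 0.5 = random, 1.0 = perfect
```

Here the positive scored 0.4 is outranked by one negative (0.6), so 5 of the 6 pairs are ordered correctly and AUROC is about 0.83.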

| Approach | Time to prediction | AUROC | Multi-table support | Human hours per task |
| --- | --- | --- | --- | --- |
| Code-generating agent + flat table | Hours to days | ~62-66 | Manual joins (agent-written) | 4-8 |
| No-code platform + CSV | Minutes | ~55-62 | None | 0.5-1 |
| KumoRFM zero-shot | 1 second | 76.71 | Native (automatic) | 0.001 |
| KumoRFM fine-tuned | Minutes (tuning) + 1 second | 81.14 | Native (automatic) | 0.1 |

Highlighted: KumoRFM delivers higher accuracy in less time because it reads relational data directly. The 10+ AUROC point gap over code-generating agents reflects the value of understanding data structure, not just writing code faster.

What to look for in a data science agent

Not all data science agents solve the same problem. When evaluating agents for enterprise use, these are the criteria that separate tools that speed up the workflow from tools that transform it:

  • Multi-table relational data support. Does the agent read multiple related tables natively, or does it require someone to flatten the data first? If your predictive signal lives in relationships between tables (and in enterprise data, it almost always does), this is the most important criterion.
  • Feature discovery vs. code generation. Does the agent discover which features matter, or does it just write code to compute features you specify? The first automates the thinking. The second automates the typing.
  • Time to first prediction. Can you go from a new prediction task to a usable model in seconds, hours, or weeks? Agents that require notebook iteration cycles still have latency measured in hours or days.
  • Production readiness. Can the agent's output run in a production pipeline with monitoring, retraining, and drift detection? Research prototypes and analytics-only tools often require a separate production engineering effort.
  • Accuracy on relational benchmarks. Look at performance on multi-table benchmarks like RelBench, not single-table Kaggle datasets. Single-table benchmarks do not test the feature discovery that matters most in enterprise data.
  • Autonomy level. How much human intervention does each prediction task require? Fully autonomous agents deliver predictions from a query. Assisted agents require iterative review and debugging of generated code.

Frequently asked questions

What is a data science agent?

A data science agent is an AI system that automates parts or all of the data science workflow, from data exploration and feature engineering to model building and prediction. The term covers a wide range of tools: code-generating copilots that write notebook code, no-code platforms that let business users build simple models, and foundation model agents that deliver predictions directly from raw relational data without any manual pipeline.

How do code-generating agents differ from foundation model agents?

Code-generating agents like Databricks Genie, Sphinx, and Google DS-STAR automate the workflow by writing Python/SQL code for you. They still operate on flat tables and require someone to define the right joins and features. Foundation model agents like Kumo automate the prediction itself by understanding relational data structure natively. The distinction is automating the steps vs. automating the outcome.

Can a no-code prediction platform replace a data scientist?

For simple, single-table prediction tasks (churn on a flat CSV, lead scoring on a CRM export), no-code platforms like Julius AI and Akkio can deliver usable models without a data scientist. But they cannot handle multi-table relational data, complex feature engineering, or production-grade pipelines. For enterprise use cases with relational databases, they are limited to exploratory analytics.

What is the AI agents market size in 2025?

The AI agents market was valued at $7.1 billion in 2025 and is projected to reach $54.8 billion by 2032, representing a 33.9% compound annual growth rate. Within this, data science agents are one of the fastest-growing categories, with 40% of Global 2000 companies expected to use AI agents in some capacity by 2026.

Does Databricks Genie handle multi-table relational data?

Databricks Genie Code generates Python and SQL code within notebooks. It can write JOIN statements if prompted, but it does not natively understand relational data structure. The user still needs to know which tables to join and what features to compute. Genie automates the coding, not the data science reasoning about multi-table relationships.

How does Kumo's agent approach differ from all other data science agents?

Kumo's agent is built on KumoRFM, a foundation model pre-trained on relational data. Instead of generating code or building simple models on flat tables, it reads raw multi-table databases directly, discovers cross-table patterns through graph neural networks, and delivers predictions in seconds. It automates the prediction, not just the workflow. On RelBench benchmarks, this approach scores 76.71 AUROC zero-shot vs. 62-66 for traditional flat-table approaches.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.