The term "data science agent" entered mainstream vocabulary in 2025 when Databricks, Google, and a wave of startups all shipped AI systems that could write data science code, build models, or deliver predictions with minimal human intervention. The AI agents market hit $7.1 billion in 2025 and is projected to reach $54.8 billion by 2032 at a 33.9% CAGR. Forty percent of Global 2000 companies are expected to use AI agents by 2026.
But "data science agent" means very different things depending on who is selling it. Some agents write Python code in a notebook. Some build drag-and-drop models from a CSV. Some read your relational database and deliver predictions directly. These are not incremental differences. They represent fundamentally different philosophies about what should be automated.
The headline result: SAP SALT benchmark
The SAP SALT benchmark is an enterprise-grade evaluation in which business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality databases with multiple related tables.
sap_salt_enterprise_benchmark
| approach | accuracy | what_it_means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.
KumoRFM scores 91% where PhD-level data scientists, after weeks of feature engineering and hand-tuning XGBoost, score 75%. The 16-percentage-point gap is the value of reading relational data natively instead of flattening it into a single table.
Three types of data science agents
The data science agent landscape breaks into three distinct categories, each automating a different part of the workflow:
- Code-generating agents that write Python, SQL, and notebook code for you (Databricks Genie, Sphinx, Google DS-STAR). These target data scientists and automate the typing, not the thinking.
- No-code prediction platforms that let business users build models without writing code (Julius AI, Akkio, Obviously AI). These target analysts and automate simple, single-table predictions.
- Foundation model agents that understand relational data structure and deliver predictions directly (Kumo). These target the prediction itself and automate the entire pipeline from raw tables to output.
data_science_agent_landscape_2026
| agent | approach | data_types | autonomous | multi_table | production_ready |
|---|---|---|---|---|---|
| Kumo AI | Foundation model + agent | Relational (multi-table) | Full pipeline | Yes (native) | Yes |
| Databricks Genie | Code generation | Flat tables/notebooks | Assisted | Manual joins | Yes |
| Sphinx | Jupyter copilot | Flat tables/notebooks | Assisted | Manual joins | Early |
| Google DS-STAR | Multi-agent framework | Flat tables | Autonomous | Manual joins | Research |
| Julius AI | Chat-to-analysis | CSV/connections | Assisted | No | Yes (analytics only) |
| Akkio | No-code drag-drop | CSV upload | Assisted | No | Yes (simple models) |
Highlighted: Kumo is the only agent that natively understands multi-table relational data. All other agents operate on flat tables, requiring manual joins to combine information across tables.
Code-generating agents
Code-generating agents are the most visible category in 2026. They sit inside notebooks and IDEs, watch what you are doing, and write Python or SQL code to help. The pitch: a senior data scientist's productivity, available to everyone.
Databricks Genie Code
Databricks Genie Code is integrated into Databricks notebooks. It claims twice the success rate of leading coding agents on data science tasks and a 60-80% reduction in processing time. It generates Python and SQL, executes it in the notebook, and iterates based on results. For teams already on Databricks, it is the most frictionless entry point.
Sphinx
Sphinx raised $9.5 million in seed funding from Lightspeed and Bessemer Venture Partners. It is Jupyter-native, meaning it works inside the existing notebook workflow that most data scientists already use. It generates code cells, explains its reasoning, and can iterate on errors.
Google DS-STAR
Google's DS-STAR is a multi-agent framework where specialized agents plan, code, and verify data science tasks. It represents the most autonomous approach in the code-generating category, with agents that can decompose complex tasks into subtasks and verify their own outputs. It remains a research project for now.
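The plan-code-verify loop that multi-agent frameworks like DS-STAR use can be sketched in a few lines. Everything below is illustrative, not Google's implementation: the hard-coded plan and code snippets stand in for what would be LLM calls in a real system.

```python
# Toy sketch of a plan -> code -> verify loop (illustrative only).
# In a real multi-agent framework, plan() and code() would be LLM calls.
def plan(task):
    # Planner agent: decompose the task into ordered subtasks.
    return ["load data", "compute mean"]

def code(step):
    # Coder agent: emit a code snippet for one subtask.
    return {
        "load data": "data = [1, 2, 3]",
        "compute mean": "result = sum(data) / len(data)",
    }[step]

def verify(env):
    # Verifier agent: check that the workspace contains the expected output.
    return "result" in env

env = {}
for step in plan("average the column"):
    exec(code(step), env)  # each subtask's code runs in a shared workspace
assert verify(env)
print(env["result"])  # → 2.0
```

The design point is the separation of roles: the planner never executes code, and the verifier never writes it, which is what lets these frameworks decompose complex tasks and check their own outputs.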
The limitation of code-generating agents
Code-generating agents make data scientists faster at writing code. But the bottleneck in enterprise data science is not typing speed. It is knowing which tables to join, which features to compute, and which temporal patterns matter. A code-generating agent can write a LEFT JOIN in seconds, but it cannot tell you whether that join captures the right signal for your prediction task.
These agents still operate on flat tables. They generate code that processes one table at a time, and they rely on the human to specify the multi-table logic. The feature engineering bottleneck (12.3 hours, 878 lines of code per task on RelBench) becomes faster to type but no less intellectually demanding.
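A minimal sketch of the flat-table step such an agent automates, using an invented SQLite schema (the table names, columns, and 90-day window are all assumptions for illustration): the JOIN and aggregation are seconds of typing, but deciding which aggregates carry signal remains the human's job.

```python
import sqlite3

# Illustrative two-table schema; not any real enterprise database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, signup_day INTEGER);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, day INTEGER);
INSERT INTO customers VALUES (1, 0), (2, 5);
INSERT INTO orders VALUES (10, 1, 30.0, 50), (11, 1, 20.0, 80), (12, 2, 90.0, 60);
""")

# Two hand-picked features: order count and spend over an assumed 90-day window.
# An agent writes this instantly -- but choosing WHICH features to compute,
# and repeating this for hundreds of candidates, is the actual bottleneck.
rows = conn.execute("""
SELECT c.customer_id,
       COUNT(o.order_id)         AS orders_90d,
       COALESCE(SUM(o.amount), 0) AS spend_90d
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id AND o.day < 90
GROUP BY c.customer_id
ORDER BY c.customer_id
""").fetchall()
print(rows)  # → [(1, 2, 50.0), (2, 1, 90.0)]
```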
No-code prediction platforms
No-code platforms take the opposite approach: instead of making data scientists faster, they try to eliminate the need for data scientists entirely by giving business users point-and-click model building.
Julius AI
Julius AI offers a chat interface where users describe what they want to predict, upload a CSV or connect to a data source, and get a model back. Pricing ranges from free to $70/month, and it holds SOC 2 Type II certification. It works well for exploratory analytics and simple predictions on a single dataset.
Akkio
Akkio starts at $49/month and provides a no-code drag-and-drop interface for building predictive models. Users upload a CSV, select a target column, and Akkio trains and deploys a model. It is designed for marketing teams, small businesses, and agencies that need quick predictions without a data science team.
The limitation of no-code platforms
No-code platforms work on a single flat table. They cannot read a relational database with customers, orders, products, and support tickets linked by foreign keys. They cannot discover that a customer's churn risk depends on the return rate of products they purchased, because they never see the products table.
For enterprise use cases where the predictive signal lives in the relationships between tables, not within a single table, no-code platforms miss the most important patterns. They are useful for quick analytics on flat exports, but they cannot replace a production ML pipeline.
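A toy pandas example of a cross-table signal that a single-CSV tool can never compute (all table contents are invented for illustration): the churn feature lives in the products table, two joins away from the customer.

```python
import pandas as pd

# Three linked tables. A no-code platform that only sees the customers
# table (or a flat CSV export of it) never observes return_rate at all.
customers = pd.DataFrame({"customer_id": [1, 2], "churned": [1, 0]})
orders = pd.DataFrame({"order_id": [10, 11, 12],
                       "customer_id": [1, 1, 2],
                       "product_id": ["A", "B", "B"]})
products = pd.DataFrame({"product_id": ["A", "B"], "return_rate": [0.40, 0.05]})

# The cross-table feature: average return rate of products each customer bought.
feature = (orders.merge(products, on="product_id")
                 .groupby("customer_id")["return_rate"]
                 .mean()
                 .rename("avg_return_rate")
                 .reset_index())
flat = customers.merge(feature, on="customer_id")
print(flat)  # churned customer 1 bought the high-return product
```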
Foundation model agents
Foundation model agents represent a fundamentally different approach. Instead of generating code or building simple models, they understand relational data structure natively and deliver predictions directly.
Kumo
Kumo's agent is built on KumoRFM, a foundation model pre-trained on thousands of diverse relational databases. It represents your database as a temporal heterogeneous graph, where each row becomes a node, each foreign key becomes an edge, and timestamps are preserved as temporal attributes. A graph transformer processes this structure, learning which cross-table patterns are predictive.
The result: you describe what you want to predict in a single query, and the model reads your raw relational tables, discovers the relevant features, and returns predictions. No code, no flat tables, no manual feature engineering.
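The rows-as-nodes, foreign-keys-as-edges idea can be sketched directly. This is a simplified illustration of the representation, not Kumo's implementation; in the real model, timestamps and column values are carried as node attributes.

```python
# Toy relational data: two tables linked by one foreign key (invented values).
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "orders": [
        {"order_id": 10, "customer_id": 1, "ts": "2026-01-03"},
        {"order_id": 11, "customer_id": 2, "ts": "2026-01-07"},
    ],
}
foreign_keys = [("orders", "customer_id", "customers")]  # (child, fk column, parent)

# Each row becomes a node, identified by (table, row index).
nodes = [(t, i) for t, rows in tables.items() for i in range(len(rows))]

# Each foreign-key value becomes an edge from child row to parent row.
edges = []
for child, fk, parent in foreign_keys:
    parent_index = {row[fk]: i for i, row in enumerate(tables[parent])}
    for i, row in enumerate(tables[child]):
        edges.append(((child, i), (parent, parent_index[row[fk]])))

print(len(nodes), len(edges))  # → 4 2
```

A graph transformer then operates on this node/edge structure directly, which is why no one has to decide in advance which joins to materialize.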
Code-generating agent workflow
- Agent writes SQL to join tables into flat table
- Agent writes Python to compute features
- Agent writes code to train and tune models
- Human reviews, iterates, and debugs each step
- Output: a model trained on a manually-defined feature table
Foundation model agent workflow
- User describes prediction task in one query
- Agent reads raw relational tables directly
- Model discovers cross-table features automatically
- Predictions returned in seconds, no iteration needed
- Output: predictions from the full relational data structure
PQL Query
PREDICT churn_90d FOR EACH customers.customer_id WHERE customers.segment = 'enterprise'
One PQL (Predictive Query Language) query replaces the entire code-generating agent workflow. No notebook code, no flat table construction, no model selection. The foundation model reads raw relational tables (customers, orders, support_tickets, product_usage) and delivers predictions in 1 second.
Output
| customer_id | churn_prob | top_signal | time_to_predict |
|---|---|---|---|
| C-4401 | 0.87 | Support tickets up 3x, product usage down 40% | 1 sec |
| C-4402 | 0.12 | Expanding seats, high feature adoption | 1 sec |
| C-4403 | 0.64 | Contract renewal in 30d, declining engagement | 1 sec |
| C-4404 | 0.03 | Multi-department usage, recent expansion | 1 sec |
AUROC (Area Under the Receiver Operating Characteristic curve) measures how well a model distinguishes between positive and negative outcomes. An AUROC of 50 corresponds to random guessing; 100 is perfect prediction. Moving from 65 to 77 AUROC means the model correctly ranks a true positive above a true negative 77% of the time instead of 65%.
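That pairwise-ranking definition translates directly into code. This toy function (scaled 0-100 to match the tables here, with invented scores) counts, over every positive/negative pair, how often the positive is scored higher, with ties counting half:

```python
def auroc(labels, scores):
    """AUROC as the probability a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 100 * wins / (len(pos) * len(neg))  # 0-100 scale, as in the tables

# Toy example: one positive is ranked above all negatives, the other above 2 of 3.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2]
print(auroc(labels, scores))  # → 83.33... (5 of 6 pairs ranked correctly)
```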
time_and_accuracy_comparison
| approach | time_to_prediction | AUROC | multi_table_support | human_hours_per_task |
|---|---|---|---|---|
| Code-generating agent + flat table | Hours to days | ~62-66 | Manual joins (agent-written) | 4-8 |
| No-code platform + CSV | Minutes | ~55-62 | None | 0.5-1 |
| KumoRFM zero-shot | 1 second | 76.71 | Native (automatic) | 0.001 |
| KumoRFM fine-tuned | Minutes (tuning) + 1 second | 81.14 | Native (automatic) | 0.1 |
Highlighted: KumoRFM delivers higher accuracy in less time because it reads relational data directly. The 10+ AUROC point gap over code-generating agents reflects the value of understanding data structure, not just writing code faster.
What to look for in a data science agent
Not all data science agents solve the same problem. When evaluating agents for enterprise use, these are the criteria that separate tools that speed up the workflow from tools that transform it:
- Multi-table relational data support. Does the agent read multiple related tables natively, or does it require someone to flatten the data first? If your predictive signal lives in relationships between tables (and in enterprise data, it almost always does), this is the most important criterion.
- Feature discovery vs. code generation. Does the agent discover which features matter, or does it just write code to compute features you specify? The first automates the thinking. The second automates the typing.
- Time to first prediction. Can you go from a new prediction task to a usable model in seconds, hours, or weeks? Agents that require notebook iteration cycles still have latency measured in hours or days.
- Production readiness. Can the agent's output run in a production pipeline with monitoring, retraining, and drift detection? Research prototypes and analytics-only tools often require a separate production engineering effort.
- Accuracy on relational benchmarks. Look at performance on multi-table benchmarks like RelBench, not single-table Kaggle datasets. Single-table benchmarks do not test the feature discovery that matters most in enterprise data.
- Autonomy level. How much human intervention does each prediction task require? Fully autonomous agents deliver predictions from a query. Assisted agents require iterative review and debugging of generated code.