Every VP of Data Science faces this decision: build ML prediction models in-house or buy a platform that delivers them. The standard analysis compares software license costs to team salaries and picks the cheaper option. That analysis is wrong because it misses the four cost categories that actually dominate: engineering time, opportunity cost, maintenance burden, and scaling economics.
This guide provides the complete cost framework with real numbers from enterprise deployments. No vendor-specific pricing. Just the structural economics of building versus buying ML predictions.
TCO summary: build vs. buy
| Metric | Build (Custom ML) | Buy (Foundation Model) | Difference |
|---|---|---|---|
| 1 model (3-year TCO) | $480K | $280K | Buy saves 42% |
| 5 models (3-year TCO) | $2.1M | $350K | Buy saves 83% |
| 10 models (3-year TCO) | $3.6M | $420K | Buy saves 88% |
| Time to first prediction | 3-6 months | Minutes | Orders of magnitude faster |
| Team required per model | 2-3 data scientists | SQL-literate analyst | Lower hiring bar |
| Annual maintenance | 20-30% of build cost | Included in platform | Zero incremental |
The cost gap widens with every additional prediction task. At 10 tasks, building costs 8.5x more than buying.
The cost of building
Building a custom ML prediction model has five cost components. Most organizations account for the first two and underestimate the last three by 2x to 5x.
1. Team cost
A production ML model requires a team of 2 to 3 data scientists for 3 to 6 months. At $200K to $300K fully loaded cost per data scientist (salary, benefits, equipment, management overhead), the labor cost per model is:
- Small model (2 people, 3 months): $100K to $150K
- Medium model (3 people, 4 months): $200K to $300K
- Complex model (3 people, 6 months): $300K to $450K
These numbers assume you already have the team. Recruiting data scientists takes 3 to 6 months in competitive markets and costs $30K to $50K per hire in recruiter fees. If you need to build the team first, add 6 months to the timeline and $100K to $150K in hiring costs.
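The per-model labor figures follow directly from headcount, duration, and fully loaded cost; a minimal sketch using the $200K to $300K annual range from this section:

```python
# Labor cost of one model build: headcount x monthly loaded cost x months.
# The annual loaded-cost range ($200K-$300K per data scientist) is the
# figure quoted above; the rest is arithmetic.

def labor_cost(headcount: int, months: int, annual_loaded_cost: int) -> float:
    """Per-model labor cost at a given fully loaded annual cost."""
    return headcount * annual_loaded_cost * months / 12

# Low and high ends of each scenario from the list above.
print(labor_cost(2, 3, 200_000), labor_cost(2, 3, 300_000))  # 100000.0 150000.0
print(labor_cost(3, 4, 200_000), labor_cost(3, 4, 300_000))  # 200000.0 300000.0
print(labor_cost(3, 6, 200_000), labor_cost(3, 6, 300_000))  # 300000.0 450000.0
```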
2. Infrastructure cost
ML training and serving requires compute infrastructure: GPU instances for training, CPU/GPU instances for serving, data storage, and experiment tracking tools.
- Training compute: $2K to $10K per model training run
- Serving infrastructure: $3K to $15K per month per model
- Data storage and processing: $1K to $5K per month
- MLOps tooling (experiment tracking, model registry): $1K to $5K per month
Annual infrastructure cost per model: $60K to $300K depending on scale and serving requirements.
3. Feature engineering cost (the hidden dominant)
Feature engineering consumes 80% of data science time. The Stanford RelBench study measured this precisely: 12.3 hours and 878 lines of code per prediction task for experienced data scientists working on relational databases.
For a model that uses 200 features derived from 5 to 10 tables, feature engineering takes 4 to 8 weeks of a data scientist's time. At a fully loaded cost of $150K to $225K per year, those 4 to 8 weeks cost $12K to $35K per model. But the larger cost is not the time spent; it is the signal missed: the Stanford study showed that data scientists explore fewer than 5% of the possible feature space, leaving out the multi-hop and temporal patterns that carry the strongest signal.
4. Maintenance cost (the compounding hidden cost)
Models in production degrade. Data distributions shift, business rules change, upstream data pipelines break. Maintaining a production model costs 20% to 30% of the initial build cost per year:
- Retraining every 3 to 6 months: 2 to 4 weeks of data scientist time
- Feature pipeline monitoring and repair: 5 to 10 hours per month
- Data quality investigations: 2 to 5 hours per incident, 1 to 3 incidents per month
- Infrastructure updates and dependency management: ongoing
Over 3 years, maintenance costs approach the initial build cost. A $300K model costs $180K to $270K to maintain over 3 years, for a total lifetime cost of $480K to $570K.
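The lifetime math can be checked in a few lines, assuming the 20% to 30% annual maintenance rate from this section:

```python
# Lifetime cost = initial build + annual maintenance at 20-30% of the
# build cost, over a 3-year life. Rates and build cost are the
# illustrative figures from the paragraph above.

def lifetime_cost(build_cost: float, maint_rate: float, years: int = 3) -> float:
    """Initial build cost plus cumulative maintenance over the model's life."""
    return build_cost + build_cost * maint_rate * years

print(lifetime_cost(300_000, 0.20))  # 480000.0
print(lifetime_cost(300_000, 0.30))  # 570000.0
```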
5. Opportunity cost (the largest hidden cost)
Every month without a working prediction model is revenue left on the table. If a churn model can retain 15% more customers and your annual churn costs $24M in lost revenue, each month of delay costs $300K in preventable losses.
A 5-month build timeline means $1.5M in opportunity cost. A foundation model that delivers predictions in the first week eliminates nearly all of that.
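The opportunity-cost arithmetic is simple enough to verify directly; the $24M annual churn loss, 15% retention lift, and 5-month delay are the illustrative figures from the paragraphs above:

```python
# Opportunity cost of a delayed churn model, per the example above.
annual_churn_loss = 24_000_000   # revenue lost to churn per year
retention_lift = 0.15            # share of that loss the model prevents
build_delay_months = 5           # months until a custom build ships

monthly_value = annual_churn_loss * retention_lift / 12
opportunity_cost = monthly_value * build_delay_months

print(f"${monthly_value:,.0f}/month")     # $300,000/month
print(f"${opportunity_cost:,.0f} total")  # $1,500,000 total
```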
Timeline comparison: build vs. buy (churn model)
| Week | Build (Custom ML) | Buy (Foundation Model) | Revenue Impact |
|---|---|---|---|
| Week 1 | Kick off, data access requests | Connect database, run zero-shot | Buy: first predictions live |
| Week 2-4 | Data exploration, schema mapping | Validate predictions, fine-tune | Buy: model in production |
| Week 5-8 | Feature engineering (SQL joins) | Monitoring, iterate on thresholds | $300K/mo retained (buy) |
| Week 9-12 | More feature engineering, iteration | Running, retaining customers | $300K/mo retained (buy) |
| Week 13-16 | Model training, validation | Running | $300K/mo retained (buy) |
| Week 17-20 | Deployment, integration | Running | $300K/mo retained (buy) |
| Week 21-24 | Model goes live | Running (5 months ahead) | Build: first predictions live |
The foundation model is in production by week 2. The custom build takes until week 21-24. At $300K/month in retained revenue from churn reduction, the 5-month gap represents $1.5M in lost impact.
The cost of buying
"Buying" in this context means using a foundation model platform that delivers predictions on your relational data without custom model building.
Platform cost
Foundation model platforms typically price on one of two models:
- Subscription: $50K to $200K per year for a given data volume and number of prediction tasks. Includes zero-shot predictions, fine-tuning, and API access.
- Usage-based: Pay per prediction or per compute hour. Costs scale with usage but start lower.
Integration cost
Connecting the platform to your data warehouse and integrating predictions into your workflows. This is primarily engineering time:
- Database connection: 1 to 2 days
- First prediction task validation: 1 to 2 weeks
- Workflow integration (CRM, marketing automation, app): 2 to 4 weeks
Total integration cost: $20K to $60K in team time, a one-time investment that applies to all subsequent prediction tasks.
Marginal cost per additional task
This is where buying structurally differs from building. With a foundation model, each new prediction task is a new PQL query. No new feature engineering, no new model training, no new infrastructure. The marginal cost per additional task is:
- Writing and validating the PQL query: 1 to 3 days of analyst time
- Optional fine-tuning: 2 to 8 hours of compute
- Integration: often reuses existing workflows
Marginal cost per task: $2K to $10K. Compare this to $150K to $500K per task when building.
Build: cost per model
- Team: $150K-450K (2-3 data scientists, 3-6 months)
- Infrastructure: $60K-300K/year
- Feature engineering: 80% of team time
- Maintenance: 20-30% of build cost per year
- Opportunity cost: $300K+ per month of delay
Buy: cost per model
- Platform: $50K-200K/year (covers all tasks)
- Integration: $20K-60K one-time
- Feature engineering: $0 (eliminated)
- Maintenance: included in platform
- Opportunity cost: near-zero (minutes to prediction)
Hidden costs of building
| Cost Category | Year 1 | Year 2 | Year 3 | 3-Year Total |
|---|---|---|---|---|
| Team (2-3 DS x 4 months) | $200K-300K | -- | -- | $200K-300K |
| Infrastructure | $60K-120K | $60K-120K | $60K-120K | $180K-360K |
| Feature Engineering (labor) | $12K-35K | $6K-15K | $6K-15K | $24K-65K |
| Maintenance (20-30%/yr) | -- | $60K-90K | $60K-90K | $120K-180K |
| Opportunity Cost (5mo delay) | $1.5M | -- | -- | $1.5M |
| Recruiting (if needed) | $100K-150K | -- | -- | $100K-150K |
The maintenance, opportunity cost, and recruiting rows are the costs most build-vs-buy analyses miss entirely. Opportunity cost alone can exceed the total platform cost.
The scaling math: why the gap widens
The build-vs-buy decision becomes clearer when you model the cost across multiple prediction tasks. Here is the math for a 3-year period.
Scenario: 1 prediction task
| Cost category | Build | Buy |
|---|---|---|
| Year 1 (build + deploy) | $300K | $120K |
| Year 2 (maintain + iterate) | $90K | $80K |
| Year 3 (maintain + iterate) | $90K | $80K |
| 3-year total | $480K | $280K |
At 1 task, buying saves roughly 40%. Meaningful, but not decisive.
Scenario: 5 prediction tasks
| Cost category | Build | Buy |
|---|---|---|
| Year 1 (build 2-3 models) | $750K | $150K |
| Year 2 (build 2-3 more + maintain) | $900K | $100K |
| Year 3 (maintain all 5) | $450K | $100K |
| 3-year total | $2.1M | $350K |
At 5 tasks, buying is 6x cheaper. The gap comes from zero marginal feature engineering cost per additional task.
Scenario: 10 prediction tasks
| Cost category | Build | Buy |
|---|---|---|
| Year 1 | $1.2M | $180K |
| Year 2 | $1.5M | $120K |
| Year 3 | $900K | $120K |
| 3-year total | $3.6M | $420K |
At 10 tasks, buying is 8.5x cheaper. And the build number assumes you can even hire enough data scientists to staff 10 concurrent model builds, which most organizations cannot.
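A quick way to sanity-check the three scenarios is to sum the per-year rows from the tables above and take the ratio:

```python
# Per-year estimates from the three scenario tables, in $K.
scenarios = {
    1:  {"build": [300, 90, 90],     "buy": [120, 80, 80]},
    5:  {"build": [750, 900, 450],   "buy": [150, 100, 100]},
    10: {"build": [1200, 1500, 900], "buy": [180, 120, 120]},
}

for tasks, costs in scenarios.items():
    build_total = sum(costs["build"])
    buy_total = sum(costs["buy"])
    print(f"{tasks} task(s): build ${build_total}K vs buy ${buy_total}K "
          f"({build_total / buy_total:.1f}x)")
# 1 task(s): build $480K vs buy $280K (1.7x)
# 5 task(s): build $2100K vs buy $350K (6.0x)
# 10 task(s): build $3600K vs buy $420K (8.6x)
```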
The accuracy dimension
Cost is only half the equation. What about accuracy?
On the RelBench benchmark, which evaluates all approaches on the same data with the same temporal splits:
- Manual ML (LightGBM + feature engineering): 62.44 average AUROC
- Foundation model zero-shot: 76.71 average AUROC
- Foundation model fine-tuned: 81.14 average AUROC
The foundation model is not just cheaper. It is more accurate, because it explores the full relational feature space that human engineers cannot enumerate. The cost advantage and accuracy advantage point in the same direction, which is unusual in enterprise software decisions.
A simple cost framework
Use this framework to calculate the build-vs-buy economics for your specific situation:
Step 1: Count your prediction tasks
List every prediction your organization needs or wants: churn, fraud, demand, recommendations, lead scoring, lifetime value, next-best-action, credit risk. Include both existing models and models you have not built yet because they are too expensive.
Step 2: Calculate build cost per task
(Number of data scientists x monthly fully loaded cost x months to build) + (monthly infrastructure cost x 36 months) + (30% x build cost x 3 years maintenance). Multiply by number of tasks.
Step 3: Calculate buy cost
(Annual platform cost x 3 years) + (one-time integration cost) + (marginal cost per additional task x number of tasks).
Step 4: Add opportunity cost to the build scenario
For each task, estimate the monthly revenue impact of the prediction (retained revenue from churn, prevented fraud losses, incremental conversion revenue). Multiply by the months of delay in the build scenario compared to the buy scenario. Add this to the build cost.
Step 5: Compare
For most enterprises with 5 or more prediction tasks on relational data, buying is 3x to 10x cheaper with equal or better accuracy. The rare exception: organizations with single high-stakes models where regulatory requirements demand full code-level auditability of every model component.
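The framework above can be sketched as a small calculator. The default inputs are illustrative mid-range figures from this guide, not platform pricing; and because the build side includes Step 4's opportunity cost, its totals run higher than the scenario tables, which exclude it.

```python
# Steps 2-4 of the framework as two functions. All defaults are
# illustrative mid-range assumptions; substitute your own numbers.

def build_cost(tasks, ds_per_model=3, monthly_loaded=20_000, months=4,
               infra_monthly=8_000, maint_rate=0.25, years=3,
               monthly_impact=300_000, delay_months=5):
    """Step 2 + Step 4: 3-year build-side TCO, including opportunity cost."""
    team = ds_per_model * monthly_loaded * months   # labor
    infra = infra_monthly * 12 * years              # infrastructure
    maintenance = maint_rate * team * years         # upkeep at 25%/yr
    opportunity = monthly_impact * delay_months     # cost of the delay
    return tasks * (team + infra + maintenance + opportunity)

def buy_cost(tasks, platform_annual=120_000, years=3,
             integration=40_000, marginal_per_task=6_000):
    """Step 3: 3-year buy-side TCO; integration is paid once."""
    return platform_annual * years + integration + marginal_per_task * tasks

for n in (1, 5, 10):
    print(n, build_cost(n), buy_cost(n))
```

Note how the build side scales linearly with task count while the buy side adds only the small per-task marginal cost, which is the structural gap this section describes.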
When building makes sense
- 1-2 high-stakes models justifying deep investment
- Genuinely unique data (proprietary sensors, classified)
- Regulatory requirement for full code-level auditability
- ML system is a core competitive differentiator
- Established team with excess capacity
When buying makes sense
- 3+ prediction tasks on relational data
- Time to value matters (every month is $300K+ in lost impact)
- Data science team is at capacity or hard to hire
- Predictions are a means to an end, not the product
- Need to test many predictions before committing
The real question
Build-vs-buy is often framed as a technology decision. It is actually a resource allocation decision. Every data scientist hour spent on feature engineering is an hour not spent on interpreting predictions, building decision systems, or identifying new use cases.
The organizations getting the most value from ML predictions are not the ones with the biggest data science teams. They are the ones that have eliminated the repetitive engineering work (feature engineering, pipeline maintenance, model retraining) and redirected their talent toward the strategic work: deciding what to predict, evaluating what the predictions mean, and building the organizational systems that turn predictions into revenue.
If your data scientists spend 80% of their time on feature engineering and pipeline maintenance, the build-vs-buy question is already answered. The only remaining question is when.
PQL Query
```
PREDICT churn_30d FOR EACH customers.customer_id WHERE customers.arr > 50000
```
This query delivers enterprise churn predictions in seconds. Building the equivalent custom model takes 3 to 6 months, $300K to $450K, and a team of 3 data scientists. The marginal cost of the PQL query is effectively zero.
Output
| customer_id | churn_risk | arr | retention_action |
|---|---|---|---|
| ENT-1001 | 0.82 | $120K | Executive outreach recommended |
| ENT-1002 | 0.15 | $95K | No action needed |
| ENT-1003 | 0.67 | $210K | CSM escalation triggered |
| ENT-1004 | 0.04 | $78K | Expansion opportunity detected |
KumoRFM was built by the team behind the ML systems at Pinterest, Airbnb, and LinkedIn: Vanja Josifovski (CEO, former CTO at Airbnb and Pinterest), Jure Leskovec (Chief Scientist, Stanford professor, co-creator of GraphSAGE), and Hema Raghavan (Head of Engineering, former Sr. Director at LinkedIn). Backed by Sequoia Capital.