Infrastructure Capacity Planning
“Which servers will exceed 90% CPU utilization in the next 7 days?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example
Which servers will exceed 90% CPU utilization in the next 7 days?
A single capacity-related outage costs $100K–$500K per hour in lost revenue and SLA penalties. Most teams provision based on peak historical usage plus a 30% buffer, wasting $2–5M annually in over-provisioned infrastructure. If you could predict which specific servers will breach capacity limits next week, you could auto-scale proactively — eliminating both outages and waste.
Quick answer
Infrastructure capacity planning predicts which servers will exceed utilization thresholds (like 90% CPU) in the next 7 days. Graph-based models connect servers to clusters, request patterns, deployment schedules, and dependent services, catching cross-cluster traffic shifts and correlated workload spikes that threshold-based alerts miss until it is too late.
Approaches compared
4 ways to solve this problem
1. Static Threshold Alerts
Set CPU/memory alerts at fixed thresholds (85%, 90%). When a server breaches the threshold, page the on-call engineer. The default approach in most monitoring tools.
Best for
Catching imminent capacity issues. Simple to set up, universally supported by monitoring platforms.
Watch out for
Reactive by definition. The alert fires when the problem is already happening. No lead time for proactive scaling. Alert fatigue from false positives in bursty workloads.
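The reactive nature is easy to see in code. A minimal sketch in Python (server IDs and readings are illustrative): the check can only name a server after its reading is already past the threshold.

```python
# Minimal static threshold alert, assuming metrics arrive as
# (server_id, cpu_percent) samples. Values are illustrative.
CPU_THRESHOLD = 90.0

def check_thresholds(samples, threshold=CPU_THRESHOLD):
    """Return the servers whose latest CPU reading breaches the threshold."""
    return sorted({sid for sid, cpu in samples if cpu >= threshold})

samples = [("SRV-401", 72.4), ("SRV-402", 45.1), ("SRV-510", 91.3)]
print(check_thresholds(samples))  # fires only once SRV-510 is already at 91.3%
```

By the time this list is non-empty, the on-call engineer is paged and the capacity problem is live; there is no forecast component at all.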
2. Trend Extrapolation (Linear/Exponential)
Fit a trend line to recent CPU utilization and extrapolate forward. If the trend hits 90% within 7 days, flag the server. Simple math, easy to implement in a cron job.
Best for
Workloads with steady, monotonic growth patterns. Good for catching gradual capacity creep over weeks.
Watch out for
Completely misses non-linear patterns: deployment-driven spikes, traffic redistribution from autoscaling neighbors, and weekly cyclical patterns. A server trending at 60% may spike to 95% on Thursday due to a scheduled deployment.
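Trend extrapolation fits in a few lines, which is both its appeal and its limit. A sketch with NumPy, using an illustrative history that grows a steady 1% per day; nothing in this model can see a Thursday deployment spike coming.

```python
import numpy as np

def days_to_breach(cpu_history, threshold=90.0):
    """Fit a linear trend to daily CPU readings and extrapolate to the
    threshold. Returns None if the trend is flat or declining."""
    days = np.arange(len(cpu_history), dtype=float)
    slope, intercept = np.polyfit(days, cpu_history, 1)
    if slope <= 0:
        return None
    current = slope * days[-1] + intercept
    return (threshold - current) / slope

# Illustrative: 14 days of readings trending ~1%/day upward from 60%.
history = [60 + d * 1.0 for d in range(14)]
print(round(days_to_breach(history), 1))  # → 17.0 days to breach
```

A cron job running this per server catches gradual capacity creep, but a deployment-driven jump from 73% to 95% simply never appears in the fitted line.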
3. Time-Series Models per Server (Prophet, ARIMA)
Fit time-series models to each server's utilization metrics. Captures daily/weekly seasonality and trend. Forecast 7 days ahead and flag servers predicted to breach thresholds.
Best for
Servers with consistent, cyclical usage patterns (database servers, batch processing nodes). Good seasonality capture.
Watch out for
Treats each server independently. Cannot see that Cluster A is absorbing traffic from Cluster B due to a rollout, or that a new deployment scheduled for Thursday will spike CPU on 40 servers simultaneously. Cross-cluster dependencies are invisible.
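What these per-server models capture, and what they miss, can be illustrated without the full Prophet or ARIMA machinery. A seasonal-naive baseline (predict each day from the same weekday last week) is a hedged stand-in for the seasonality these models learn; the history below is an illustrative weekly cycle with a midweek peak.

```python
def seasonal_naive(cpu_history, horizon_days=7, period=7):
    """Seasonal-naive forecast: tomorrow looks like the same weekday last
    week. A minimal stand-in for per-server Prophet/ARIMA models."""
    y = list(cpu_history)
    return [y[-period + (h - 1) % period] for h in range(1, horizon_days + 1)]

# Four weeks of an illustrative weekly cycle with a Wednesday peak.
history = [55, 58, 70, 57, 56, 50, 48] * 4
print(max(seasonal_naive(history)))  # → 70: the recurring peak is forecast
```

The recurring Wednesday peak is forecast correctly, but a cross-cluster traffic shift or a one-off deployment on 40 servers at once is invisible, because each server's model sees only its own history.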
4. KumoRFM (Graph Neural Networks on Relational Data)
Models the full infrastructure graph: servers, clusters, request patterns, deployment schedules, and service dependencies. Predicts capacity breaches by learning from cross-cluster traffic patterns and correlated workload spikes.
Best for
Microservices architectures with complex service dependencies, auto-scaling clusters, and scheduled deployments.
Watch out for
Requires metric data at server or container level with timestamps. If your monitoring only tracks cluster-level aggregates, the per-server predictions will be less precise.
Key metric: Graph-based capacity models predict breaches 5–7 days ahead vs 1–2 days for per-server models, preventing $100K–$500K/hour outages while reclaiming $2–5M annually in over-provisioned infrastructure.
Why relational data changes the answer
Server SRV-401 currently runs at 72.4% CPU. A threshold alert will not fire until it hits 85%. A trend extrapolation model might predict it reaches 90% in 12 days. But the relational graph reveals a different timeline: SRV-401's cluster (prod-api-east) has a peer average of 78% CPU. A deployment is scheduled for Thursday that historically spikes CPU by 15% for 6 hours. And SRV-510 in the prod-ml-west cluster is at 81.9% with a request growth rate of +18%/week, which means traffic rebalancing from west to east is likely within 5 days.
These cross-entity signals change the prediction from '12 days to breach' to '4–5 days to breach.' The deployment schedule lives in a DEPLOYMENTS table. Cluster peer utilization requires aggregating across the SERVERS table grouped by cluster. Traffic rebalancing patterns emerge from the service dependency graph. A flat model that forecasts SRV-401 in isolation misses all of this. Graph neural networks learn from these cross-cluster, cross-service dependencies automatically. In production environments, graph-based capacity models predict breaches 5–7 days earlier than per-server time-series models, which is the difference between proactive auto-scaling and a 3am page.
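The cluster peer-utilization feature is exactly the kind of cross-table aggregation a flat per-server model never computes. A sketch in plain Python, using only the three sample SERVERS and USAGE_METRICS rows shown later on this page (a real fleet would have many more peers, which is why the article's quoted peer average differs):

```python
# Cross-entity feature: average CPU of a server's cluster peers.
# Rows mirror the sample SERVERS and USAGE_METRICS tables on this page.
from collections import defaultdict
from statistics import mean

servers = {"SRV-401": "prod-api-east", "SRV-402": "prod-api-east",
           "SRV-510": "prod-ml-west"}
latest_cpu = {"SRV-401": 72.4, "SRV-402": 45.1, "SRV-510": 81.9}

# Aggregate USAGE_METRICS grouped by cluster, via the SERVERS table.
by_cluster = defaultdict(list)
for sid, cluster in servers.items():
    by_cluster[cluster].append(latest_cpu[sid])
peer_avg = {c: mean(v) for c, v in by_cluster.items()}

print(peer_avg["prod-api-east"])  # → 58.75 on these three sample rows
```

A graph model learns this kind of aggregation (and the joins to DEPLOYMENTS and the service dependency graph) automatically, rather than requiring it to be hand-engineered as a feature.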
Predicting server capacity with a per-server model is like predicting traffic congestion by looking at one road segment. A graph model sees the entire road network: if the highway on-ramp is backed up, this surface road will overflow in 20 minutes. If construction closes a parallel route tomorrow, traffic here will double. The individual road's history is useful, but the network view is what enables proactive routing.
How KumoRFM solves this
Relational intelligence for every forecast
Kumo models the full infrastructure graph — servers connected to clusters, request patterns, deployment schedules, and dependent services. A traditional threshold alert fires when CPU is already at 85%. Kumo predicts which servers will breach 90% seven days from now by learning from cross-cluster traffic patterns, deployment cadences, and correlated workload spikes. Server SRV-401 may look fine today, but Kumo sees that its cluster is absorbing traffic from a scaling neighbor and a new deployment is scheduled for Thursday.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
SERVERS
| server_id | cluster | instance_type | region |
|---|---|---|---|
| SRV-401 | prod-api-east | m5.2xlarge | us-east-1 |
| SRV-402 | prod-api-east | m5.2xlarge | us-east-1 |
| SRV-510 | prod-ml-west | p3.8xlarge | us-west-2 |
USAGE_METRICS
| metric_id | server_id | cpu_percent | memory_percent | timestamp |
|---|---|---|---|---|
| M-80001 | SRV-401 | 72.4 | 61.2 | 2025-09-15 14:00 |
| M-80002 | SRV-402 | 45.1 | 38.7 | 2025-09-15 14:00 |
| M-80003 | SRV-510 | 81.9 | 74.3 | 2025-09-15 14:00 |
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT MAX(USAGE_METRICS.CPU_PERCENT, 0, 7, days) > 90 FOR EACH SERVERS.SERVER_ID
Prediction output
Every entity gets a score, updated continuously
| SERVER_ID | TIMESTAMP | TARGET_PRED | TRUE_PROB |
|---|---|---|---|
| SRV-401 | 2025-09-22 | True | 0.92 |
| SRV-402 | 2025-09-22 | False | 0.15 |
| SRV-510 | 2025-09-22 | True | 0.87 |
Understand why
Every prediction includes feature attributions — no black boxes
Server SRV-401 (prod-api-east)
Predicted: 92% probability of exceeding 90% CPU in 7 days
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| CPU trend (7d slope) | +4.2%/day | 32% |
| Memory-CPU correlation | 0.89 | 23% |
| Cluster load (peer servers) | 78% avg | 20% |
| Request growth rate | +18%/week | 15% |
| Scheduled deployment | Thursday | 10% |
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about infrastructure capacity planning
How far ahead can you predict server capacity issues?
Graph-based models reliably predict capacity breaches 5–7 days ahead, which is enough time for proactive auto-scaling, capacity procurement, or workload redistribution. Per-server models typically give 1–2 days of warning. The graph advantage comes from seeing cross-cluster traffic shifts and deployment schedules that affect multiple servers simultaneously.
What is the cost of a capacity-related outage?
Industry benchmarks put the cost at $100K–$500K per hour for customer-facing services, including lost revenue, SLA penalties, and incident response overhead. For fintech and e-commerce companies during peak periods, the cost can exceed $1M per hour. Prevention is always cheaper than recovery.
Can AI capacity planning reduce infrastructure costs?
Yes. Most teams over-provision by 30–50% as a buffer against uncertainty. Accurate 7-day predictions let you reduce that buffer to 10–15%, reclaiming $2–5M annually in over-provisioned infrastructure. The savings come from right-sizing instance types, terminating unused reservations, and scheduling scale-downs during predicted low-utilization windows.
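A quick back-of-envelope makes the buffer arithmetic concrete. The $10M baseline spend and the specific buffer percentages below are hypothetical figures chosen for illustration, not numbers from this article:

```python
# Back-of-envelope for the buffer-reduction savings.
# The $10M baseline compute spend is a hypothetical figure.
baseline_spend = 10_000_000   # $/year on right-sized capacity
buffer_before = 0.40          # 40% over-provisioning buffer
buffer_after = 0.12           # 12% buffer with reliable 7-day forecasts

savings = baseline_spend * (buffer_before - buffer_after)
print(f"${savings:,.0f}/year")  # → $2,800,000/year
```

Scale the baseline to your own fleet spend; the savings track the buffer reduction linearly.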
How do deployments affect capacity predictions?
Deployments are one of the strongest short-term capacity signals. A new code release may increase memory usage by 20% or trigger a CPU spike during cache warming. Graph models learn deployment-to-capacity impact patterns from historical data, predicting the magnitude and duration of deployment-related spikes before they happen.
Bottom line: Prevent capacity outages 7 days before they happen and reclaim $2–5M annually in over-provisioned infrastructure.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.




