
Infrastructure Capacity Planning

Which servers will exceed 90% CPU utilization in the next 7 days?

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

Which servers will exceed 90% CPU utilization in the next 7 days?

A single capacity-related outage costs $100K–$500K per hour in lost revenue and SLA penalties. Most teams provision based on peak historical usage plus a 30% buffer, wasting $2–5M annually in over-provisioned infrastructure. If you could predict which specific servers will breach capacity limits next week, you could auto-scale proactively — eliminating both outages and waste.

Quick answer

Infrastructure capacity planning predicts which servers will exceed utilization thresholds (like 90% CPU) in the next 7 days. Graph-based models connect servers to clusters, request patterns, deployment schedules, and dependent services, catching cross-cluster traffic shifts and correlated workload spikes that threshold-based alerts miss until it is too late.

Approaches compared

4 ways to solve this problem

1. Static Threshold Alerts

Set CPU/memory alerts at fixed thresholds (85%, 90%). When a server breaches the threshold, page the on-call engineer. The default approach in most monitoring tools.

Best for

Catching imminent capacity issues. Simple to set up, universally supported by monitoring platforms.

Watch out for

Reactive by definition. The alert fires when the problem is already happening. No lead time for proactive scaling. Alert fatigue from false positives in bursty workloads.
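As a baseline, the threshold check in this approach is only a few lines; a minimal sketch, assuming the latest CPU sample per server is already at hand (server IDs and readings are illustrative):

```python
# Minimal sketch of a static threshold alert: fire when the latest CPU
# sample for a server is over a fixed limit. Note the lack of lead time:
# by the time the check is true, the server is already hot.

CPU_THRESHOLD = 90.0

def check_thresholds(latest_cpu: dict[str, float],
                     threshold: float = CPU_THRESHOLD) -> list[str]:
    """Return server IDs whose most recent CPU reading breaches the threshold."""
    return [server for server, cpu in latest_cpu.items() if cpu > threshold]

alerts = check_thresholds({"SRV-401": 72.4, "SRV-402": 45.1, "SRV-510": 91.9})
print(alerts)  # ['SRV-510']
```

SRV-401 at 72.4% passes silently here, which is exactly the blind spot the later approaches try to close.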

2. Trend Extrapolation (Linear/Exponential)

Fit a trend line to recent CPU utilization and extrapolate forward. If the trend hits 90% within 7 days, flag the server. Simple math, easy to implement in a cron job.

Best for

Workloads with steady, monotonic growth patterns. Good for catching gradual capacity creep over weeks.

Watch out for

Completely misses non-linear patterns: deployment-driven spikes, traffic redistribution from autoscaling neighbors, and weekly cyclical patterns. A server trending at 60% may spike to 95% on Thursday due to a scheduled deployment.
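The "cron-job math" in approach 2 can be sketched as a least-squares line over daily CPU averages; the data points below are illustrative, not real telemetry:

```python
# Hedged sketch of linear trend extrapolation: fit cpu = a*day + b to
# recent daily CPU averages and report how many days until the fitted
# line crosses the limit. Pure-Python least squares; toy data.

def days_to_breach(daily_cpu: list[float], limit: float = 90.0):
    """Days past the last observation until the trend line hits `limit`.
    Returns None if the trend is flat or falling."""
    n = len(daily_cpu)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_cpu) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_cpu)) / \
            sum((x - x_mean) ** 2 for x in xs)
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    current = intercept + slope * (n - 1)  # fitted value at the last day
    return (limit - current) / slope

# A server climbing roughly +4%/day from ~60% CPU:
print(days_to_breach([60.2, 64.1, 68.5, 72.4]))
```

On this synthetic series the line crosses 90% in about 4.3 more days, so the server is flagged; a flat or falling series returns None. Deployment spikes and cyclical patterns are, as noted above, invisible to it.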

3. Time-Series Models per Server (Prophet, ARIMA)

Fit time-series models to each server's utilization metrics. Captures daily/weekly seasonality and trend. Forecast 7 days ahead and flag servers predicted to breach thresholds.

Best for

Servers with consistent, cyclical usage patterns (database servers, batch processing nodes). Good seasonality capture.

Watch out for

Treats each server independently. Cannot see that Cluster A is absorbing traffic from Cluster B due to a rollout, or that a new deployment scheduled for Thursday will spike CPU on 40 servers simultaneously. Cross-cluster dependencies are invisible.
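Approach 3 can be sketched without pulling in Prophet or ARIMA by using a seasonal-naive baseline (repeat the last full weekly cycle); the synthetic history and the 90% limit below are illustrative:

```python
# Dependency-free stand-in for a per-server time-series model:
# seasonal-naive forecasting repeats the last full weekly cycle.
# Synthetic data; real deployments would use Prophet/ARIMA per server.

HOURS_PER_WEEK = 7 * 24

def seasonal_naive_forecast(hourly_cpu, horizon=HOURS_PER_WEEK,
                            season=HOURS_PER_WEEK):
    """Forecast `horizon` hours ahead by repeating the last full season."""
    if len(hourly_cpu) < season:
        raise ValueError("need at least one full season of history")
    last_season = hourly_cpu[-season:]
    return [last_season[h % season] for h in range(horizon)]

def will_breach(hourly_cpu, limit=90.0):
    return max(seasonal_naive_forecast(hourly_cpu)) > limit

# Two weeks of a clean weekly cycle peaking at 88%: the model says "safe",
# even though a scheduled deployment (invisible to it) could push it over.
history = [70 + 18 * ((h % HOURS_PER_WEEK) / (HOURS_PER_WEEK - 1))
           for h in range(2 * HOURS_PER_WEEK)]
print(will_breach(history))  # False
```

The forecast is faithful to the server's own seasonality, but any signal living in another table (deployments, peer clusters) never enters the model.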

4. KumoRFM (Graph Neural Networks on Relational Data)

Models the full infrastructure graph: servers, clusters, request patterns, deployment schedules, and service dependencies. Predicts capacity breaches by learning from cross-cluster traffic patterns and correlated workload spikes.

Best for

Microservices architectures with complex service dependencies, auto-scaling clusters, and scheduled deployments.

Watch out for

Requires metric data at server or container level with timestamps. If your monitoring only tracks cluster-level aggregates, the per-server predictions will be less precise.

Key metric: Graph-based capacity models predict breaches 5–7 days ahead vs 1–2 days for per-server models, preventing $100K–$500K/hour outages while reclaiming $2–5M annually in over-provisioned infrastructure.

Why relational data changes the answer

Server SRV-401 currently runs at 72.4% CPU. A threshold alert will not fire until it hits 85%. A trend extrapolation model might predict it reaches 90% in 12 days. But the relational graph reveals a different timeline: SRV-401's cluster (prod-api-east) has a peer average of 78% CPU. A deployment is scheduled for Thursday that historically spikes CPU by 15% for 6 hours. And SRV-510 in the prod-ml-west cluster is at 81.9% with a request growth rate of +18%/week, which means traffic rebalancing from west to east is likely within 5 days.

These cross-entity signals change the prediction from '12 days to breach' to '4–5 days to breach.' The deployment schedule lives in a DEPLOYMENTS table. Cluster peer utilization requires aggregating across the SERVERS table grouped by cluster. Traffic rebalancing patterns emerge from the service dependency graph. A flat model that forecasts SRV-401 in isolation misses all of this. Graph neural networks learn from these cross-cluster, cross-service dependencies automatically. In production environments, graph-based capacity models predict breaches 5–7 days earlier than per-server time-series models, which is the difference between proactive auto-scaling and a 3am page.
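The cross-entity reasoning above can be sketched with toy SERVERS and DEPLOYMENTS rows. The adjustment rules below (a 0.6 factor for hot clusters, capping the estimate at the deployment date) are invented for illustration and are not Kumo's actual model:

```python
# Illustrative sketch: revise a server's trend-only days-to-breach using
# cluster peer load and scheduled deployments. Toy tables and hand-tuned
# adjustment rules, purely to show where the cross-table signals come from.

servers = [
    {"server_id": "SRV-401", "cluster": "prod-api-east", "cpu": 72.4},
    {"server_id": "SRV-402", "cluster": "prod-api-east", "cpu": 83.6},
    {"server_id": "SRV-510", "cluster": "prod-ml-west", "cpu": 81.9},
]
deployments = [{"cluster": "prod-api-east", "in_days": 4, "cpu_spike": 15.0}]

def revised_days_to_breach(server_id, trend_days, limit=90.0):
    me = next(s for s in servers if s["server_id"] == server_id)
    peers = [s["cpu"] for s in servers if s["cluster"] == me["cluster"]]
    peer_avg = sum(peers) / len(peers)
    est = trend_days
    # Hot cluster: peers near the limit make traffic rebalancing likely.
    if peer_avg > 75.0:
        est *= 0.6
    # A deployment whose historical spike would push the cluster over the
    # limit caps the estimate at the deployment date.
    for d in deployments:
        if d["cluster"] == me["cluster"] and peer_avg + d["cpu_spike"] > limit:
            est = min(est, d["in_days"])
    return est

# Trend alone said 12 days; cluster load and Thursday's deploy pull it in.
print(revised_days_to_breach("SRV-401", trend_days=12.0))  # 4
```

Hand-writing rules like these is exactly the feature engineering a graph model is meant to replace: it learns the equivalent cross-table interactions from historical data.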

Predicting server capacity with a per-server model is like predicting traffic congestion by looking at one road segment. A graph model sees the entire road network: if the highway on-ramp is backed up, this surface road will overflow in 20 minutes. If construction closes a parallel route tomorrow, traffic here will double. The individual road's history is useful, but the network view is what enables proactive routing.

How KumoRFM solves this

Relational intelligence for every forecast

Kumo models the full infrastructure graph — servers connected to clusters, request patterns, deployment schedules, and dependent services. A traditional threshold alert fires when CPU is already at 85%. Kumo predicts which servers will breach 90% seven days from now by learning from cross-cluster traffic patterns, deployment cadences, and correlated workload spikes. Server SRV-401 may look fine today, but Kumo sees that its cluster is absorbing traffic from a scaling neighbor and a new deployment is scheduled for Thursday.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1

Your data

The relational tables Kumo learns from

SERVERS

server_id | cluster | instance_type | region
SRV-401 | prod-api-east | m5.2xlarge | us-east-1
SRV-402 | prod-api-east | m5.2xlarge | us-east-1
SRV-510 | prod-ml-west | p3.8xlarge | us-west-2

USAGE_METRICS

metric_id | server_id | cpu_percent | memory_percent | timestamp
M-80001 | SRV-401 | 72.4 | 61.2 | 2025-09-15 14:00
M-80002 | SRV-402 | 45.1 | 38.7 | 2025-09-15 14:00
M-80003 | SRV-510 | 81.9 | 74.3 | 2025-09-15 14:00
2

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT MAX(USAGE_METRICS.CPU_PERCENT, 0, 7, days) > 90
FOR EACH SERVERS.SERVER_ID
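Read as a labeling rule, the query above asks, for each server, whether the MAX of CPU_PERCENT over the (0, 7]-day window after an anchor time exceeds 90. A minimal sketch of that rule over toy USAGE_METRICS-style rows (Kumo computes this internally; the rows and the `label` helper are made up for illustration):

```python
# Sketch of the PQL target as a plain labeling rule: MAX(cpu_percent) over
# the 7 days after the anchor time, compared against 90. Toy data only.
from datetime import datetime, timedelta

metrics = [
    ("SRV-401", datetime(2025, 9, 18, 14), 93.1),
    ("SRV-401", datetime(2025, 9, 30, 14), 95.0),  # outside the 7-day window
    ("SRV-402", datetime(2025, 9, 17, 14), 61.0),
]

def label(server_id, anchor, horizon_days=7, limit=90.0):
    end = anchor + timedelta(days=horizon_days)
    window = [cpu for sid, ts, cpu in metrics
              if sid == server_id and anchor < ts <= end]
    return max(window, default=float("-inf")) > limit

anchor = datetime(2025, 9, 15, 14)
print(label("SRV-401", anchor), label("SRV-402", anchor))  # True False
```

The window boundaries matter: a 95% spike on September 30 is beyond the 7-day horizon and does not make SRV-401 positive on its own.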
3

Prediction output

Every entity gets a score, updated continuously

SERVER_ID | TIMESTAMP | TARGET_PRED | TRUE_PROB
SRV-401 | 2025-09-22 | True | 0.92
SRV-402 | 2025-09-22 | False | 0.15
SRV-510 | 2025-09-22 | True | 0.87
4

Understand why

Every prediction includes feature attributions — no black boxes

Server SRV-401 (prod-api-east)

Predicted: 92% probability of exceeding 90% CPU in 7 days

Top contributing features

CPU trend (7d slope): +4.2%/day (32% attribution)
Memory-CPU correlation: 0.89 (23% attribution)
Cluster load (peer servers): 78% avg (20% attribution)
Request growth rate: +18%/week (15% attribution)
Scheduled deployment: Thursday (10% attribution)

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about infrastructure capacity planning

How far ahead can you predict server capacity issues?

Graph-based models reliably predict capacity breaches 5–7 days ahead, which is enough time for proactive auto-scaling, capacity procurement, or workload redistribution. Per-server models typically give 1–2 days of warning. The graph advantage comes from seeing cross-cluster traffic shifts and deployment schedules that affect multiple servers simultaneously.

What is the cost of a capacity-related outage?

Industry benchmarks put the cost at $100K–$500K per hour for customer-facing services, including lost revenue, SLA penalties, and incident response overhead. For fintech and e-commerce companies during peak periods, the cost can exceed $1M per hour. Prevention is always cheaper than recovery.

Can AI capacity planning reduce infrastructure costs?

Yes. Most teams over-provision by 30–50% as a buffer against uncertainty. Accurate 7-day predictions let you reduce that buffer to 10–15%, reclaiming $2–5M annually in over-provisioned infrastructure. The savings come from right-sizing instance types, terminating unused reservations, and scheduling scale-downs during predicted low-utilization windows.
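The buffer arithmetic behind that claim can be checked directly; the $10M/year base cost (capacity matched to true demand) is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope for the buffer-reduction claim, in integer dollars to
# keep the arithmetic exact. The base cost is an illustrative assumption.

base_annual_cost = 10_000_000  # cost of capacity matched to true demand

def cost_with_buffer(buffer_pct: int) -> int:
    """Annual cost when provisioning true demand plus a safety buffer."""
    return base_annual_cost * (100 + buffer_pct) // 100

# 40% buffer (peak-plus-padding provisioning) vs 12% (prediction-backed).
savings = cost_with_buffer(40) - cost_with_buffer(12)
print(f"${savings:,}/year saved")  # $2,800,000/year saved
```

At this fleet size, trimming the buffer from 40% to 12% lands squarely in the $2–5M range quoted above; larger fleets scale the savings proportionally.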

How do deployments affect capacity predictions?

Deployments are one of the strongest short-term capacity signals. A new code release may increase memory usage by 20% or trigger a CPU spike during cache warming. Graph models learn deployment-to-capacity impact patterns from historical data, predicting the magnitude and duration of deployment-related spikes before they happen.

Bottom line: Prevent capacity outages 7 days before they happen and reclaim $2–5M annually in over-provisioned infrastructure.

Topics covered

capacity planning AI, server utilization prediction, infrastructure forecasting, CPU prediction machine learning, auto-scaling prediction, KumoRFM, relational deep learning, predictive query language, cloud capacity planning, proactive scaling AI, infrastructure optimization, outage prevention prediction

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
