
Infrastructure Capacity Planning

Which servers will exceed 90% CPU utilization in the next 7 days?

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

Which servers will exceed 90% CPU utilization in the next 7 days?

A single capacity-related outage costs $100K–$500K per hour in lost revenue and SLA penalties. Most teams provision based on peak historical usage plus a 30% buffer, wasting $2–5M annually in over-provisioned infrastructure. If you could predict which specific servers will breach capacity limits next week, you could auto-scale proactively — eliminating both outages and waste.

Quick answer

Infrastructure capacity planning predicts which servers will exceed utilization thresholds (like 90% CPU) in the next 7 days. Graph-based models connect servers to clusters, request patterns, deployment schedules, and dependent services, catching cross-cluster traffic shifts and correlated workload spikes that threshold-based alerts miss until it is too late.

Approaches compared

4 ways to solve this problem

1. Static Threshold Alerts

Set CPU/memory alerts at fixed thresholds (85%, 90%). When a server breaches the threshold, page the on-call engineer. The default approach in most monitoring tools.

Best for

Catching imminent capacity issues. Simple to set up, universally supported by monitoring platforms.

Watch out for

Reactive by definition. The alert fires when the problem is already happening. No lead time for proactive scaling. Alert fatigue from false positives in bursty workloads.
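As a baseline, the threshold check in this approach is only a few lines; a minimal sketch, assuming the latest CPU sample per server is already at hand (server IDs and readings are illustrative):

```python
# Minimal sketch of a static threshold alert: fire when the latest CPU
# sample for a server is over a fixed limit. Note the lack of lead time:
# by the time the check is true, the server is already hot.

CPU_THRESHOLD = 90.0

def check_thresholds(latest_cpu: dict[str, float],
                     threshold: float = CPU_THRESHOLD) -> list[str]:
    """Return server IDs whose most recent CPU reading breaches the threshold."""
    return [server for server, cpu in latest_cpu.items() if cpu > threshold]

alerts = check_thresholds({"SRV-401": 72.4, "SRV-402": 45.1, "SRV-510": 91.9})
print(alerts)  # ['SRV-510']
```

SRV-401 at 72.4% passes silently here, which is exactly the blind spot the later approaches try to close.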

2. Trend Extrapolation (Linear/Exponential)

Fit a trend line to recent CPU utilization and extrapolate forward. If the trend hits 90% within 7 days, flag the server. Simple math, easy to implement in a cron job.

Best for

Workloads with steady, monotonic growth patterns. Good for catching gradual capacity creep over weeks.

Watch out for

Completely misses non-linear patterns: deployment-driven spikes, traffic redistribution from autoscaling neighbors, and weekly cyclical patterns. A server trending at 60% may spike to 95% on Thursday due to a scheduled deployment.
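The "cron-job math" in approach 2 can be sketched as a least-squares line over daily CPU averages; the data points below are illustrative, not real telemetry:

```python
# Hedged sketch of linear trend extrapolation: fit cpu = a*day + b to
# recent daily CPU averages and report how many days until the fitted
# line crosses the limit. Pure-Python least squares; toy data.

def days_to_breach(daily_cpu: list[float], limit: float = 90.0):
    """Days past the last observation until the trend line hits `limit`.
    Returns None if the trend is flat or falling."""
    n = len(daily_cpu)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_cpu) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_cpu)) / \
            sum((x - x_mean) ** 2 for x in xs)
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    current = intercept + slope * (n - 1)  # fitted value at the last day
    return (limit - current) / slope

# A server climbing roughly +4%/day from ~60% CPU:
print(days_to_breach([60.2, 64.1, 68.5, 72.4]))
```

On this synthetic series the line crosses 90% in about 4.3 more days, so the server is flagged; a flat or falling series returns None. Deployment spikes and cyclical patterns are, as noted above, invisible to it.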

3. Time-Series Models per Server (Prophet, ARIMA)

Fit time-series models to each server's utilization metrics. Captures daily/weekly seasonality and trend. Forecast 7 days ahead and flag servers predicted to breach thresholds.

Best for

Servers with consistent, cyclical usage patterns (database servers, batch processing nodes). Good seasonality capture.

Watch out for

Treats each server independently. Cannot see that Cluster A is absorbing traffic from Cluster B due to a rollout, or that a new deployment scheduled for Thursday will spike CPU on 40 servers simultaneously. Cross-cluster dependencies are invisible.
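Approach 3 can be sketched without pulling in Prophet or ARIMA by using a seasonal-naive baseline (repeat the last full weekly cycle); the synthetic history and the 90% limit below are illustrative:

```python
# Dependency-free stand-in for a per-server time-series model:
# seasonal-naive forecasting repeats the last full weekly cycle.
# Synthetic data; real deployments would use Prophet/ARIMA per server.

HOURS_PER_WEEK = 7 * 24

def seasonal_naive_forecast(hourly_cpu, horizon=HOURS_PER_WEEK,
                            season=HOURS_PER_WEEK):
    """Forecast `horizon` hours ahead by repeating the last full season."""
    if len(hourly_cpu) < season:
        raise ValueError("need at least one full season of history")
    last_season = hourly_cpu[-season:]
    return [last_season[h % season] for h in range(horizon)]

def will_breach(hourly_cpu, limit=90.0):
    return max(seasonal_naive_forecast(hourly_cpu)) > limit

# Two weeks of a clean weekly cycle peaking at 88%: the model says "safe",
# even though a scheduled deployment (invisible to it) could push it over.
history = [70 + 18 * ((h % HOURS_PER_WEEK) / (HOURS_PER_WEEK - 1))
           for h in range(2 * HOURS_PER_WEEK)]
print(will_breach(history))  # False
```

The forecast is faithful to the server's own seasonality, but any signal living in another table (deployments, peer clusters) never enters the model.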

4. KumoRFM (Graph Neural Networks on Relational Data)

Models the full infrastructure graph: servers, clusters, request patterns, deployment schedules, and service dependencies. Predicts capacity breaches by learning from cross-cluster traffic patterns and correlated workload spikes.

Best for

Microservices architectures with complex service dependencies, auto-scaling clusters, and scheduled deployments.

Watch out for

Requires metric data at server or container level with timestamps. If your monitoring only tracks cluster-level aggregates, the per-server predictions will be less precise.

Key metric: Graph-based capacity models predict breaches 5–7 days ahead vs 1–2 days for per-server models, preventing $100K–$500K/hour outages while reclaiming $2–5M annually in over-provisioned infrastructure.

Why relational data changes the answer

Server SRV-401 currently runs at 72.4% CPU. A threshold alert will not fire until it hits 85%. A trend extrapolation model might predict it reaches 90% in 12 days. But the relational graph reveals a different timeline: SRV-401's cluster (prod-api-east) has a peer average of 78% CPU. A deployment is scheduled for Thursday that historically spikes CPU by 15% for 6 hours. And SRV-510 in the prod-ml-west cluster is at 81.9% with a request growth rate of +18%/week, which means traffic rebalancing from west to east is likely within 5 days.

These cross-entity signals change the prediction from '12 days to breach' to '4–5 days to breach.' The deployment schedule lives in a DEPLOYMENTS table. Cluster peer utilization requires aggregating across the SERVERS table grouped by cluster. Traffic rebalancing patterns emerge from the service dependency graph. A flat model that forecasts SRV-401 in isolation misses all of this. Graph neural networks learn from these cross-cluster, cross-service dependencies automatically. In production environments, graph-based capacity models predict breaches 5–7 days earlier than per-server time-series models, which is the difference between proactive auto-scaling and a 3am page.
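The cross-entity reasoning above can be sketched with toy SERVERS and DEPLOYMENTS rows. The adjustment rules below (a 0.6 factor for hot clusters, capping the estimate at the deployment date) are invented for illustration and are not Kumo's actual model:

```python
# Illustrative sketch: revise a server's trend-only days-to-breach using
# cluster peer load and scheduled deployments. Toy tables and hand-tuned
# adjustment rules, purely to show where the cross-table signals come from.

servers = [
    {"server_id": "SRV-401", "cluster": "prod-api-east", "cpu": 72.4},
    {"server_id": "SRV-402", "cluster": "prod-api-east", "cpu": 83.6},
    {"server_id": "SRV-510", "cluster": "prod-ml-west", "cpu": 81.9},
]
deployments = [{"cluster": "prod-api-east", "in_days": 4, "cpu_spike": 15.0}]

def revised_days_to_breach(server_id, trend_days, limit=90.0):
    me = next(s for s in servers if s["server_id"] == server_id)
    peers = [s["cpu"] for s in servers if s["cluster"] == me["cluster"]]
    peer_avg = sum(peers) / len(peers)
    est = trend_days
    # Hot cluster: peers near the limit make traffic rebalancing likely.
    if peer_avg > 75.0:
        est *= 0.6
    # A deployment whose historical spike would push the cluster over the
    # limit caps the estimate at the deployment date.
    for d in deployments:
        if d["cluster"] == me["cluster"] and peer_avg + d["cpu_spike"] > limit:
            est = min(est, d["in_days"])
    return est

# Trend alone said 12 days; cluster load and Thursday's deploy pull it in.
print(revised_days_to_breach("SRV-401", trend_days=12.0))  # 4
```

Hand-writing rules like these is exactly the feature engineering a graph model is meant to replace: it learns the equivalent cross-table interactions from historical data.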

Predicting server capacity with a per-server model is like predicting traffic congestion by looking at one road segment. A graph model sees the entire road network: if the highway on-ramp is backed up, this surface road will overflow in 20 minutes. If construction closes a parallel route tomorrow, traffic here will double. The individual road's history is useful, but the network view is what enables proactive routing.

How KumoRFM solves this

Relational intelligence for every forecast

Kumo models the full infrastructure graph — servers connected to clusters, request patterns, deployment schedules, and dependent services. A traditional threshold alert fires when CPU is already at 85%. Kumo predicts which servers will breach 90% seven days from now by learning from cross-cluster traffic patterns, deployment cadences, and correlated workload spikes. Server SRV-401 may look fine today, but Kumo sees that its cluster is absorbing traffic from a scaling neighbor and a new deployment is scheduled for Thursday.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1

Your data

The relational tables Kumo learns from

SERVERS

server_id | cluster | instance_type | region
SRV-401 | prod-api-east | m5.2xlarge | us-east-1
SRV-402 | prod-api-east | m5.2xlarge | us-east-1
SRV-510 | prod-ml-west | p3.8xlarge | us-west-2

USAGE_METRICS

metric_id | server_id | cpu_percent | memory_percent | timestamp
M-80001 | SRV-401 | 72.4 | 61.2 | 2025-09-15 14:00
M-80002 | SRV-402 | 45.1 | 38.7 | 2025-09-15 14:00
M-80003 | SRV-510 | 81.9 | 74.3 | 2025-09-15 14:00
2

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT MAX(USAGE_METRICS.CPU_PERCENT, 0, 7, days) > 90
FOR EACH SERVERS.SERVER_ID
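Read as a labeling rule, the query above asks, for each server, whether the MAX of CPU_PERCENT over the (0, 7]-day window after an anchor time exceeds 90. A minimal sketch of that rule over toy USAGE_METRICS-style rows (Kumo computes this internally; the rows and the `label` helper are made up for illustration):

```python
# Sketch of the PQL target as a plain labeling rule: MAX(cpu_percent) over
# the 7 days after the anchor time, compared against 90. Toy data only.
from datetime import datetime, timedelta

metrics = [
    ("SRV-401", datetime(2025, 9, 18, 14), 93.1),
    ("SRV-401", datetime(2025, 9, 30, 14), 95.0),  # outside the 7-day window
    ("SRV-402", datetime(2025, 9, 17, 14), 61.0),
]

def label(server_id, anchor, horizon_days=7, limit=90.0):
    end = anchor + timedelta(days=horizon_days)
    window = [cpu for sid, ts, cpu in metrics
              if sid == server_id and anchor < ts <= end]
    return max(window, default=float("-inf")) > limit

anchor = datetime(2025, 9, 15, 14)
print(label("SRV-401", anchor), label("SRV-402", anchor))  # True False
```

The window boundaries matter: a 95% spike on September 30 is beyond the 7-day horizon and does not make SRV-401 positive on its own.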
3

Prediction output

Every entity gets a score, updated continuously

SERVER_ID | TIMESTAMP | TARGET_PRED | TRUE_PROB
SRV-401 | 2025-09-22 | True | 0.92
SRV-402 | 2025-09-22 | False | 0.15
SRV-510 | 2025-09-22 | True | 0.87
4

Understand why

Every prediction includes feature attributions — no black boxes

Server SRV-401 (prod-api-east)

Predicted: 92% probability of exceeding 90% CPU in 7 days

Top contributing features

CPU trend (7d slope): +4.2%/day (32% attribution)
Memory-CPU correlation: 0.89 (23% attribution)
Cluster load (peer servers): 78% avg (20% attribution)
Request growth rate: +18%/week (15% attribution)
Scheduled deployment: Thursday (10% attribution)

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about infrastructure capacity planning

How far ahead can you predict server capacity issues?

Graph-based models reliably predict capacity breaches 5–7 days ahead, which is enough time for proactive auto-scaling, capacity procurement, or workload redistribution. Per-server models typically give 1–2 days of warning. The graph advantage comes from seeing cross-cluster traffic shifts and deployment schedules that affect multiple servers simultaneously.

What is the cost of a capacity-related outage?

Industry benchmarks put the cost at $100K–$500K per hour for customer-facing services, including lost revenue, SLA penalties, and incident response overhead. For fintech and e-commerce companies during peak periods, the cost can exceed $1M per hour. Prevention is always cheaper than recovery.

Can AI capacity planning reduce infrastructure costs?

Yes. Most teams over-provision by 30–50% as a buffer against uncertainty. Accurate 7-day predictions let you reduce that buffer to 10–15%, reclaiming $2–5M annually in over-provisioned infrastructure. The savings come from right-sizing instance types, terminating unused reservations, and scheduling scale-downs during predicted low-utilization windows.
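The buffer arithmetic behind that claim can be checked directly; the $10M/year base cost (capacity matched to true demand) is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope for the buffer-reduction claim, in integer dollars to
# keep the arithmetic exact. The base cost is an illustrative assumption.

base_annual_cost = 10_000_000  # cost of capacity matched to true demand

def cost_with_buffer(buffer_pct: int) -> int:
    """Annual cost when provisioning true demand plus a safety buffer."""
    return base_annual_cost * (100 + buffer_pct) // 100

# 40% buffer (peak-plus-padding provisioning) vs 12% (prediction-backed).
savings = cost_with_buffer(40) - cost_with_buffer(12)
print(f"${savings:,}/year saved")  # $2,800,000/year saved
```

At this fleet size, trimming the buffer from 40% to 12% lands squarely in the $2–5M range quoted above; larger fleets scale the savings proportionally.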

How do deployments affect capacity predictions?

Deployments are one of the strongest short-term capacity signals. A new code release may increase memory usage by 20% or trigger a CPU spike during cache warming. Graph models learn deployment-to-capacity impact patterns from historical data, predicting the magnitude and duration of deployment-related spikes before they happen.

Bottom line: Prevent capacity outages 7 days before they happen and reclaim $2–5M annually in over-provisioned infrastructure.

Topics covered

capacity planning AI, server utilization prediction, infrastructure forecasting, CPU prediction machine learning, auto-scaling prediction, KumoRFM, relational deep learning, predictive query language, cloud capacity planning, proactive scaling AI, infrastructure optimization, outage prevention prediction

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
