Outage Prediction
“Which transformers will fail this month?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.
A real-world example
Which transformers will fail this month?
Unplanned outages cost US utilities $28B annually in repair costs, regulatory penalties, and customer compensation. Transformer failures alone account for 30% of outage minutes. Age-based replacement schedules waste $500K-$2M per premature replacement while missing failures in younger assets under stress. For a utility with 50,000 distribution transformers, predicting failures 30 days ahead saves $20-35M annually in emergency repair costs and prevents 40% of customer-minutes interrupted.
Quick answer
Outage prediction AI forecasts which distribution transformers and grid equipment will fail within 30 days by analyzing the compound interactions between overload events, inspection findings, weather stress, and neighboring asset conditions. Age-based replacement schedules waste $500K-$2M per premature replacement while missing failures in younger assets under stress. Graph-based models save utilities $20-35M annually by targeting inspection and replacement spend on the assets most likely to fail.
Approaches compared
4 ways to solve this problem
1. Age-Based Replacement Schedules
Replace equipment at fixed age thresholds (e.g., transformers at 25 years). The traditional approach used by most utilities for capital planning.
Best for
Long-term capital planning and budgeting where average replacement rates are more important than individual asset timing.
Watch out for
Wastes $500K-$2M per premature replacement of healthy equipment. Simultaneously misses failures in younger assets under stress (overloaded 8-year-old transformers fail before lightly loaded 30-year-old ones). Average age tells you nothing about individual asset condition.
2. Condition-Based Monitoring
Monitor key health indicators (oil quality, temperature, load) per asset and schedule maintenance when conditions deteriorate. More targeted than age-based schedules.
Best for
High-value assets (substation transformers, switchgear) where continuous monitoring sensors are cost-justified.
Watch out for
Monitors each asset independently. Cannot detect that a transformer's overload events are caused by its neighbor's failure, or that weather stress accumulates across an equipment cluster. Also reactive: by the time oil quality degrades, the failure window may be too short for preventive action.
3. Single-Table ML (XGBoost on Asset Features)
Train gradient-boosted models on flattened tables of asset attributes, inspection scores, load data, and weather exposure. Better than condition monitoring alone.
Best for
Utilities with structured asset management data and moderate grid complexity.
Watch out for
Flattening loses the grid topology. A transformer's failure risk depends on its neighbors' conditions (if Transformer B fails, Transformer A absorbs its load and risk spikes). Weather stress accumulates differently for clustered vs. isolated equipment. Manual feature engineering cannot capture these spatial dependencies at scale.
4. Graph Neural Networks (Kumo's Approach)
Connect assets, inspections, weather, outage history, and load data into a grid reliability graph. GNNs learn failure patterns from the asset network, including cascading load effects and compound weather-stress patterns.
Best for
Utilities with large distribution networks where failures cascade (one asset's failure increases load on neighbors) and weather stress affects equipment clusters.
Watch out for
Requires asset connectivity data (which transformers share feeders, which are on the same circuit). Most utilities have this in their GIS/OMS systems but may need data cleanup. Best value for distribution networks with 10,000+ assets.
Key metric: Graph-based outage prediction catches 40% more failures 30 days ahead compared to age-based and condition-based approaches. Emergency replacement costs 3-5x more than planned replacement, making early prediction worth $20-35M annually for a 50,000-transformer distribution network.
Why relational data changes the answer
Grid equipment does not fail in isolation. When Transformer TRX003 experiences 8 overload events in 30 days, it is not just that transformer's problem. The overloads occur because neighboring equipment has shifted load onto TRX003's feeder. When TRX003 fails, its load cascades to TRX001, whose own inspection score (72/100) means it cannot handle the additional stress. This cascading failure pattern is invisible to models that treat each asset independently.
Graph-based outage models represent the grid as it actually works: assets connected through feeders, circuits, and substations, with load flowing between them. The GNN learns that TRX003's 82% failure probability comes from the compound interaction of its own condition (inspection score 38, bushing crack), its load stress (105% peak), and its network position (absorbing load from neighboring equipment). SAP's SALT benchmark shows graph models at 91% accuracy vs 63% for gradient-boosted trees, and RelBench shows a similar gap (76.71 vs 62.44). For outage prediction, this accuracy gap means catching 40% more failures 30 days ahead, avoiding $20-35M annually in emergency repair costs and cutting customer-minutes interrupted.
Predicting transformer failures from individual asset data is like predicting which links in a chain will break by testing each link separately. The weak link might look fine in isolation but fails because the links on either side have already stretched, concentrating all the stress on it. Grid equipment works the same way: a transformer's failure risk depends on its position in the network and the condition of everything connected to it. You need to see the whole chain to predict where it will break.
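To make the "whole chain" idea concrete, here is a minimal sketch of one-hop neighbor aggregation, the kind of signal a graph model can learn and a per-asset table does not contain. It is a hand-written illustration, not Kumo's model. The per-asset values come from the example tables on this page; the feeder IDs (`FDR-A1`, `FDR-B1`) and the feeder assignments are hypothetical, since the example tables do not include connectivity.

```python
from collections import defaultdict

# Per-asset values copied from the example tables on this page.
assets = {
    "TRX001": {"inspection_score": 72, "peak_load_pct": 94, "overload_events_30d": 3},
    "TRX002": {"inspection_score": 95, "peak_load_pct": 68, "overload_events_30d": 0},
    "TRX003": {"inspection_score": 38, "peak_load_pct": 105, "overload_events_30d": 8},
}

# ASSUMPTION: feeder membership is not in the example tables.
# These feeder IDs and assignments are hypothetical.
feeder_of = {"TRX001": "FDR-A1", "TRX003": "FDR-A1", "TRX002": "FDR-B1"}


def neighbor_features(assets, feeder_of):
    """Add simple aggregates over the other assets on the same feeder."""
    members = defaultdict(list)
    for asset_id, feeder in feeder_of.items():
        members[feeder].append(asset_id)

    rows = {}
    for asset_id, own in assets.items():
        neighbors = [n for n in members[feeder_of[asset_id]] if n != asset_id]
        rows[asset_id] = {
            **own,
            "neighbor_count": len(neighbors),
            # None when the asset has no neighbors on its feeder.
            "neighbor_min_inspection_score": min(
                (assets[n]["inspection_score"] for n in neighbors), default=None
            ),
            "neighbor_overload_events_30d": sum(
                assets[n]["overload_events_30d"] for n in neighbors
            ),
        }
    return rows


features = neighbor_features(assets, feeder_of)
for asset_id in sorted(features):
    print(asset_id, features[asset_id])
```

Under the assumed feeder layout, TRX001's row now carries its neighbor's inspection score of 38 and 8 overload events, context that a flattened per-asset table would not include unless someone engineered it by hand.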
How KumoRFM solves this
Graph-powered intelligence for energy and utilities
Kumo connects assets, inspections, weather exposure, outage history, and load data into a grid reliability graph. The GNN learns failure patterns from the asset network: how weather stress accumulates on equipment clusters, how one transformer's failure increases load on neighbors, and how inspection findings predict cascading failures. A PQL query then predicts each transformer's failure probability for the next 30 days, so inspection and replacement spend can be prioritized.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
ASSETS
| asset_id | type | age_years | manufacturer | zone_id |
|---|---|---|---|---|
| TRX001 | Pole Transformer | 18 | ABB | ZONE-A |
| TRX002 | Pad Transformer | 8 | Siemens | ZONE-B |
| TRX003 | Pole Transformer | 25 | GE | ZONE-A |
INSPECTIONS
| inspection_id | asset_id | findings | score | date |
|---|---|---|---|---|
| INS501 | TRX001 | Oil discoloration | 72 | 2025-01-15 |
| INS502 | TRX002 | No issues | 95 | 2025-02-10 |
| INS503 | TRX003 | Bushing crack, oil leak | 38 | 2024-11-20 |
WEATHER
| zone_id | month | heat_days_above_95f | storm_events | salt_exposure |
|---|---|---|---|---|
| ZONE-A | 2025-02 | 8 | 2 | Low |
| ZONE-B | 2025-02 | 5 | 1 | High |
OUTAGE_HISTORY
| outage_id | asset_id | cause | duration_hours | date |
|---|---|---|---|---|
| OUT601 | TRX003 | Overload | 4.5 | 2024-08-15 |
| OUT602 | TRX001 | Lightning | 2.0 | 2024-07-22 |
LOAD_DATA
| asset_id | avg_load_pct | peak_load_pct | overload_events_30d |
|---|---|---|---|
| TRX001 | 72% | 94% | 3 |
| TRX002 | 55% | 68% | 0 |
| TRX003 | 88% | 105% | 8 |
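As a rough illustration of how tables like these can be read as a graph, the sketch below turns foreign-key columns into edges and looks up what is one hop away from a record. The key relationships (`asset_id` linking inspections and outages to assets, `zone_id` linking assets to weather zones) are inferred from the column names and are an assumption; this is not how Kumo builds its graph internally.

```python
# Key columns copied from the example tables above; other columns are omitted.
assets = [
    {"asset_id": "TRX001", "zone_id": "ZONE-A"},
    {"asset_id": "TRX002", "zone_id": "ZONE-B"},
    {"asset_id": "TRX003", "zone_id": "ZONE-A"},
]
inspections = [
    {"inspection_id": "INS501", "asset_id": "TRX001"},
    {"inspection_id": "INS502", "asset_id": "TRX002"},
    {"inspection_id": "INS503", "asset_id": "TRX003"},
]
outages = [
    {"outage_id": "OUT601", "asset_id": "TRX003"},
    {"outage_id": "OUT602", "asset_id": "TRX001"},
]


def foreign_key_edges(rows, child_key, parent_key):
    """Turn each row into a (child, parent) edge along one foreign key."""
    return [(row[child_key], row[parent_key]) for row in rows]


# ASSUMPTION: these key relationships are inferred from the column names.
edges = (
    foreign_key_edges(inspections, "inspection_id", "asset_id")
    + foreign_key_edges(outages, "outage_id", "asset_id")
    + foreign_key_edges(assets, "asset_id", "zone_id")
)


def linked_records(node, edges):
    """Everything one hop away from a node, in either edge direction."""
    outgoing = {b for a, b in edges if a == node}
    incoming = {a for a, b in edges if b == node}
    return sorted(outgoing | incoming)


print(linked_records("TRX003", edges))
print(linked_records("ZONE-A", edges))
```

TRX001 and TRX003 never appear in the same row, yet they are two hops apart through ZONE-A, which is the kind of indirect link a single flattened table tends to hide.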
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(OUTAGE_HISTORY.outage_id, 0, 30, days) FOR EACH ASSETS.asset_id
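For readers who want to see what target this query describes, here is a plain-Python sketch of the label: for each asset, is there at least one outage in the 30 days after a chosen anchor date? The function name, the anchor date, and the window boundary convention (start exclusive, end inclusive) are assumptions made for illustration; check the PQL documentation for the exact semantics Kumo applies.

```python
from datetime import date, timedelta

# (asset_id, outage date) pairs copied from the OUTAGE_HISTORY example table.
outage_history = [
    ("TRX003", date(2024, 8, 15)),
    ("TRX001", date(2024, 7, 22)),
]
asset_ids = ["TRX001", "TRX002", "TRX003"]


def outage_within_window(asset_ids, outage_history, anchor, start_days=0, end_days=30):
    """Return {asset_id: bool}: any outage in (anchor + start, anchor + end]?

    ASSUMPTION: the half-open window convention is chosen for illustration.
    """
    window_start = anchor + timedelta(days=start_days)
    window_end = anchor + timedelta(days=end_days)
    labels = {asset_id: False for asset_id in asset_ids}
    for asset_id, outage_date in outage_history:
        if asset_id in labels and window_start < outage_date <= window_end:
            labels[asset_id] = True
    return labels


# Hypothetical anchor date, picked so that one historical outage falls in the window.
labels = outage_within_window(asset_ids, outage_history, anchor=date(2024, 8, 1))
print(labels)
```

With the anchor set to 2024-08-01, only TRX003 is labeled positive, because its 2024-08-15 outage falls inside the window while TRX001's 2024-07-22 outage precedes it.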
Prediction output
Every entity gets a score, updated continuously
| ASSET_ID | TYPE | AGE | FAILURE_PROB_30D | PRIORITY |
|---|---|---|---|---|
| TRX003 | Pole Transformer | 25 yrs | 0.82 | Critical |
| TRX001 | Pole Transformer | 18 yrs | 0.35 | Medium |
| TRX002 | Pad Transformer | 8 yrs | 0.04 | Low |
Understand why
Every prediction includes feature attributions — no black boxes
Asset TRX003 -- 25-year-old Pole Transformer in ZONE-A
Predicted: 82% failure probability in next 30 days (Critical)
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Overload events in last 30 days | 8 events | 30% |
| Inspection score (bushing crack, oil leak) | 38/100 | 26% |
| Peak load exceeding rated capacity | 105% | 19% |
| Previous outage history | 1 overload failure | 14% |
| Neighboring transformer load increase | +15% | 11% |
Feature attributions are computed automatically for every prediction. No separate tooling required.
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about outage prediction
How far ahead can AI predict transformer failures?
Graph-based models reliably predict failures 30-60 days ahead, which is enough lead time for planned replacement (30-45 days) or targeted inspection (7-14 days). Prediction accuracy at 30 days is typically 80-85% for the top-risk assets. At 7 days, accuracy reaches 90%+ but with less time to act. The sweet spot for most utilities is a 30-day forecast for replacement planning plus a 7-day forecast for inspection prioritization.
What is the cost of a transformer failure vs. planned replacement?
Emergency transformer replacement costs $50K-$200K per incident (including overtime labor, expedited equipment, customer compensation, and regulatory penalties). Planned replacement of the same asset costs $15K-$50K. The 3-5x cost multiplier for emergency response is the primary business case for predictive outage models. A utility with 50,000 distribution transformers experiencing 200-400 failures per year can save $20-35M by shifting even half of emergency replacements to planned replacements.
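The arithmetic behind this answer can be sketched as follows. The function is a back-of-the-envelope calculator, not a model of any utility's actual costs. It uses only the ranges quoted above and assumes (for illustration) that the low emergency cost pairs with the low planned cost and the high with the high.

```python
def annual_savings_range(failures, emergency_cost, planned_cost, shifted_fraction):
    """Return (low, high) annual savings in dollars.

    failures, emergency_cost and planned_cost are (low, high) tuples.
    ASSUMPTION: low costs pair with low costs and high with high.
    """
    per_unit_low = emergency_cost[0] - planned_cost[0]
    per_unit_high = emergency_cost[1] - planned_cost[1]
    low = failures[0] * shifted_fraction * per_unit_low
    high = failures[1] * shifted_fraction * per_unit_high
    return low, high


low, high = annual_savings_range(
    failures=(200, 400),               # failures per year, from the text above
    emergency_cost=(50_000, 200_000),  # dollars per emergency replacement
    planned_cost=(15_000, 50_000),     # dollars per planned replacement
    shifted_fraction=0.5,              # "even half" of emergencies become planned
)
print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M per year")
```

With these inputs the per-unit cost difference alone gives roughly $3.5M to $30M per year, so the $20-35M headline figure sits toward the upper end of what the quoted ranges produce on their own.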
Can outage prediction AI work with incomplete inspection data?
Yes. Graph-based models compensate for missing inspection data by inferring condition from network context: load patterns, weather exposure, neighboring asset conditions, and outage history. An asset with no recent inspection but 8 overload events on a feeder with two other stressed transformers will still be flagged as high-risk. This actually helps prioritize inspections: send inspectors to the assets the model flags as high-risk but data-poor first.
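The "high-risk but data-poor first" rule can be sketched as a simple filter over model scores and inspection dates. The scores and dates below are taken from the example tables on this page; the function name, the as-of date, and both thresholds (`min_risk`, `stale_after_days`) are hypothetical values chosen for illustration.

```python
from datetime import date

# FAILURE_PROB_30D values from the prediction output example.
predictions = {"TRX003": 0.82, "TRX001": 0.35, "TRX002": 0.04}

# Most recent inspection dates from the INSPECTIONS example table.
last_inspection = {
    "TRX001": date(2025, 1, 15),
    "TRX002": date(2025, 2, 10),
    "TRX003": date(2024, 11, 20),
}


def inspection_queue(predictions, last_inspection, as_of,
                     min_risk=0.30, stale_after_days=90):
    """List assets that are both high-risk and data-poor, highest risk first.

    ASSUMPTION: min_risk and stale_after_days are illustrative thresholds.
    An asset with no inspection on record counts as data-poor.
    """
    queue = []
    for asset_id, risk in predictions.items():
        inspected = last_inspection.get(asset_id)
        days_since = None if inspected is None else (as_of - inspected).days
        data_poor = days_since is None or days_since > stale_after_days
        if risk >= min_risk and data_poor:
            queue.append((asset_id, risk, days_since))
    return sorted(queue, key=lambda item: item[1], reverse=True)


# Hypothetical as-of date for the example.
queue = inspection_queue(predictions, last_inspection, as_of=date(2025, 3, 1))
print(queue)
```

Under these assumed thresholds only TRX003 is queued: TRX001 is above the risk threshold but was inspected recently, and TRX002 is low-risk.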
How does weather affect transformer failure prediction?
Weather is a major driver of transformer failures. Heat waves cause thermal stress (oil expansion, insulation degradation). Storms cause physical damage (lightning, wind-borne debris). Salt air corrodes coastal equipment. Graph-based models capture how weather stress accumulates across equipment clusters: a 5-day heat wave above 95°F affects all transformers on a feeder, but the one with the lowest inspection score and highest load fails first. The spatial pattern of weather stress is a key signal that per-asset models miss.
Does outage prediction help with SAIDI/SAIFI regulatory compliance?
Directly. SAIDI (System Average Interruption Duration Index) and SAIFI (System Average Interruption Frequency Index) are the primary reliability metrics regulators track. Preventing 40% of unplanned outages through predictive replacement directly reduces both metrics. Utilities subject to performance-based rates can avoid millions in regulatory penalties by maintaining SAIDI/SAIFI within targets. The predictive model also provides documentation for regulatory filings showing proactive asset management.
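For reference, both indices are simple ratios: SAIDI is total customer interruption duration divided by customers served, and SAIFI is total customer interruptions divided by customers served. The sketch below computes them for the two outages in the OUTAGE_HISTORY example and shows the effect of preventing one of them. The outage durations come from that table; the customers-interrupted counts and the total customers served are hypothetical, since the example data does not include them.

```python
def saidi_minutes(outages, customers_served):
    """Average interruption minutes per customer served."""
    customer_minutes = sum(
        o["customers_interrupted"] * o["duration_hours"] * 60 for o in outages
    )
    return customer_minutes / customers_served


def saifi(outages, customers_served):
    """Average number of interruptions per customer served."""
    return sum(o["customers_interrupted"] for o in outages) / customers_served


# Durations from the OUTAGE_HISTORY example table.
# ASSUMPTION: customers_interrupted and CUSTOMERS_SERVED are hypothetical.
outages = [
    {"outage_id": "OUT601", "duration_hours": 4.5, "customers_interrupted": 120},
    {"outage_id": "OUT602", "duration_hours": 2.0, "customers_interrupted": 80},
]
CUSTOMERS_SERVED = 10_000

baseline = (saidi_minutes(outages, CUSTOMERS_SERVED), saifi(outages, CUSTOMERS_SERVED))

# Suppose the TRX003 overload outage (OUT601) had been prevented by planned replacement.
remaining = [o for o in outages if o["outage_id"] != "OUT601"]
prevented = (saidi_minutes(remaining, CUSTOMERS_SERVED), saifi(remaining, CUSTOMERS_SERVED))

print("baseline  SAIDI (min), SAIFI:", baseline)
print("prevented SAIDI (min), SAIFI:", prevented)
```

Because every prevented outage removes both its customer count and its customer-minutes from the numerators, a prevented failure lowers SAIFI and SAIDI at the same time.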
Bottom line: A utility with 50,000 distribution transformers saves $20-35M annually by predicting failures 30 days ahead. Kumo's grid reliability graph detects compound stress patterns (overload + inspection findings + weather exposure) that age-based replacement schedules miss.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.