Binary Classification · Outage Prediction

Outage Prediction

Which transformers will fail this month?

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example

Unplanned outages cost US utilities $28B annually in repair costs, regulatory penalties, and customer compensation. Transformer failures alone account for 30% of outage minutes. Age-based replacement schedules waste $500K-$2M per premature replacement while missing failures in younger assets under stress. For a utility with 50,000 distribution transformers, predicting failures 30 days ahead saves $20-35M annually in emergency repair costs and prevents 40% of customer-minutes interrupted.

Quick answer

Outage prediction AI forecasts which distribution transformers and grid equipment will fail within 30 days by analyzing the compound interactions between overload events, inspection findings, weather stress, and neighboring asset conditions. Age-based replacement schedules waste $500K-$2M per premature replacement while missing failures in younger assets under stress. Graph-based models save utilities $20-35M annually by targeting inspection and replacement spend on the assets most likely to fail.

Approaches compared

4 ways to solve this problem

1. Age-Based Replacement Schedules

Replace equipment at fixed age thresholds (e.g., transformers at 25 years). The traditional approach used by most utilities for capital planning.

Best for

Long-term capital planning and budgeting where average replacement rates are more important than individual asset timing.

Watch out for

Wastes $500K-$2M per premature replacement of healthy equipment. Simultaneously misses failures in younger assets under stress (overloaded 8-year-old transformers fail before lightly loaded 30-year-old ones). Average age tells you nothing about individual asset condition.

2. Condition-Based Monitoring

Monitor key health indicators (oil quality, temperature, load) per asset and schedule maintenance when conditions deteriorate. More targeted than age-based schedules.

Best for

High-value assets (substation transformers, switchgear) where continuous monitoring sensors are cost-justified.

Watch out for

Monitors each asset independently. Cannot detect that a transformer's overload events are caused by its neighbor's failure, or that weather stress accumulates across an equipment cluster. Also reactive: by the time oil quality degrades, the failure window may be too short for preventive action.

3. Single-Table ML (XGBoost on Asset Features)

Train gradient-boosted models on flattened tables of asset attributes, inspection scores, load data, and weather exposure. Better than condition monitoring alone.

Best for

Utilities with structured asset management data and moderate grid complexity.

Watch out for

Flattening loses the grid topology. A transformer's failure risk depends on its neighbors' conditions (if Transformer B fails, Transformer A absorbs its load and risk spikes). Weather stress accumulates differently for clustered vs. isolated equipment. Manual feature engineering cannot capture these spatial dependencies at scale.
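To make the flattening concrete, here is a minimal sketch in plain Python using the sample rows shown on this page. The function name `flatten_asset` and the dictionary layout are illustrative assumptions, not part of Kumo or XGBoost.

```python
# Sample values copied from the tables on this page; the layout is illustrative.
ASSETS = {
    "TRX001": {"age_years": 18, "zone_id": "ZONE-A"},
    "TRX002": {"age_years": 8, "zone_id": "ZONE-B"},
    "TRX003": {"age_years": 25, "zone_id": "ZONE-A"},
}
INSPECTION_SCORE = {"TRX001": 72, "TRX002": 95, "TRX003": 38}
LOAD = {
    "TRX001": {"peak_load_pct": 94, "overload_events_30d": 3},
    "TRX002": {"peak_load_pct": 68, "overload_events_30d": 0},
    "TRX003": {"peak_load_pct": 105, "overload_events_30d": 8},
}
WEATHER = {
    "ZONE-A": {"heat_days_above_95f": 8, "storm_events": 2},
    "ZONE-B": {"heat_days_above_95f": 5, "storm_events": 1},
}


def flatten_asset(asset_id):
    """Build the single feature row a flat-table model would see for one asset."""
    asset = ASSETS[asset_id]
    row = {
        "asset_id": asset_id,
        "age_years": asset["age_years"],
        "inspection_score": INSPECTION_SCORE[asset_id],
    }
    row.update(LOAD[asset_id])              # the asset's own load features
    row.update(WEATHER[asset["zone_id"]])   # zone-level weather features
    return row


rows = [flatten_asset(asset_id) for asset_id in ASSETS]
for row in rows:
    print(row)
```

Every column describes the asset itself or its zone. Nothing in a row says which other transformer shares its feeder, so any neighbor signal would have to be hand-built as an extra column.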

4. Graph Neural Networks (Kumo's Approach)

Connect assets, inspections, weather, outage history, and load data into a grid reliability graph. GNNs learn failure patterns from the asset network, including cascading load effects and compound weather-stress patterns.

Best for

Utilities with large distribution networks where failures cascade (one asset's failure increases load on neighbors) and weather stress affects equipment clusters.

Watch out for

Requires asset connectivity data (which transformers share feeders, which are on the same circuit). Most utilities have this in their GIS/OMS systems but may need data cleanup. Best value for distribution networks with 10,000+ assets.

Key metric: Graph-based outage prediction catches 40% more failures 30 days ahead compared to age-based and condition-based approaches. Emergency replacement costs 3-5x more than planned replacement, making early prediction worth $20-35M annually for a 50,000-transformer distribution network.

Why relational data changes the answer

Grid equipment does not fail in isolation. When Transformer TRX003 experiences 8 overload events in 30 days, it is not just that transformer's problem. The overloads occur because neighboring equipment has shifted load onto TRX003's feeder. When TRX003 fails, its load cascades to TRX001, whose own inspection score (72/100) means it cannot handle the additional stress. This cascading failure pattern is invisible to models that treat each asset independently.
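The cascade described above can be sketched with a toy load-transfer rule. This is a deliberate simplification under stated assumptions: the page does not say which transformers share a feeder, so the `feeder_neighbors` map is hypothetical, all units are assumed to have equal ratings, and a failed unit's average load is split evenly across its surviving neighbors. Real load transfer depends on switching and equipment ratings.

```python
def redistribute_load(avg_load_pct, feeder_neighbors, failed):
    """Return the load on each surviving asset after `failed` drops out.

    Assumes equal ratings and an even split of the failed unit's load
    across its surviving feeder neighbors (a toy rule, not a power-flow model).
    """
    new_load = {a: pct for a, pct in avg_load_pct.items() if a != failed}
    survivors = [n for n in feeder_neighbors.get(failed, []) if n in new_load]
    if not survivors:
        return new_load
    share = avg_load_pct[failed] / len(survivors)
    for neighbor in survivors:
        new_load[neighbor] += share
    return new_load


# Average load values from the LOAD_DATA table on this page.
avg_load_pct = {"TRX001": 72.0, "TRX002": 55.0, "TRX003": 88.0}
# Hypothetical connectivity: TRX001 and TRX003 are assumed to share a feeder.
feeder_neighbors = {"TRX001": ["TRX003"], "TRX003": ["TRX001"], "TRX002": []}

after = redistribute_load(avg_load_pct, feeder_neighbors, failed="TRX003")
print(after)
```

Under these assumptions TRX001 ends up far above its rating once TRX003 fails, while TRX002 is unaffected. A per-asset model looking only at TRX001's own history has no way to see that exposure.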

Graph-based outage models represent the grid as it actually works: assets connected through feeders, circuits, and substations, with load flowing between them. The GNN learns that TRX003's 82% failure probability comes from the compound interaction of its own condition (inspection score 38, bushing crack), its load stress (105% peak), and its network position (absorbing load from neighboring equipment). SAP's SALT benchmark shows graph models at 91% accuracy vs 63% for gradient-boosted trees. RelBench shows a similar gap at 76.71 vs 62.44. For outage prediction, this accuracy gap means catching 40% more failures 30 days ahead, which avoids $20-35M annually in emergency repair costs and reduces customer-minutes interrupted.

Predicting transformer failures from individual asset data is like predicting which links in a chain will break by testing each link separately. The weak link might look fine in isolation but fails because the links on either side have already stretched, concentrating all the stress on it. Grid equipment works the same way: a transformer's failure risk depends on its position in the network and the condition of everything connected to it. You need to see the whole chain to predict where it will break.

How KumoRFM solves this

Graph-powered intelligence for energy and utilities

Kumo connects assets, inspections, weather exposure, outage history, and load data into a grid reliability graph. The GNN learns failure patterns from the asset network: how weather stress accumulates on equipment clusters, how one transformer's failure increases load on neighbors, and how inspection findings predict cascading failures. PQL predicts monthly failure probability per transformer, prioritizing inspection and replacement spend.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

1. Your data

The relational tables Kumo learns from

ASSETS

asset_id | type | age_years | manufacturer | zone_id
TRX001 | Pole Transformer | 18 | ABB | ZONE-A
TRX002 | Pad Transformer | 8 | Siemens | ZONE-B
TRX003 | Pole Transformer | 25 | GE | ZONE-A

INSPECTIONS

inspection_id | asset_id | findings | score | date
INS501 | TRX001 | Oil discoloration | 72 | 2025-01-15
INS502 | TRX002 | No issues | 95 | 2025-02-10
INS503 | TRX003 | Bushing crack, oil leak | 38 | 2024-11-20

WEATHER

zone_id | month | heat_days_above_95f | storm_events | salt_exposure
ZONE-A | 2025-02 | 8 | 2 | Low
ZONE-B | 2025-02 | 5 | 1 | High

OUTAGE_HISTORY

outage_id | asset_id | cause | duration_hours | date
OUT601 | TRX003 | Overload | 4.5 | 2024-08-15
OUT602 | TRX001 | Lightning | 2.0 | 2024-07-22

LOAD_DATA

asset_id | avg_load_pct | peak_load_pct | overload_events_30d
TRX001 | 72% | 94% | 3
TRX002 | 55% | 68% | 0
TRX003 | 88% | 105% | 8
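One common way to turn relational tables like these into a graph is to treat each row as a node and each foreign-key reference as an edge. The sketch below illustrates that idea in plain Python. It is not Kumo's internal implementation, and the `ZONE` node type, the `link` helper, and `two_hop_assets` are hypothetical names introduced only for this example.

```python
from collections import defaultdict

# (primary key, asset_id) pairs taken from the sample tables above.
inspections = [("INS501", "TRX001"), ("INS502", "TRX002"), ("INS503", "TRX003")]
outages = [("OUT601", "TRX003"), ("OUT602", "TRX001")]
asset_zone = {"TRX001": "ZONE-A", "TRX002": "ZONE-B", "TRX003": "ZONE-A"}

# Undirected adjacency: node -> set of connected nodes.
# A node is a (table_name, primary_key) tuple.
edges = defaultdict(set)


def link(a, b):
    """Add an undirected edge between two nodes."""
    edges[a].add(b)
    edges[b].add(a)


for ins_id, asset_id in inspections:
    link(("INSPECTIONS", ins_id), ("ASSETS", asset_id))
for out_id, asset_id in outages:
    link(("OUTAGE_HISTORY", out_id), ("ASSETS", asset_id))
for asset_id, zone_id in asset_zone.items():
    link(("ASSETS", asset_id), ("ZONE", zone_id))


def two_hop_assets(asset_id):
    """Other assets reachable through exactly one shared intermediate node."""
    start = ("ASSETS", asset_id)
    found = set()
    for middle in edges[start]:
        for node in edges[middle]:
            if node[0] == "ASSETS" and node != start:
                found.add(node[1])
    return found


print(two_hop_assets("TRX003"))
print(two_hop_assets("TRX002"))
```

In these sample tables the only shared key between assets is `zone_id`, so TRX003 reaches TRX001 through ZONE-A and TRX002 reaches no one. Feeder and circuit connectivity from GIS/OMS data would add further edges of the same kind.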

2. Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT BOOL(OUTAGE_HISTORY.outage_id, 0, 30, days)
FOR EACH ASSETS.asset_id
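To illustrate the kind of target this query describes, the sketch below derives a 30-day binary label per asset from the OUTAGE_HISTORY sample rows. It rests on an assumption about the query's meaning: that the label is true when an asset has at least one outage in the 30 days after a prediction anchor date. The exact window boundaries are defined by Kumo's PQL documentation, not by this example, and `outage_label` is a hypothetical helper.

```python
from datetime import date, timedelta

# (outage_id, asset_id, date) rows from the OUTAGE_HISTORY sample table.
outages = [
    ("OUT601", "TRX003", date(2024, 8, 15)),
    ("OUT602", "TRX001", date(2024, 7, 22)),
]


def outage_label(asset_id, anchor, horizon_days=30):
    """True if the asset has any outage in (anchor, anchor + horizon_days].

    The half-open window is an assumption made for this illustration.
    """
    end = anchor + timedelta(days=horizon_days)
    return any(
        row_asset == asset_id and anchor < outage_date <= end
        for _, row_asset, outage_date in outages
    )


anchor = date(2024, 8, 1)
labels = {a: outage_label(a, anchor) for a in ("TRX001", "TRX002", "TRX003")}
print(labels)
```

With an anchor of 2024-08-01, only TRX003 is labeled positive: its outage falls inside the window, while TRX001's outage happened before the anchor and TRX002 has none.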

3. Prediction output

Every entity gets a score, updated continuously

ASSET_ID | TYPE | AGE | FAILURE_PROB_30D | PRIORITY
TRX003 | Pole Transformer | 25 yrs | 0.82 | Critical
TRX001 | Pole Transformer | 18 yrs | 0.35 | Medium
TRX002 | Pad Transformer | 8 yrs | 0.04 | Low

4. Understand why

Every prediction includes feature attributions — no black boxes

Asset TRX003 -- 25-year-old Pole Transformer in ZONE-A

Predicted: 82% failure probability in next 30 days (Critical)

Top contributing features

Overload events in last 30 days: 8 events (30% attribution)
Inspection score (bushing crack, oil leak): 38/100 (26% attribution)
Peak load exceeding rated capacity: 105% (19% attribution)
Previous outage history: 1 overload failure (14% attribution)
Neighboring transformer load increase: +15% (11% attribution)

Feature attributions are computed automatically for every prediction. No separate tooling required.

Frequently asked questions

Common questions about outage prediction

How far ahead can AI predict transformer failures?

Graph-based models reliably predict failures 30-60 days ahead, which is enough lead time for planned replacement (30-45 days) or targeted inspection (7-14 days). Prediction accuracy at 30 days is typically 80-85% for the top-risk assets. At 7 days, accuracy reaches 90%+ but with less time to act. The sweet spot for most utilities is a 30-day forecast for replacement planning plus a 7-day forecast for inspection prioritization.

What is the cost of a transformer failure vs. planned replacement?

Emergency transformer replacement costs $50K-$200K per incident (including overtime labor, expedited equipment, customer compensation, and regulatory penalties). Planned replacement of the same asset costs $15K-$50K. The 3-5x cost multiplier for emergency response is the primary business case for predictive outage models. A utility with 50,000 distribution transformers experiencing 200-400 failures per year can save $20-35M by shifting even half of emergency replacements to planned replacements.
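The cost argument reduces to simple arithmetic, sketched below as a back-of-envelope helper. The function `annual_savings` is hypothetical, and pairing the low ends of each range together (and the high ends together) is an assumption. It counts only the per-incident cost difference between emergency and planned replacement, so it does not attempt to reproduce the $20-35M figure quoted above, whose inputs are not fully itemized on this page.

```python
def annual_savings(failures_per_year, shifted_fraction, emergency_cost, planned_cost):
    """Direct savings from converting emergency replacements into planned ones."""
    shifted = failures_per_year * shifted_fraction
    return shifted * (emergency_cost - planned_cost)


# Low ends of the ranges quoted in this answer, shifting half of failures.
low = annual_savings(200, 0.5, emergency_cost=50_000, planned_cost=15_000)
# High ends of the same ranges.
high = annual_savings(400, 0.5, emergency_cost=200_000, planned_cost=50_000)

print(f"${low:,.0f} to ${high:,.0f} per year")
```

The result is very sensitive to the assumed failure count and per-incident costs, which is why a utility should plug in its own figures before sizing the business case.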

Can outage prediction AI work with incomplete inspection data?

Yes. Graph-based models compensate for missing inspection data by inferring condition from network context: load patterns, weather exposure, neighboring asset conditions, and outage history. An asset with no recent inspection but 8 overload events on a feeder with two other stressed transformers will still be flagged as high-risk. This also helps prioritize inspections: send inspectors first to the assets the model flags as high-risk but data-poor.

How does weather affect transformer failure prediction?

Weather is a major driver of transformer failures. Heat waves cause thermal stress (oil expansion, insulation degradation). Storms cause physical damage (lightning, wind-borne debris). Salt air corrodes coastal equipment. Graph-based models capture how weather stress accumulates across equipment clusters: a 5-day heat wave above 95F affects all transformers on a feeder, but the one with the lowest inspection score and highest load breaks first. The spatial pattern of weather stress is a key signal that per-asset models miss.

Does outage prediction help with SAIDI/SAIFI regulatory compliance?

Directly. SAIDI (System Average Interruption Duration Index) and SAIFI (System Average Interruption Frequency Index) are the primary reliability metrics regulators track. Preventing 40% of unplanned outages through predictive replacement directly reduces both metrics. Utilities subject to performance-based rates can avoid millions in regulatory penalties by maintaining SAIDI/SAIFI within targets. The predictive model also provides documentation for regulatory filings showing proactive asset management.
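For reference, both indices are simple ratios: SAIDI is total customer-minutes interrupted divided by customers served, and SAIFI is total customer interruptions divided by customers served. The sketch below computes them for two events whose durations match the OUTAGE_HISTORY sample rows (4.5 and 2.0 hours). The customers-affected counts and the 100,000 customers served are hypothetical values chosen for illustration.

```python
def saidi_saifi(interruptions, customers_served):
    """Compute (SAIDI in minutes, SAIFI) from sustained interruption events.

    interruptions: list of (customers_affected, duration_minutes) tuples.
    """
    customer_minutes = sum(c * minutes for c, minutes in interruptions)
    customer_interruptions = sum(c for c, _ in interruptions)
    return (
        customer_minutes / customers_served,
        customer_interruptions / customers_served,
    )


# Durations from the sample outages; customer counts are hypothetical.
events = [(1200, 270), (800, 120)]
saidi, saifi = saidi_saifi(events, customers_served=100_000)

# Effect of preventing the first event through planned replacement.
saidi_prevented, saifi_prevented = saidi_saifi(events[1:], customers_served=100_000)

print(saidi, saifi)
print(saidi_prevented, saifi_prevented)
```

Because every prevented outage removes its customer-minutes from the SAIDI numerator and its customer count from the SAIFI numerator, avoided failures show up in both indices at once.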

Bottom line: A utility with 50,000 distribution transformers saves $20-35M annually by predicting failures 30 days ahead. Kumo's grid reliability graph detects compound stress patterns (overload + inspection findings + weather exposure) that age-based replacement schedules miss.

Topics covered

outage prediction AI, transformer failure prediction, utility asset failure ML, grid reliability model, distribution equipment prediction, KumoRFM utilities, preventive maintenance utilities, asset failure forecasting grid

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
