What data is needed for predictive maintenance?

Kumo connects directly to your existing relational tables: EQUIPMENT, SENSORS, MAINTENANCE_LOGS, PARTS, PRODUCTION_RUNS. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Use Case #1 · Binary Classification · Predictive Maintenance

Predictive Maintenance

“Which machines will fail in the next 7 days?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.



A real-world example

Which machines will fail in the next 7 days?

Unplanned downtime costs manufacturers $50B per year globally. Time-based maintenance over-services healthy equipment and misses early failure modes. Sensor-only models detect anomalies but generate too many false alarms and miss failures caused by interaction effects between equipment, parts, and operating conditions. For a plant with 500 machines, reducing unplanned downtime by 30% saves $8-12M annually.

Quick answer

Predictive maintenance AI uses machine learning on sensor data, maintenance logs, and production patterns to predict which machines will fail within 7 days. The best models connect equipment relationships in a graph, catching interaction effects between machines that single-sensor anomaly detection misses. Plants with 500+ machines typically save $8-12M annually by reducing unplanned downtime by 30%.

Approaches compared

4 ways to solve this problem

1. Threshold-Based Alerts

Set fixed thresholds on individual sensors (vibration > 6 mm/s, temperature > 85 °C) and alert when crossed. Simple to implement and easy to explain to maintenance teams.

Best for

Single-mode failures with clear sensor signatures, like bearing wear on isolated equipment.

Watch out for

Generates excessive false alarms (30-50% false positive rate) and misses failures caused by multi-sensor interactions. Thresholds require constant manual tuning as equipment ages.
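A minimal sketch of threshold-based alerting, assuming readings arrive as simple records shaped like the SENSORS sample table on this page. The `SensorReading` class and `threshold_alerts` function are illustrative names, not part of any Kumo API.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    sensor_id: str
    equipment_id: str
    metric: str
    value: float
    threshold: float


def threshold_alerts(readings):
    """Return the readings whose value exceeds their fixed threshold."""
    return [r for r in readings if r.value > r.threshold]


# Values copied from the SENSORS sample table.
readings = [
    SensorReading("SEN101", "EQ001", "Vibration (mm/s)", 4.8, 6.0),
    SensorReading("SEN102", "EQ001", "Temperature (C)", 72.0, 85.0),
    SensorReading("SEN103", "EQ002", "Pressure (bar)", 148.0, 160.0),
]

alerts = threshold_alerts(readings)
print([r.sensor_id for r in alerts])
```

With the sample values, no sensor crosses its limit, so no alert fires. That holds even for EQ001, the machine scored at 68% failure probability in the prediction output on this page, which is the kind of miss a per-sensor threshold cannot avoid.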

2. Statistical Anomaly Detection

Use statistical models (ARIMA, exponential smoothing) to learn normal sensor patterns per machine and flag deviations. More adaptive than fixed thresholds.

Best for

Detecting gradual degradation trends in well-instrumented equipment with long historical baselines.

Watch out for

Treats each machine independently, so it misses failure patterns that depend on interactions between upstream and downstream equipment. Cannot incorporate maintenance history or part age.
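As a rough illustration of the per-machine statistical approach, the sketch below tracks a sensor's level and variance with simple exponential smoothing and flags readings that deviate by more than `k` standard deviations. The function name, the smoothing factor, the warm-up length, and the sample vibration series are all assumptions made for this example.

```python
import math


def ewma_anomalies(values, alpha=0.3, k=3.0, warmup=5):
    """Return indices whose value deviates from the smoothed level by > k sigma."""
    if not values:
        return []
    level = values[0]
    variance = 0.0
    flagged = []
    for i, x in enumerate(values[1:], start=1):
        residual = x - level
        std = math.sqrt(variance)
        # Skip the warm-up period so the variance estimate can settle.
        if i >= warmup and std > 0 and abs(residual) > k * std:
            flagged.append(i)
        # Exponentially weighted updates of the level and the variance.
        level += alpha * residual
        variance = (1 - alpha) * (variance + alpha * residual ** 2)
    return flagged


# Hypothetical vibration readings (mm/s) for one machine.
vibration = [4.0, 4.1, 3.9, 4.0, 4.1, 4.0, 3.9, 4.1, 6.5, 4.0]
print(ewma_anomalies(vibration))
```

On this series only the spike at index 8 is flagged. The model sees one machine's signal at a time, so nothing in it can express that a neighbor's behavior changed first.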

3. Single-Table ML (XGBoost/Random Forest)

Train gradient-boosted models on flattened feature tables combining sensor readings, equipment metadata, and maintenance history. The current industry standard for predictive maintenance.

Best for

Mid-complexity environments where most failures have clear feature signatures in a single machine's data.

Watch out for

Flattening relational data loses the graph structure. A machine's risk depends on its neighbors, shared parts suppliers, and production line context. Feature engineering is manual and brittle.
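To make the flattening step concrete, here is a small sketch that collapses child tables into one hand-built feature row per machine. The rows are taken from the sample tables on this page; the feature names and the `flatten` helper are hypothetical choices for illustration.

```python
from collections import defaultdict

sensors = [
    {"equipment_id": "EQ001", "latest_value": 4.8, "threshold": 6.0},
    {"equipment_id": "EQ001", "latest_value": 72.0, "threshold": 85.0},
    {"equipment_id": "EQ002", "latest_value": 148.0, "threshold": 160.0},
]
parts = [
    {"equipment_id": "EQ001", "age_hours": 3200, "rated_life_hours": 5000},
    {"equipment_id": "EQ002", "age_hours": 800, "rated_life_hours": 4000},
    {"equipment_id": "EQ003", "age_hours": 1500, "rated_life_hours": 3000},
]
runs = [
    {"equipment_id": "EQ001", "load_pct": 92},
    {"equipment_id": "EQ002", "load_pct": 78},
    {"equipment_id": "EQ003", "load_pct": 95},
]


def flatten(sensors, parts, runs):
    """Collapse child tables into one feature row per equipment_id."""
    rows = defaultdict(dict)
    for s in sensors:
        ratio = s["latest_value"] / s["threshold"]
        row = rows[s["equipment_id"]]
        row["max_sensor_ratio"] = max(row.get("max_sensor_ratio", 0.0), ratio)
    for p in parts:
        used = p["age_hours"] / p["rated_life_hours"]
        row = rows[p["equipment_id"]]
        row["max_part_life_used"] = max(row.get("max_part_life_used", 0.0), used)
    loads = defaultdict(list)
    for r in runs:
        loads[r["equipment_id"]].append(r["load_pct"])
    for equipment_id, values in loads.items():
        rows[equipment_id]["avg_load_pct"] = sum(values) / len(values)
    return dict(rows)


features = flatten(sensors, parts, runs)
print(features["EQ001"])
```

Every aggregate here had to be chosen by hand, and the row for EQ001 contains nothing about EQ002 on the same line. Adding that signal means inventing yet another column, which is where this approach becomes brittle.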

4. Graph Neural Networks (Kumo's Approach)

Model the factory as a graph connecting equipment, sensors, parts, maintenance logs, and production runs. GNNs learn failure patterns from equipment interactions automatically, without manual feature engineering.

Best for

Complex plants where failures depend on multi-equipment interactions, shared parts, and operating condition combinations.

Watch out for

Requires relational data (not just sensor feeds). Best when you have at least 6-12 months of maintenance history across interconnected equipment.

Key metric: SAP's SALT benchmark shows graph-based predictive maintenance achieving 91% accuracy, compared with 75% for deep learning on flat data and 63% for gradient-boosted trees. The gap is driven by multi-equipment interaction patterns.

Why relational data changes the answer

Most predictive maintenance systems treat each machine as an island. They monitor Machine A's vibration, Machine A's temperature, and Machine A's operating hours in isolation. But in a real factory, failures are relational. When Machine A's vibration increases, it changes the load profile on Machine B downstream. When a specific bearing batch from Supplier X is installed across 15 machines, failures cluster. When production runs push equipment above 90% load for consecutive shifts, the risk compounds across the entire line.

This is exactly why flat-table ML models plateau at 63% accuracy on equipment failure prediction, while graph-based approaches reach 91% (based on SAP's SALT benchmark). The gap comes from relational signals: part-equipment-condition triplets, production line cascade effects, and maintenance history patterns across similar equipment. RelBench benchmarks confirm this pattern, with GNN-based models scoring 76.71 vs 62.44 for gradient-boosted trees on relational prediction tasks. The factory is a graph. Treating it like a spreadsheet leaves the most predictive signals on the table.

Think of a factory like a human body. A doctor who only checks your heart rate will miss that your chest pain is caused by a pinched nerve in your spine affecting your posture, which strains your breathing, which elevates your heart rate. Good diagnostics trace the chain of causation across connected systems. Predictive maintenance works the same way: the vibration spike in Machine A is a symptom, but the root cause might be the worn bearing in Machine B that changed the load profile across the entire production line.

How KumoRFM solves this

Graph-powered intelligence for manufacturing

Kumo connects equipment, sensors, maintenance logs, parts, and production runs into a factory graph. The GNN learns failure patterns that depend on equipment interactions: when Machine A's vibration increase coincides with Machine B's temperature drift downstream, and how specific combinations of part, equipment, and operating condition predict failure. PQL predicts which machines will fail within 7 days, giving maintenance teams time to schedule repairs during planned downtime windows.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

EQUIPMENT

equipment_id	type	install_date	line
EQ001	CNC Lathe	2020-06-15	Line-A
EQ002	Press Machine	2018-03-10	Line-A
EQ003	Conveyor Motor	2022-01-20	Line-B

SENSORS

sensor_id	equipment_id	metric	latest_value	threshold
SEN101	EQ001	Vibration (mm/s)	4.8	6.0
SEN102	EQ001	Temperature (C)	72	85
SEN103	EQ002	Pressure (bar)	148	160

MAINTENANCE_LOGS

log_id	equipment_id	type	description	date
ML201	EQ001	Preventive	Bearing replacement	2025-01-15
ML202	EQ002	Corrective	Hydraulic seal repair	2025-02-10
ML203	EQ003	Preventive	Belt tension adjust	2025-02-20

PARTS

part_id	equipment_id	name	age_hours	rated_life_hours
PRT301	EQ001	Spindle Bearing	3,200	5,000
PRT302	EQ002	Hydraulic Seal	800	4,000
PRT303	EQ003	Drive Belt	1,500	3,000

PRODUCTION_RUNS

run_id	equipment_id	duration_hours	load_pct	date
RUN501	EQ001	12	92%	2025-03-01
RUN502	EQ002	8	78%	2025-03-01
RUN503	EQ003	16	95%	2025-03-01
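One way to picture how these five tables form a graph is to turn every row into a node and every `equipment_id` reference into an edge. The sketch below does that with plain Python sets. Treating the `line` column as its own node type is an assumption made for this example, and none of this is meant to describe how Kumo builds its graph internally.

```python
from collections import defaultdict

equipment = [
    {"equipment_id": "EQ001", "type": "CNC Lathe", "line": "Line-A"},
    {"equipment_id": "EQ002", "type": "Press Machine", "line": "Line-A"},
    {"equipment_id": "EQ003", "type": "Conveyor Motor", "line": "Line-B"},
]
# (row id, equipment_id) pairs copied from the sample tables.
child_rows = {
    "SENSORS": [("SEN101", "EQ001"), ("SEN102", "EQ001"), ("SEN103", "EQ002")],
    "MAINTENANCE_LOGS": [("ML201", "EQ001"), ("ML202", "EQ002"), ("ML203", "EQ003")],
    "PARTS": [("PRT301", "EQ001"), ("PRT302", "EQ002"), ("PRT303", "EQ003")],
    "PRODUCTION_RUNS": [("RUN501", "EQ001"), ("RUN502", "EQ002"), ("RUN503", "EQ003")],
}


def build_graph(equipment, child_rows):
    """Build an undirected adjacency map keyed by (node_type, node_id)."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for eq in equipment:
        link(("EQUIPMENT", eq["equipment_id"]), ("LINE", eq["line"]))
    for table, rows in child_rows.items():
        for row_id, equipment_id in rows:
            link((table, row_id), ("EQUIPMENT", equipment_id))
    return graph


def same_line_machines(graph, equipment_id):
    """Return other machines reachable through a shared LINE node."""
    node = ("EQUIPMENT", equipment_id)
    lines = [n for n in graph[node] if n[0] == "LINE"]
    return sorted(m[1] for line in lines for m in graph[line] if m != node)


graph = build_graph(equipment, child_rows)
print(same_line_machines(graph, "EQ001"))
```

Here EQ001 reaches EQ002 in two hops through Line-A, and from there EQ002's sensors, parts, and logs are one more hop away. That neighborhood is the context a flat feature row does not carry.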

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT BOOL(MAINTENANCE_LOGS.type = 'Corrective', 0, 7, days)
FOR EACH EQUIPMENT.equipment_id
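For intuition, the target this query describes can be read as a per-machine yes/no label: does a Corrective maintenance log appear within 7 days after a given anchor date? The Python below is one plausible reading of that label, not Kumo's implementation. The `corrective_within` helper, the anchor date, and the choice of a window that excludes the anchor and includes the end date are assumptions; the PQL documentation defines the exact semantics.

```python
from datetime import date, timedelta

# Rows copied from the MAINTENANCE_LOGS sample table.
maintenance_logs = [
    {"equipment_id": "EQ001", "type": "Preventive", "date": date(2025, 1, 15)},
    {"equipment_id": "EQ002", "type": "Corrective", "date": date(2025, 2, 10)},
    {"equipment_id": "EQ003", "type": "Preventive", "date": date(2025, 2, 20)},
]


def corrective_within(logs, equipment_id, anchor, horizon_days=7):
    """True if the machine has a Corrective log in (anchor, anchor + horizon]."""
    end = anchor + timedelta(days=horizon_days)
    return any(
        log["equipment_id"] == equipment_id
        and log["type"] == "Corrective"
        and anchor < log["date"] <= end
        for log in logs
    )


anchor = date(2025, 2, 5)  # hypothetical prediction date
labels = {
    eq: corrective_within(maintenance_logs, eq, anchor)
    for eq in ("EQ001", "EQ002", "EQ003")
}
print(labels)
```

With this anchor, only EQ002 is labeled positive, because its hydraulic seal repair on 2025-02-10 falls inside the window. Historical labels like these are what a model is trained to predict ahead of time.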

Prediction output

Every entity gets a score, updated continuously

EQUIPMENT_ID	TYPE	FAILURE_PROB_7D	RISK_TIER
EQ001	CNC Lathe	0.68	High
EQ002	Press Machine	0.11	Low
EQ003	Conveyor Motor	0.42	Medium
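The page does not say how RISK_TIER is derived from FAILURE_PROB_7D. A simple possibility is a pair of probability cutoffs, sketched below. The cutoffs 0.30 and 0.60 are hypothetical values chosen only because they reproduce the three rows above; a real deployment would set them from maintenance capacity and the cost of a missed failure.

```python
def risk_tier(probability, medium_cutoff=0.30, high_cutoff=0.60):
    """Map a failure probability to a tier using two assumed cutoffs."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_cutoff:
        return "High"
    if probability >= medium_cutoff:
        return "Medium"
    return "Low"


# Scores copied from the prediction output table.
scores = {"EQ001": 0.68, "EQ002": 0.11, "EQ003": 0.42}
tiers = {equipment_id: risk_tier(p) for equipment_id, p in scores.items()}
print(tiers)
```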

Understand why

Every prediction includes feature attributions — no black boxes

Equipment EQ001 -- CNC Lathe on Line-A

Predicted: 68% failure probability in next 7 days (High risk)

Top contributing features

feature	observed_value	attribution
Vibration trend (14-day slope)	+32% increase	30%
Spindle bearing age vs rated life	64% consumed	24%
Operating load above 90% for 5+ days	92% avg	20%
Temperature drift correlated with downstream press	+3.5C	15%
Similar equipment failure pattern on Line-B	Failed last month	11%

Feature attributions are computed automatically for every prediction. No separate tooling required.

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.


Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.


Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.


Frequently asked questions

Common questions about predictive maintenance

How much data do I need to start predictive maintenance with AI?

You need at least 6-12 months of sensor data, maintenance logs, and production records to train a reliable model. The key is not volume but variety: you need examples of both failures and normal operation across different operating conditions. Most plants already have this data in their CMMS and historian systems. Start with your highest-cost failure modes and expand from there.

What is the ROI of predictive maintenance vs preventive maintenance?

Predictive maintenance typically delivers 25-35% reduction in unplanned downtime, 10-20% reduction in maintenance costs (by eliminating unnecessary preventive work), and 5-15% extension of equipment life. For a plant with 500 machines, this translates to $8-12M in annual savings. The ROI timeline is usually 6-9 months from deployment to measurable returns.
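A quick back-of-envelope check on the 500-machine figure: if the $8-12M in savings came entirely from a 30% cut in unplanned downtime, and savings scaled linearly with downtime, the implied baseline cost of unplanned downtime can be computed as below. Both conditions are simplifying assumptions, and the function name is illustrative.

```python
def implied_baseline(annual_savings, downtime_reduction):
    """Baseline annual downtime cost implied by savings at a given reduction."""
    if not 0.0 < downtime_reduction <= 1.0:
        raise ValueError("downtime_reduction must be in (0, 1]")
    return annual_savings / downtime_reduction


MACHINES = 500
for savings in (8_000_000, 12_000_000):
    baseline = implied_baseline(savings, 0.30)
    per_machine = baseline / MACHINES
    print(f"savings ${savings:,}: baseline ${baseline:,.0f}, per machine ${per_machine:,.0f}")
```

Under those assumptions the plant would be losing roughly $26.7M to $40M a year to unplanned downtime, or about $53K to $80K per machine, which gives a sanity check to run against your own downtime costs.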

Can predictive maintenance AI work with legacy equipment that has limited sensors?

Yes, but with reduced prediction horizons. Graph-based models compensate for sparse sensor data by pulling signals from connected equipment, maintenance history, and production context. Failures on a machine with only 2 sensors can still be predicted accurately if its neighbors are well-instrumented. Many plants start by retrofitting 3-5 key sensors per critical machine and achieve 70%+ prediction accuracy.

How does predictive maintenance handle new equipment with no failure history?

Graph-based models handle cold-start better than traditional ML because they transfer knowledge from similar equipment. A new CNC lathe inherits failure patterns from existing CNC lathes on the same production line, with the same parts, under similar operating conditions. Prediction accuracy for new equipment typically reaches 80% of mature equipment accuracy within 3 months of operation.

What is the difference between condition monitoring and predictive maintenance?

Condition monitoring tells you the current state of equipment (vibration is elevated). Predictive maintenance tells you the future state (this machine has a 68% chance of failing in 7 days). The gap is the prediction model that translates current conditions, combined with historical patterns and equipment context, into actionable forecasts with enough lead time to schedule repairs.

Bottom line: A plant with 500 machines saves $8-12M annually by reducing unplanned downtime by 30%. Kumo's factory graph detects multi-equipment interaction patterns and part degradation trajectories that sensor-only anomaly detection misses.


Topics covered

predictive maintenance AI, equipment failure prediction, machine learning maintenance, condition-based maintenance, industrial IoT prediction, KumoRFM manufacturing, asset failure forecasting, maintenance optimization ML

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
