Quality Defect Prediction
“Will this production run have defects?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example
Will this production run have defects?
Defective production runs cost manufacturers 5-15% of revenue in scrap, rework, and warranty claims. Statistical process control (SPC) charts catch process drift but miss the multi-variable interactions that cause defects, such as material batch variance combining with equipment wear and ambient conditions. For a manufacturer producing $500M in goods annually, reducing defect rates from 3% to 1% saves $10M in direct costs and prevents $25M in downstream warranty exposure.
Quick answer
Quality defect prediction AI uses machine learning to forecast which production runs will produce defects before they begin. By connecting process parameters, material batch properties, and equipment condition in a graph, models catch the multi-variable interactions that cause 80% of defects, which SPC charts miss because they monitor each variable in isolation. Manufacturers producing $500M+ in goods typically save $35M annually in scrap, rework, and warranty costs.
Approaches compared
4 ways to solve this problem
1. Statistical Process Control (SPC)
Monitor individual process parameters against control limits. Flag when any single variable drifts out of spec. The foundation of quality management since the 1920s.
Best for
Catching single-variable drift in stable, well-understood processes with clear control limits.
Watch out for
Monitors each variable independently. Most defects come from interactions between variables (material MFI + humidity + equipment wear) that SPC cannot detect. Reactive by design: catches drift after it happens, not before.
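As a rough illustration of the single-variable monitoring described above, here is a minimal sketch of a 3-sigma control check. The temperature readings and function names are hypothetical, and the limits are computed from a plain sample standard deviation; production control charts often estimate sigma from subgroups or moving ranges instead.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread

def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]

# Hypothetical in-control barrel temperatures (deg C) used to set the limits.
baseline_temps = [185.0, 184.6, 185.3, 185.1, 184.9, 185.2, 184.8, 185.1]
limits = control_limits(baseline_temps)

# New readings: only the last one drifts far enough to be flagged.
new_temps = [185.0, 185.4, 187.5]
flagged = out_of_control(new_temps, limits)
print(limits, flagged)
```

The check looks at one variable at a time, so a run where every variable sits inside its own limits passes even if the combination is risky.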
2. Design of Experiments (DOE)
Systematically vary process parameters to find optimal settings and understand interaction effects. Gold standard for process optimization in controlled environments.
Best for
New product launches where you need to establish initial parameter windows and understand key variable interactions.
Watch out for
Assumes stable materials and equipment. Real production has batch-to-batch material variation and equipment drift that invalidate DOE conclusions over time. Cannot continuously adapt to changing conditions.
3. Single-Table ML (XGBoost on Flattened Features)
Train gradient-boosted models on a flat table of process parameters, material properties, and equipment metrics. Captures non-linear relationships better than SPC.
Best for
Processes where most defect drivers can be captured in a single feature table without complex relational context.
Watch out for
Flattening loses the relationships between material batches, equipment history, and parameter interactions over time. Feature engineering is manual and requires domain experts to specify which interactions matter.
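To make the flattening step concrete, the sketch below joins sample rows (mirroring the example tables further down this page) into one row per production run and adds a single hand-written interaction feature. The `flatten` helper and the `humidity_x_wear` feature are hypothetical, chosen only to show where manual feature engineering enters.

```python
# Sample rows mirroring the example tables on this page.
production_runs = [
    {"run_id": "RUN601", "equipment_id": "EQ001", "material_id": "MAT101"},
    {"run_id": "RUN602", "equipment_id": "EQ002", "material_id": "MAT102"},
]
parameters = {
    "RUN601": {"temperature_c": 185, "humidity_pct": 45},
    "RUN602": {"temperature_c": 210, "humidity_pct": 52},
}
materials = {"MAT101": {"mfi": 19.8}, "MAT102": {"mfi": 21.5}}
equipment = {
    "EQ001": {"hours_since_service": 480, "condition_score": 87},
    "EQ002": {"hours_since_service": 1200, "condition_score": 72},
}

def flatten(run):
    """Join one run with its parameters, material, and equipment into one row."""
    row = {"run_id": run["run_id"]}
    row.update(parameters[run["run_id"]])
    row.update(materials[run["material_id"]])
    row.update(equipment[run["equipment_id"]])
    # Hypothetical hand-written interaction: an engineer has to know to add it.
    row["humidity_x_wear"] = row["humidity_pct"] * (100 - row["condition_score"])
    return row

flat_table = [flatten(run) for run in production_runs]
for row in flat_table:
    print(row)
```

Each row now stands alone: which other runs shared the same material batch or machine, and how those runs turned out, is no longer visible unless someone engineers further columns for it.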
4. Graph Neural Networks (Kumo's Approach)
Connect production runs, materials, parameters, equipment, and inspection history into a manufacturing graph. GNNs automatically discover the material-equipment-parameter triplets that cause defects.
Best for
Complex manufacturing where defects arise from interactions between material batch variation, equipment condition, ambient conditions, and process parameters.
Watch out for
Needs relational data across materials, equipment, and inspections. Less useful for single-step processes with minimal equipment or material variation.
Key metric: Graph-based quality models achieve 91% defect prediction accuracy vs 63% for gradient-boosted trees on flat data (SAP SALT benchmark), with the gap driven by material-equipment-parameter interaction patterns that flat models cannot capture.
Why relational data changes the answer
Defects in manufacturing are almost never caused by a single variable going out of spec. The real culprit is the interaction: material batch B2025-042, with an MFI of 19.8, runs fine on Injection Molder EQ001 at 185 °C and 45% humidity, but produces surface cracks when humidity exceeds 50% and the equipment condition score drops below 80%. SPC monitors each of these variables independently. Single-table ML can capture some pairwise interactions if an engineer manually creates the features. But nobody can manually enumerate every material-equipment-parameter-condition combination that matters.
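The example above can be sketched in a few lines. The combined rule uses the thresholds from the text (humidity above 50%, condition score below 80%); the per-variable spec limits and the run values are hypothetical, picked so that every variable passes on its own.

```python
# Hypothetical single-variable spec limits, chosen only for illustration.
SPEC_LIMITS = {
    "humidity_pct": (30, 60),
    "condition_score": (70, 100),
    "temperature_c": (180, 190),
}

def univariate_alarms(run):
    """Names of variables that fall outside their own spec limits."""
    return [
        name
        for name, (low, high) in SPEC_LIMITS.items()
        if not low <= run[name] <= high
    ]

def interaction_risk(run):
    """Combined rule from the example: humid air plus worn equipment."""
    return run["humidity_pct"] > 50 and run["condition_score"] < 80

# Hypothetical run: each value is inside its own limits.
run = {"humidity_pct": 52, "condition_score": 78, "temperature_c": 185}
print(univariate_alarms(run))  # no single variable raises an alarm
print(interaction_risk(run))   # the combination matches the crack pattern
```

A hand-written rule like `interaction_risk` only works once someone already knows the pattern, which is the enumeration problem described above.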
Graph-based models solve this by representing the manufacturing process as it actually works: materials flow through equipment under specific parameters, producing outputs that get inspected. The GNN traverses these connections and automatically discovers which combinations predict defects. SAP's SALT benchmark shows this advantage concretely: 91% accuracy for graph-based models vs 75% for deep learning on flat data vs 63% for gradient-boosted trees. The gap is entirely driven by relational signals that flat models cannot represent. In quality prediction, RelBench benchmarks show GNN models scoring 76.71 vs 62.44 for tree-based models on relational tasks. The manufacturing process is inherently relational. Flat models force it into a spreadsheet and lose the most predictive information.
Imagine trying to predict a cake's quality by monitoring oven temperature alone. You would miss that the eggs were old, the flour was from a different supplier with higher protein content, and the baker substituted baking powder for baking soda. Quality comes from the interaction of every ingredient with every step of the process. Manufacturing defect prediction works the same way: the defect is not in any single parameter, it is in the combination of this material batch, on this equipment, under these conditions.
How KumoRFM solves this
Graph-powered intelligence for manufacturing
Kumo connects production runs, process parameters, materials, inspections, and equipment into a manufacturing graph. The GNN learns the combinatorial defect patterns that SPC misses: specific material-parameter-equipment triplets that produce defects only under certain ambient conditions. PQL predicts defect probability per production run before it starts, enabling parameter adjustments or material substitutions that prevent defects at the source.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
PRODUCTION_RUNS
| run_id | product | equipment_id | material_id | start_time |
|---|---|---|---|---|
| RUN601 | Widget-A | EQ001 | MAT101 | 2025-03-01 06:00 |
| RUN602 | Widget-B | EQ002 | MAT102 | 2025-03-01 06:00 |
| RUN603 | Widget-A | EQ003 | MAT101 | 2025-03-01 14:00 |
PARAMETERS
| run_id | temperature_c | pressure_bar | speed_rpm | humidity_pct |
|---|---|---|---|---|
| RUN601 | 185 | 42 | 1,200 | 45% |
| RUN602 | 210 | 38 | 800 | 52% |
| RUN603 | 188 | 41 | 1,180 | 48% |
MATERIALS
| material_id | supplier | batch | mfi | tensile_mpa |
|---|---|---|---|---|
| MAT101 | PolySupply Co | B2025-042 | 19.8 | 540 |
| MAT102 | ResinWorks | B2025-038 | 21.5 | 520 |
INSPECTIONS
| inspection_id | run_id | defect_count | defect_type | date |
|---|---|---|---|---|
| INS401 | RUN590 | 0 | None | 2025-02-28 |
| INS402 | RUN591 | 12 | Surface crack | 2025-02-28 |
| INS403 | RUN592 | 0 | None | 2025-02-28 |
EQUIPMENT
| equipment_id | type | hours_since_service | condition_score |
|---|---|---|---|
| EQ001 | Injection Molder | 480 | 87% |
| EQ002 | Extruder | 1,200 | 72% |
| EQ003 | Injection Molder | 120 | 95% |
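As a rough picture of how these tables form a graph, the sketch below treats the foreign keys in PRODUCTION_RUNS as edges between runs, equipment, and materials. It is only an illustration of the idea, not Kumo's internal representation; `build_graph` and `two_hop_runs` are hypothetical helper names.

```python
from collections import defaultdict

# (run_id, equipment_id, material_id) taken from the PRODUCTION_RUNS table.
production_runs = [
    ("RUN601", "EQ001", "MAT101"),
    ("RUN602", "EQ002", "MAT102"),
    ("RUN603", "EQ003", "MAT101"),
]

def build_graph(runs):
    """Undirected adjacency map linking each run to its equipment and material."""
    graph = defaultdict(set)
    for run_id, equipment_id, material_id in runs:
        for neighbor in (equipment_id, material_id):
            graph[run_id].add(neighbor)
            graph[neighbor].add(run_id)
    return graph

def two_hop_runs(graph, run_id):
    """Other runs that share a material or a machine with run_id."""
    related = set()
    for neighbor in graph[run_id]:
        related |= graph[neighbor]
    related.discard(run_id)
    return related

graph = build_graph(production_runs)
print(sorted(two_hop_runs(graph, "RUN601")))
```

RUN601 and RUN603 are linked through the shared material MAT101, which is the kind of connection a flat row per run does not carry.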
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(INSPECTIONS.defect_count > 0, 0, 1, days) FOR EACH PRODUCTION_RUNS.run_id
Prediction output
Every entity gets a score, updated continuously
| RUN_ID | PRODUCT | DEFECT_PROB | TOP_RISK_FACTOR |
|---|---|---|---|
| RUN601 | Widget-A | 0.38 | Material batch MFI drift |
| RUN602 | Widget-B | 0.71 | Equipment condition + humidity |
| RUN603 | Widget-A | 0.08 | Within tolerance |
Understand why
Every prediction includes feature attributions — no black boxes
Production Run RUN602 -- Widget-B on Extruder EQ002
Predicted: 71% defect probability
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Equipment hours since service | 1,200 hrs (high) | 28% |
| Humidity above optimal range | 52% (target: <48%) | 24% |
| Material MFI at upper spec limit | 21.5 g/10min | 21% |
| Similar run on EQ002 had defects last week | 12 defects | 16% |
| Equipment condition score below threshold | 72% | 11% |
Feature attributions are computed automatically for every prediction. No separate tooling required.
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about quality defect prediction
How accurate is AI at predicting manufacturing defects?
Accuracy depends on the complexity of your process and data quality. Single-variable SPC catches about 40-50% of defects. Flat ML models reach 60-70%. Graph-based models that capture material-equipment-parameter interactions typically achieve 85-92% prediction accuracy. The key differentiator is whether your model can represent the multi-variable interactions that cause most defects.
Can defect prediction AI replace quality inspectors?
No, and it should not. Defect prediction works upstream of inspection, flagging high-risk production runs before they start. This lets you adjust parameters, substitute materials, or increase inspection frequency for flagged runs. Think of it as giving your quality team a 24-hour heads-up rather than replacing them. The best implementations reduce defect rates 60-70% while keeping human inspectors for final verification.
What data do I need for quality defect prediction?
At minimum: process parameter logs, material batch records, equipment maintenance history, and inspection results with defect classifications. Most manufacturers already have this data in MES, ERP, and QMS systems. The critical gap is usually linking these systems so you can trace which material batch ran on which equipment under which parameters. Plan 4-6 weeks for data integration.
How does defect prediction handle new materials or new products?
Graph-based models handle this better than traditional ML because they transfer knowledge from similar materials and products. A new resin grade inherits defect patterns from chemically similar grades that ran on the same equipment. A new product variant inherits patterns from its product family. Prediction accuracy for new materials typically reaches useful levels (75%+) within 2-3 production runs.
What is the business case for AI-powered quality prediction?
For a manufacturer producing $500M in goods with a 3% defect rate, reducing defects to 1% saves $10M in direct scrap and rework costs. Add $25M in prevented warranty claims downstream. Implementation costs are typically $500K-$1M. At the $1M upper end, that is a 10x return in year one on direct savings alone and 35x once warranty savings are included. The payback period is usually 2-4 months.
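The arithmetic behind these figures can be checked with a short sketch. It assumes, as the example does, that the defect rate translates directly into a share of goods value lost, and it defines ROI as a simple first-year ratio of savings to implementation cost; the $25M warranty figure is taken from the text rather than derived.

```python
REVENUE = 500_000_000          # annual goods produced, from the example
DEFECT_BPS_BEFORE = 300        # 3% expressed in basis points
DEFECT_BPS_AFTER = 100         # 1% expressed in basis points
WARRANTY_SAVINGS = 25_000_000  # stated in the text, not derived here

# Integer arithmetic in basis points avoids floating-point rounding noise.
direct_savings = REVENUE * (DEFECT_BPS_BEFORE - DEFECT_BPS_AFTER) // 10_000
total_savings = direct_savings + WARRANTY_SAVINGS

def roi_multiple(savings, cost):
    """Simple first-year ratio of savings to implementation cost."""
    return savings / cost

for cost in (500_000, 1_000_000):
    print(cost, roi_multiple(direct_savings, cost), roi_multiple(total_savings, cost))
```

Under these assumptions the 10x and 35x multiples correspond to the $1M cost estimate; at $500K the same ratios double.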
Bottom line: A manufacturer producing $500M in goods saves $35M annually by predicting defects before production runs start. Kumo's manufacturing graph catches the material-equipment-parameter combinations that statistical process control misses because it monitors each variable in isolation.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.