Binary Classification · Defect Prediction

Quality Defect Prediction

Will this production run have defects?

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

Will this production run have defects?

Defective production runs cost manufacturers 5-15% of revenue in scrap, rework, and warranty claims. Statistical process control (SPC) charts catch single-variable process drift but miss the multi-variable interactions that cause defects, such as material batch variance combining with equipment wear and ambient conditions. For a manufacturer producing $500M in goods annually, reducing the defect rate from 3% to 1% saves $10M in direct costs and prevents $25M in downstream warranty exposure.

Quick answer

Quality defect prediction AI uses machine learning to forecast, before a production run begins, whether it will produce defects. By connecting process parameters, material batch properties, and equipment condition in a graph, models catch the multi-variable interactions that cause 80% of defects and that SPC charts miss because they monitor each variable in isolation. Manufacturers producing $500M+ in goods typically save $35M annually in scrap, rework, and warranty costs.

Approaches compared

4 ways to solve this problem

1. Statistical Process Control (SPC)

Monitor individual process parameters against control limits. Flag when any single variable drifts out of spec. The foundation of quality management since the 1920s.

Best for

Catching single-variable drift in stable, well-understood processes with clear control limits.

Watch out for

Monitors each variable independently. Most defects come from interactions between variables, such as material melt flow index (MFI), humidity, and equipment wear, that SPC cannot detect. Reactive by design: catches drift after it happens, not before.
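A minimal sketch of the single-variable check described above, assuming 3-sigma limits computed from a stable reference period. The temperature readings and function names are hypothetical, and production SPC software typically estimates sigma from subgroup or moving ranges rather than the plain sample standard deviation used here.

```python
from statistics import mean, stdev

def control_limits(baseline, n_sigma=3.0):
    """Return (lower, center, upper) limits from in-control baseline readings."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - n_sigma * spread, center, center + n_sigma * spread

def out_of_control(readings, limits):
    """Indices of readings that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(readings) if x < lower or x > upper]

# Hypothetical barrel temperatures (deg C) from a stable reference period.
baseline_temps = [185.0, 184.6, 185.3, 185.1, 184.8, 185.2, 184.9, 185.1]
limits = control_limits(baseline_temps)

# New readings are checked against this one variable's limits only.
new_temps = [185.0, 185.2, 187.5, 184.9]
flagged = out_of_control(new_temps, limits)
print(flagged)
```

Each parameter gets its own limits, so a run whose readings all sit inside their individual limits passes the check even when the combination of values is risky.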

2. Design of Experiments (DOE)

Systematically vary process parameters to find optimal settings and understand interaction effects. Gold standard for process optimization in controlled environments.

Best for

New product launches where you need to establish initial parameter windows and understand key variable interactions.

Watch out for

Assumes stable materials and equipment. Real production has batch-to-batch material variation and equipment drift that invalidate DOE conclusions over time. Cannot continuously adapt to changing conditions.
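As an illustration of how a designed experiment exposes an interaction, here is a small two-factor, two-level full factorial sketch. The factor names and the defect rates are hypothetical values chosen for the example, not measurements from the source.

```python
from itertools import product

# Coded levels: -1 = low setting, +1 = high setting.
factors = ["temperature", "humidity"]
design = list(product([-1, 1], repeat=len(factors)))  # full 2^2 factorial

# Hypothetical measured defect rate (%) for each (temperature, humidity) run.
response = {(-1, -1): 1.0, (-1, 1): 1.5, (1, -1): 1.2, (1, 1): 4.3}

def effect(contrast):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    high = [response[run] for run in design if contrast(run) == 1]
    low = [response[run] for run in design if contrast(run) == -1]
    return sum(high) / len(high) - sum(low) / len(low)

temperature_effect = effect(lambda run: run[0])
humidity_effect = effect(lambda run: run[1])
# The interaction contrast is the product of the two coded levels.
interaction_effect = effect(lambda run: run[0] * run[1])

print(round(temperature_effect, 2), round(humidity_effect, 2), round(interaction_effect, 2))
```

A large interaction effect relative to the main effects means the factors cannot be tuned one at a time. The estimate only holds for the material batch and equipment state present during the experiment, which is the limitation noted above.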

3. Single-Table ML (XGBoost on Flattened Features)

Train gradient-boosted models on a flat table of process parameters, material properties, and equipment metrics. Captures non-linear relationships better than SPC.

Best for

Processes where most defect drivers can be captured in a single feature table without complex relational context.

Watch out for

Flattening loses the relationships between material batches, equipment history, and parameter interactions over time. Feature engineering is manual and requires domain experts to specify which interactions matter.
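A rough sketch of what flattening looks like in practice, using a few values from this page's sample tables. The `flatten` helper and the `humid_and_worn` feature are hypothetical names, and the thresholds (humidity above 50%, condition below 80%) follow the worked example given elsewhere on this page.

```python
# Subset of the sample tables, keyed for lookup.
runs = [
    {"run_id": "RUN601", "equipment_id": "EQ001", "material_id": "MAT101"},
    {"run_id": "RUN602", "equipment_id": "EQ002", "material_id": "MAT102"},
]
parameters = {
    "RUN601": {"temperature_c": 185, "humidity_pct": 45},
    "RUN602": {"temperature_c": 210, "humidity_pct": 52},
}
materials = {"MAT101": {"mfi": 19.8}, "MAT102": {"mfi": 21.5}}
equipment = {"EQ001": {"condition_score": 87}, "EQ002": {"condition_score": 72}}

def flatten(run):
    """Join one run with its parameter, material, and equipment rows."""
    row = {"run_id": run["run_id"]}
    row.update(parameters[run["run_id"]])
    row.update(materials[run["material_id"]])
    row.update(equipment[run["equipment_id"]])
    # Hand-written interaction feature: someone has to know to add this.
    row["humid_and_worn"] = int(
        row["humidity_pct"] > 50 and row["condition_score"] < 80
    )
    return row

flat_table = [flatten(run) for run in runs]
for row in flat_table:
    print(row)
```

The flat row keeps the current values but drops the links themselves: which other runs used the same batch, or what the machine produced last week, is no longer reachable from the row.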

4. Graph Neural Networks (Kumo's Approach)

Connect production runs, materials, parameters, equipment, and inspection history into a manufacturing graph. GNNs automatically discover the material-equipment-parameter triplets that cause defects.

Best for

Complex manufacturing where defects arise from interactions between material batch variation, equipment condition, ambient conditions, and process parameters.

Watch out for

Needs relational data across materials, equipment, and inspections. Less useful for single-step processes with minimal equipment or material variation.
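To make the idea of a manufacturing graph concrete, the sketch below builds a plain adjacency list from the run, equipment, and material IDs in this page's sample PRODUCTION_RUNS table. It is an illustrative data structure only, not Kumo's internal representation; `add_edge` and `related_runs` are hypothetical helpers.

```python
from collections import defaultdict

# (run_id, equipment_id, material_id) rows from the sample PRODUCTION_RUNS table.
production_runs = [
    ("RUN601", "EQ001", "MAT101"),
    ("RUN602", "EQ002", "MAT102"),
    ("RUN603", "EQ003", "MAT101"),
]

# Undirected adjacency list keyed by (node_type, node_id).
graph = defaultdict(set)

def add_edge(a, b):
    graph[a].add(b)
    graph[b].add(a)

for run_id, equipment_id, material_id in production_runs:
    add_edge(("run", run_id), ("equipment", equipment_id))
    add_edge(("run", run_id), ("material", material_id))

def related_runs(run_id):
    """Runs two hops away: they share a material batch or a machine."""
    found = set()
    for shared in graph[("run", run_id)]:
        for kind, node_id in graph[shared]:
            if kind == "run" and node_id != run_id:
                found.add(node_id)
    return sorted(found)

print(related_runs("RUN601"))
```

Inspection and parameter records could be attached the same way, as further node types linked to each run.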

Key metric: On SAP's SALT benchmark, graph-based models achieve 91% accuracy vs 63% for gradient-boosted trees on flat data. The gap comes from relational interaction patterns, such as material-equipment-parameter combinations in manufacturing, that flat models cannot capture.

Why relational data changes the answer

Defects in manufacturing are almost never caused by a single variable going out of spec. The real culprit is the interaction: Material Batch B2025-042 with MFI at 19.8 runs fine on Injection Molder EQ001 at 185 °C and 45% humidity, but produces surface cracks when humidity exceeds 50% and equipment condition drops below 80%. SPC monitors each of these variables independently. Single-table ML can capture some pairwise interactions if an engineer manually creates the features. But nobody can manually enumerate every material-equipment-parameter-condition combination that matters.

Graph-based models solve this by representing the manufacturing process as it actually works: materials flow through equipment under specific parameters, producing outputs that get inspected. The GNN traverses these connections and automatically discovers which combinations predict defects. SAP's SALT benchmark shows this advantage concretely: 91% accuracy for graph-based models, 75% for deep learning on flat data, and 63% for gradient-boosted trees. The gap is driven by relational signals that flat models cannot represent. RelBench benchmarks show a similar pattern on relational tasks, with GNN models scoring 76.71 vs 62.44 for tree-based models. The manufacturing process is inherently relational. Flat models force it into a spreadsheet and lose the most predictive information.
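The traversal described above can be pictured as rounds of neighbor aggregation. The sketch below is a deliberately stripped-down version: each node adds the mean of its neighbors' feature vectors to its own, with the learned weights and nonlinearities of a real GNN layer omitted. It is not Kumo's implementation; the two-element feature layout `[mfi, condition_score]` and the `propagate` helper are assumptions made for illustration, with values taken from this page's sample tables.

```python
# Edges between runs and the material batch / machine they used.
edges = [
    ("RUN601", "MAT101"), ("RUN601", "EQ001"),
    ("RUN602", "MAT102"), ("RUN602", "EQ002"),
    ("RUN603", "MAT101"), ("RUN603", "EQ003"),
]

# Feature vector per node: [mfi, condition_score]; zeros where not applicable.
features = {
    "MAT101": [19.8, 0.0], "MAT102": [21.5, 0.0],
    "EQ001": [0.0, 87.0], "EQ002": [0.0, 72.0], "EQ003": [0.0, 95.0],
    "RUN601": [0.0, 0.0], "RUN602": [0.0, 0.0], "RUN603": [0.0, 0.0],
}

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, []).append(b)
    neighbors.setdefault(b, []).append(a)

def propagate(state):
    """One round: each node adds the mean of its neighbors' vectors to its own."""
    new_state = {}
    for node, own in state.items():
        messages = [state[n] for n in neighbors[node]]
        mean_message = [sum(col) / len(messages) for col in zip(*messages)]
        new_state[node] = [o + m for o, m in zip(own, mean_message)]
    return new_state

after_one = propagate(features)   # runs now see their material and machine
after_two = propagate(after_one)  # materials now see machines, via shared runs

print(after_one["RUN602"])
print(after_two["MAT101"])
```

After the second round the MAT101 node carries equipment-condition information from both machines that processed it, which is the kind of multi-hop signal a flat row cannot hold.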

Imagine trying to predict a cake's quality by monitoring oven temperature alone. You would miss that the eggs were old, the flour was from a different supplier with higher protein content, and the baker substituted baking powder for baking soda. Quality comes from the interaction of every ingredient with every step of the process. Manufacturing defect prediction works the same way: the defect is not in any single parameter, it is in the combination of this material batch, on this equipment, under these conditions.

How KumoRFM solves this

Graph-powered intelligence for manufacturing

Kumo connects production runs, process parameters, materials, inspections, and equipment into a manufacturing graph. The GNN learns the combinatorial defect patterns that SPC misses: specific material-parameter-equipment triplets that produce defects only under certain ambient conditions. A Predictive Query Language (PQL) query predicts defect probability per production run before it starts, enabling parameter adjustments or material substitutions that prevent defects at the source.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability, all in minutes, not months.

1. Your data

The relational tables Kumo learns from

PRODUCTION_RUNS

run_id | product | equipment_id | material_id | start_time
RUN601 | Widget-A | EQ001 | MAT101 | 2025-03-01 06:00
RUN602 | Widget-B | EQ002 | MAT102 | 2025-03-01 06:00
RUN603 | Widget-A | EQ003 | MAT101 | 2025-03-01 14:00

PARAMETERS

run_id | temperature_c | pressure_bar | speed_rpm | humidity_pct
RUN601 | 185 | 42 | 1,200 | 45%
RUN602 | 210 | 38 | 800 | 52%
RUN603 | 188 | 41 | 1,180 | 48%

MATERIALS

material_id | supplier | batch | mfi | tensile_mpa
MAT101 | PolySupply Co | B2025-042 | 19.8 | 540
MAT102 | ResinWorks | B2025-038 | 21.5 | 520

INSPECTIONS

inspection_id | run_id | defect_count | defect_type | date
INS401 | RUN590 | 0 | None | 2025-02-28
INS402 | RUN591 | 12 | Surface crack | 2025-02-28
INS403 | RUN592 | 0 | None | 2025-02-28

EQUIPMENT

equipment_id | type | hours_since_service | condition_score
EQ001 | Injection Molder | 480 | 87%
EQ002 | Extruder | 1,200 | 72%
EQ003 | Injection Molder | 120 | 95%

2. Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL
PREDICT BOOL(INSPECTIONS.defect_count > 0, 0, 1, days)
FOR EACH PRODUCTION_RUNS.run_id
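One way to read the query is as a per-run binary label. The sketch below assumes the arguments `0, 1, days` denote a window from the run's start to one day later; that reading, the run start times, and the `defect_label` helper are assumptions for illustration and are not taken from Kumo's documentation. The inspection rows come from the sample INSPECTIONS table.

```python
from datetime import datetime, timedelta

# Hypothetical start times; the sample tables do not list these runs' starts.
run_starts = {
    "RUN590": datetime(2025, 2, 27, 14, 0),
    "RUN591": datetime(2025, 2, 27, 14, 0),
    "RUN592": datetime(2025, 2, 27, 22, 0),
}

# Rows from the sample INSPECTIONS table (dates treated as midnight).
inspections = [
    {"run_id": "RUN590", "defect_count": 0, "date": datetime(2025, 2, 28)},
    {"run_id": "RUN591", "defect_count": 12, "date": datetime(2025, 2, 28)},
    {"run_id": "RUN592", "defect_count": 0, "date": datetime(2025, 2, 28)},
]

def defect_label(run_id, window_days=1):
    """1 if any inspection of this run inside the window found a defect."""
    start = run_starts[run_id]
    end = start + timedelta(days=window_days)
    return int(any(
        ins["run_id"] == run_id
        and start <= ins["date"] <= end
        and ins["defect_count"] > 0
        for ins in inspections
    ))

labels = {run_id: defect_label(run_id) for run_id in run_starts}
print(labels)
```

Historical runs labeled this way are what a model would learn from; for a run that has not started yet, the same question is answered as a probability.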

3. Prediction output

Every entity gets a score, updated continuously

RUN_ID | PRODUCT | DEFECT_PROB | TOP_RISK_FACTOR
RUN601 | Widget-A | 0.38 | Material batch MFI drift
RUN602 | Widget-B | 0.71 | Equipment condition + humidity
RUN603 | Widget-A | 0.08 | Within tolerance
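A sketch of how such scores might be turned into pre-run actions. The two thresholds and the action wording are hypothetical and would need tuning against the cost of a delayed run versus a defective one; the scores are the ones shown in the table above.

```python
# (run_id, defect probability) pairs from the prediction output.
predictions = [("RUN601", 0.38), ("RUN602", 0.71), ("RUN603", 0.08)]

# Hypothetical cut-offs, chosen only for illustration.
HOLD_THRESHOLD = 0.50
INSPECT_THRESHOLD = 0.20

def triage(defect_prob):
    """Map a defect probability to a pre-run action."""
    if defect_prob >= HOLD_THRESHOLD:
        return "adjust parameters or material before start"
    if defect_prob >= INSPECT_THRESHOLD:
        return "run with increased inspection"
    return "run as planned"

# Highest-risk runs first, so planners see them at the top.
ranked = sorted(predictions, key=lambda row: row[1], reverse=True)
plan = {run_id: triage(prob) for run_id, prob in ranked}

for run_id, action in plan.items():
    print(run_id, action)
```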

4. Understand why

Every prediction includes feature attributions, with no black boxes

Production Run RUN602 -- Widget-B on Extruder EQ002

Predicted: 71% defect probability

Top contributing features

Feature | Value | Attribution
Equipment hours since service | 1,200 hrs (high) | 28%
Humidity above optimal range | 52% (target: <48%) | 24%
Material MFI at upper spec limit | 21.5 g/10min | 21%
Similar run on EQ002 had defects last week | 12 defects | 16%
Equipment condition score below threshold | 72% | 11%

Feature attributions are computed automatically for every prediction. No separate tooling required.

Frequently asked questions

Common questions about quality defect prediction

How accurate is AI at predicting manufacturing defects?

Accuracy depends on the complexity of your process and data quality. Single-variable SPC catches about 40-50% of defects. Flat ML models reach 60-70%. Graph-based models that capture material-equipment-parameter interactions typically achieve 85-92% prediction accuracy. The key differentiator is whether your model can represent the multi-variable interactions that cause most defects.

Can defect prediction AI replace quality inspectors?

No, and it should not. Defect prediction works upstream of inspection, flagging high-risk production runs before they start. This lets you adjust parameters, substitute materials, or increase inspection frequency for flagged runs. Think of it as giving your quality team a 24-hour heads-up rather than replacing them. The best implementations reduce defect rates 60-70% while keeping human inspectors for final verification.

What data do I need for quality defect prediction?

At minimum: process parameter logs, material batch records, equipment maintenance history, and inspection results with defect classifications. Most manufacturers already have this data in MES, ERP, and QMS systems. The critical gap is usually linking these systems so you can trace which material batch ran on which equipment under which parameters. Plan 4-6 weeks for data integration.
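Before modeling, the linkage can be audited with a simple referential check. The sketch below is illustrative only: `find_broken_links` is a hypothetical helper, the equipment and material assignments for RUN590 and RUN591 are assumed, and RUN999 is an invented record standing in for a run whose links were lost between systems.

```python
def find_broken_links(runs, parameters, materials, equipment, inspections):
    """Report, per run, which linked records cannot be found."""
    inspected = {ins["run_id"] for ins in inspections}
    report = {}
    for run in runs:
        missing = []
        if run["run_id"] not in parameters:
            missing.append("parameters")
        if run["material_id"] not in materials:
            missing.append("material")
        if run["equipment_id"] not in equipment:
            missing.append("equipment")
        if run["run_id"] not in inspected:
            missing.append("inspection")
        if missing:
            report[run["run_id"]] = missing
    return report

runs = [
    {"run_id": "RUN590", "equipment_id": "EQ001", "material_id": "MAT101"},
    {"run_id": "RUN591", "equipment_id": "EQ002", "material_id": "MAT102"},
    {"run_id": "RUN999", "equipment_id": "EQ009", "material_id": "MAT101"},
]
parameters = {"RUN590", "RUN591"}   # run IDs that have parameter logs
materials = {"MAT101", "MAT102"}    # known material IDs
equipment = {"EQ001", "EQ002"}      # known equipment IDs
inspections = [{"run_id": "RUN590"}, {"run_id": "RUN591"}]

broken = find_broken_links(runs, parameters, materials, equipment, inspections)
print(broken)
```

Runs that appear in the report cannot be traced end to end, so they contribute little relational signal until the link is repaired.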

How does defect prediction handle new materials or new products?

Graph-based models handle this better than traditional ML because they transfer knowledge from similar materials and products. A new resin grade inherits defect patterns from chemically similar grades that ran on the same equipment. A new product variant inherits patterns from its product family. Prediction accuracy for new materials typically reaches useful levels (75%+) within 2-3 production runs.

What is the business case for AI-powered quality prediction?

For a manufacturer producing $500M in goods with a 3% defect rate, reducing defects to 1% saves $10M in direct scrap and rework costs. Prevented downstream warranty claims add another $25M. Implementation costs are typically $500K-$1M, so even at the $1M upper end the first-year return is 10x on direct savings alone and 35x with warranty savings included. The payback period is usually 2-4 months.
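The arithmetic behind these figures can be reproduced as follows. The sketch assumes, as the figures above imply, that direct defect cost scales with defect rate times output value; the variable names are hypothetical, and the quoted 10-35x range corresponds to the $1M upper cost estimate.

```python
annual_output_usd = 500_000_000
baseline_defect_pct = 3
target_defect_pct = 1
warranty_savings_usd = 25_000_000  # stated figure, taken as given

# Direct savings: two percentage points of annual output.
direct_savings_usd = (
    annual_output_usd * (baseline_defect_pct - target_defect_pct) // 100
)
total_savings_usd = direct_savings_usd + warranty_savings_usd

def first_year_multiple(savings_usd, cost_usd):
    """Gross first-year savings divided by implementation cost."""
    return savings_usd / cost_usd

upper_cost_usd = 1_000_000
low_multiple = first_year_multiple(direct_savings_usd, upper_cost_usd)
high_multiple = first_year_multiple(total_savings_usd, upper_cost_usd)

print(direct_savings_usd, total_savings_usd, low_multiple, high_multiple)
```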

Bottom line: A manufacturer producing $500M in goods saves $35M annually by predicting defects before production runs start. Kumo's manufacturing graph catches the material-equipment-parameter combinations that statistical process control misses because it monitors each variable in isolation.

Topics covered

quality defect prediction AI, production defect ML, manufacturing quality model, process quality prediction, SPC machine learning, KumoRFM quality, defect rate forecasting, zero-defect manufacturing

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
