What data is needed for quality defect prediction?

Kumo connects directly to your existing relational tables: PRODUCTION_RUNS, PARAMETERS, MATERIALS, INSPECTIONS, EQUIPMENT. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Use Case #2 · Binary Classification · Defect Prediction

Quality Defect Prediction

“Will this production run have defects?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.


A real-world example

Will this production run have defects?

Defective production runs cost manufacturers 5-15% of revenue in scrap, rework, and warranty claims. SPC charts catch process drift but miss the multi-variable interactions that cause defects, such as material batch variance combining with equipment wear and ambient conditions. For a manufacturer producing $500M in goods annually, reducing defect rates from 3% to 1% saves $10M in direct costs and prevents $25M in downstream warranty exposure.

Quick answer

Quality defect prediction AI uses machine learning to forecast which production runs will produce defects before they begin. By connecting process parameters, material batch properties, and equipment condition in a graph, models catch the multi-variable interactions that cause 80% of defects. SPC charts miss these interactions because they monitor each variable in isolation. Manufacturers producing $500M+ in goods typically save $35M annually in scrap, rework, and warranty costs.

Approaches compared

4 ways to solve this problem

1. Statistical Process Control (SPC)

Monitor individual process parameters against control limits. Flag when any single variable drifts out of spec. The foundation of quality management since the 1920s.

Best for

Catching single-variable drift in stable, well-understood processes with clear control limits.

Watch out for

Monitors each variable independently. Most defects come from interactions between variables (material MFI + humidity + equipment wear) that SPC cannot detect. Reactive by design: catches drift after it happens, not before.
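As a minimal sketch of the single-variable check described above, the following computes control limits for one parameter and flags readings outside them. The baseline readings are hypothetical, and the limits use the plain sample standard deviation for simplicity; production SPC charts often estimate spread from moving ranges instead.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits at mean +/- k sample standard deviations."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - k * spread, centre + k * spread


def out_of_control(value, limits):
    """True if a single reading falls outside the control limits."""
    lower, upper = limits
    return value < lower or value > upper


# Hypothetical in-control temperature readings (deg C) used to set the limits.
baseline_temps = [184.0, 185.5, 186.0, 185.0, 184.5, 185.2, 185.8, 184.9]
limits = control_limits(baseline_temps)

print(out_of_control(185.0, limits))  # reading near the baseline mean
print(out_of_control(190.0, limits))  # reading far above the baseline
```

Each parameter gets its own chart of this kind, which is why a combination of individually acceptable readings never raises an alarm.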

2. Design of Experiments (DOE)

Systematically vary process parameters to find optimal settings and understand interaction effects. Gold standard for process optimization in controlled environments.

Best for

New product launches where you need to establish initial parameter windows and understand key variable interactions.

Watch out for

Assumes stable materials and equipment. Real production has batch-to-batch material variation and equipment drift that invalidate DOE conclusions over time. Cannot continuously adapt to changing conditions.

3. Single-Table ML (XGBoost on Flattened Features)

Train gradient-boosted models on a flat table of process parameters, material properties, and equipment metrics. Captures non-linear relationships better than SPC.

Best for

Processes where most defect drivers can be captured in a single feature table without complex relational context.

Watch out for

Flattening loses the relationships between material batches, equipment history, and parameter interactions over time. Feature engineering is manual and requires domain experts to specify which interactions matter.
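To make the flattening step concrete, here is a rough sketch that joins the sample rows from this page into one row per production run. The `humidity_x_wear` column is a hypothetical hand-written interaction feature, included only to show the kind of manual work involved; it is not a recommended feature.

```python
# Sample rows copied from the tables on this page.
production_runs = [
    {"run_id": "RUN601", "equipment_id": "EQ001", "material_id": "MAT101"},
    {"run_id": "RUN602", "equipment_id": "EQ002", "material_id": "MAT102"},
    {"run_id": "RUN603", "equipment_id": "EQ003", "material_id": "MAT101"},
]
parameters = {
    "RUN601": {"temperature_c": 185, "pressure_bar": 42, "speed_rpm": 1200, "humidity_pct": 45},
    "RUN602": {"temperature_c": 210, "pressure_bar": 38, "speed_rpm": 800, "humidity_pct": 52},
    "RUN603": {"temperature_c": 188, "pressure_bar": 41, "speed_rpm": 1180, "humidity_pct": 48},
}
materials = {
    "MAT101": {"mfi": 19.8, "tensile_mpa": 540},
    "MAT102": {"mfi": 21.5, "tensile_mpa": 520},
}
equipment = {
    "EQ001": {"hours_since_service": 480, "condition_score": 87},
    "EQ002": {"hours_since_service": 1200, "condition_score": 72},
    "EQ003": {"hours_since_service": 120, "condition_score": 95},
}


def flatten(run):
    """Join one run with its parameters, material, and equipment into a flat row."""
    row = {"run_id": run["run_id"]}
    row.update(parameters[run["run_id"]])
    row.update(materials[run["material_id"]])
    row.update(equipment[run["equipment_id"]])
    # Hypothetical hand-crafted interaction: humidity scaled by equipment wear.
    row["humidity_x_wear"] = row["humidity_pct"] * (100 - row["condition_score"])
    return row


flat_table = [flatten(run) for run in production_runs]
for row in flat_table:
    print(row)
```

Note that the flat rows no longer carry `material_id` or `equipment_id`, so the fact that RUN601 and RUN603 share a material batch is gone unless someone adds another feature for it.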

4. Graph Neural Networks (Kumo's Approach)

Connect production runs, materials, parameters, equipment, and inspection history into a manufacturing graph. GNNs automatically discover the material-equipment-parameter triplets that cause defects.

Best for

Complex manufacturing where defects arise from interactions between material batch variation, equipment condition, ambient conditions, and process parameters.

Watch out for

Needs relational data across materials, equipment, and inspections. Less useful for single-step processes with minimal equipment or material variation.

Key metric: Graph-based quality models achieve 91% defect prediction accuracy vs 63% for gradient-boosted trees on flat data (SAP SALT benchmark), with the gap driven by material-equipment-parameter interaction patterns that flat models cannot capture.

Why relational data changes the answer

Defects in manufacturing are almost never caused by a single variable going out of spec. The real culprit is the interaction: Material Batch B2025-042 with MFI at 19.8 runs fine on Injection Molder EQ001 at 185 °C and 45% humidity, but produces surface cracks when humidity exceeds 50% and equipment condition drops below 80%. SPC monitors each of these variables independently. Single-table ML can capture some pairwise interactions if an engineer manually creates the features. But nobody can manually enumerate every material-equipment-parameter-condition combination that matters.
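The example above can be sketched in a few lines. The interaction thresholds (humidity above 50%, condition below 80%) come from the paragraph; the per-variable spec limits are hypothetical, since the page does not state them.

```python
# Hypothetical per-variable spec limits (not given on this page).
SPEC_LIMITS = {
    "humidity_pct": (30, 60),
    "condition_score": (70, 100),
}


def univariate_alarms(reading):
    """Names of variables that are individually outside their own limits."""
    return [
        name
        for name, (low, high) in SPEC_LIMITS.items()
        if not low <= reading[name] <= high
    ]


def interaction_risk(reading):
    """The combination from the text: humidity above 50% AND condition below 80%."""
    return reading["humidity_pct"] > 50 and reading["condition_score"] < 80


good_run = {"humidity_pct": 45, "condition_score": 87}
risky_run = {"humidity_pct": 52, "condition_score": 78}  # hypothetical reading

for reading in (good_run, risky_run):
    print(univariate_alarms(reading), interaction_risk(reading))
```

Under these assumed limits, the risky run raises no per-variable alarm, yet the combined condition is met. A hand-written rule like this covers one known pattern; the point of the paragraph is that the full set of such combinations is too large to write by hand.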

Graph-based models solve this by representing the manufacturing process as it actually works: materials flow through equipment under specific parameters, producing outputs that get inspected. The GNN traverses these connections and automatically discovers which combinations predict defects. SAP's SALT benchmark shows this advantage concretely: 91% accuracy for graph-based models vs 75% for deep learning on flat data vs 63% for gradient-boosted trees. The gap is entirely driven by relational signals that flat models cannot represent. In quality prediction, RelBench benchmarks show GNN models scoring 76.71 vs 62.44 for tree-based models on relational tasks. The manufacturing process is inherently relational. Flat models force it into a spreadsheet and lose the most predictive information.
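As an illustration of the graph representation only (not of a GNN or of Kumo's implementation), the sketch below links the sample runs from this page to their equipment and materials and then walks two hops to find related runs. The helper names are hypothetical.

```python
from collections import defaultdict

# (run_id, equipment_id, material_id) taken from the PRODUCTION_RUNS sample rows.
runs = [
    ("RUN601", "EQ001", "MAT101"),
    ("RUN602", "EQ002", "MAT102"),
    ("RUN603", "EQ003", "MAT101"),
]

graph = defaultdict(set)


def add_edge(a, b):
    """Add an undirected edge between two entities."""
    graph[a].add(b)
    graph[b].add(a)


for run_id, equipment_id, material_id in runs:
    add_edge(run_id, equipment_id)
    add_edge(run_id, material_id)


def two_hop_runs(run_id):
    """Other runs reachable through a shared machine or material batch."""
    related = set()
    for neighbour in list(graph[run_id]):
        related |= graph[neighbour]
    related.discard(run_id)
    return sorted(related)


print(two_hop_runs("RUN601"))
print(two_hop_runs("RUN602"))
```

A graph model passes information along exactly these links, so the inspection history of a run that shared a material batch can inform the prediction for the current run without anyone defining a feature for it.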

Imagine trying to predict a cake's quality by monitoring oven temperature alone. You would miss that the eggs were old, the flour was from a different supplier with higher protein content, and the baker substituted baking powder for baking soda. Quality comes from the interaction of every ingredient with every step of the process. Manufacturing defect prediction works the same way: the defect is not in any single parameter, it is in the combination of this material batch, on this equipment, under these conditions.

How KumoRFM solves this

Graph-powered intelligence for manufacturing

Kumo connects production runs, process parameters, materials, inspections, and equipment into a manufacturing graph. The GNN learns the combinatorial defect patterns that SPC misses: specific material-parameter-equipment triplets that produce defects only under certain ambient conditions. PQL predicts defect probability per production run before it starts, enabling parameter adjustments or material substitutions that prevent defects at the source.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

PRODUCTION_RUNS

run_id	product	equipment_id	material_id	start_time
RUN601	Widget-A	EQ001	MAT101	2025-03-01 06:00
RUN602	Widget-B	EQ002	MAT102	2025-03-01 06:00
RUN603	Widget-A	EQ003	MAT101	2025-03-01 14:00

PARAMETERS

run_id	temperature_c	pressure_bar	speed_rpm	humidity_pct
RUN601	185	42	1,200	45%
RUN602	210	38	800	52%
RUN603	188	41	1,180	48%

MATERIALS

material_id	supplier	batch	mfi	tensile_mpa
MAT101	PolySupply Co	B2025-042	19.8	540
MAT102	ResinWorks	B2025-038	21.5	520

INSPECTIONS

inspection_id	run_id	defect_count	defect_type	date
INS401	RUN590	0	None	2025-02-28
INS402	RUN591	12	Surface crack	2025-02-28
INS403	RUN592	0	None	2025-02-28

EQUIPMENT

equipment_id	type	hours_since_service	condition_score
EQ001	Injection Molder	480	87%
EQ002	Extruder	1,200	72%
EQ003	Injection Molder	120	95%

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT BOOL(INSPECTIONS.defect_count > 0, 0, 1, days)
FOR EACH PRODUCTION_RUNS.run_id
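One plausible reading of this target, written out in Python for clarity: a run is labelled positive if any linked inspection in the one-day window after the run starts records at least one defect. This is an interpretation of the query above, not PQL's documented semantics; the function name, the window boundaries, and the timestamps are assumptions.

```python
from datetime import datetime, timedelta


def defect_label(run_start, inspections, window=timedelta(days=1)):
    """True if any inspection inside [run_start, run_start + window) found defects.

    `inspections` is a list of (timestamp, defect_count) pairs for one run.
    """
    window_end = run_start + window
    return any(
        run_start <= when < window_end and defect_count > 0
        for when, defect_count in inspections
    )


# Hypothetical timestamps; the sample INSPECTIONS table lists dates only.
start = datetime(2025, 2, 28, 6, 0)
print(defect_label(start, [(start + timedelta(hours=12), 12)]))  # defects in window
print(defect_label(start, [(start + timedelta(hours=12), 0)]))   # clean inspection
print(defect_label(start, [(start + timedelta(days=2), 12)]))    # outside the window
```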

Prediction output

Every entity gets a score, updated continuously

RUN_ID	PRODUCT	DEFECT_PROB	TOP_RISK_FACTOR
RUN601	Widget-A	0.38	Material batch MFI drift
RUN602	Widget-B	0.71	Equipment condition + humidity
RUN603	Widget-A	0.08	Within tolerance
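How these scores get used is a business decision. As a sketch, the snippet below maps the three sample probabilities to actions using two cut-offs. Both thresholds and the action names are hypothetical and would need tuning against the cost of a held run versus the cost of a defective one.

```python
# Scores copied from the prediction output table above.
predictions = [("RUN601", 0.38), ("RUN602", 0.71), ("RUN603", 0.08)]


def triage(defect_prob, hold_at=0.5, review_at=0.25):
    """Map a defect probability to an action using hypothetical cut-offs."""
    if defect_prob >= hold_at:
        return "hold"      # adjust parameters or swap material before starting
    if defect_prob >= review_at:
        return "review"    # e.g. increase inspection frequency
    return "release"


actions = {run_id: triage(prob) for run_id, prob in predictions}
print(actions)
```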

Understand why

Every prediction includes feature attributions — no black boxes

Production Run RUN602: Widget-B on Extruder EQ002

Predicted: 71% defect probability

Top contributing features

FEATURE	VALUE	ATTRIBUTION
Equipment hours since service	1,200 hrs (high)	28%
Humidity above optimal range	52% (target: <48%)	24%
Material MFI at upper spec limit	21.5 g/10min	21%
Similar run on EQ002 had defects last week	12 defects	16%
Equipment condition score below threshold	72%	11%
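A quick way to read an attribution list like this one is to check that the shares add up and to find the smallest set of features that explains most of the score. The sketch below does both for the RUN602 values; the 50% coverage target is an arbitrary choice for illustration.

```python
# Attribution shares (percent) for RUN602, copied from the list above.
attributions = [
    ("Equipment hours since service", 28),
    ("Humidity above optimal range", 24),
    ("Material MFI at upper spec limit", 21),
    ("Similar run on EQ002 had defects last week", 16),
    ("Equipment condition score below threshold", 11),
]

total = sum(share for _, share in attributions)


def top_drivers(items, coverage=50):
    """Smallest prefix of the ranked features whose shares reach `coverage` percent."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    chosen, running = [], 0
    for name, share in ranked:
        chosen.append(name)
        running += share
        if running >= coverage:
            break
    return chosen


print(total)
print(top_drivers(attributions))
```

For this run, the two equipment and humidity factors together account for just over half of the predicted risk, which suggests where an intervention would matter most.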

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.

Read docs

Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.

Read docs

Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.

Read docs

Frequently asked questions

Common questions about quality defect prediction

How accurate is AI at predicting manufacturing defects?

Accuracy depends on the complexity of your process and data quality. Single-variable SPC catches about 40-50% of defects. Flat ML models reach 60-70%. Graph-based models that capture material-equipment-parameter interactions typically achieve 85-92% prediction accuracy. The key differentiator is whether your model can represent the multi-variable interactions that cause most defects.

Can defect prediction AI replace quality inspectors?

No, and it should not. Defect prediction works upstream of inspection, flagging high-risk production runs before they start. This lets you adjust parameters, substitute materials, or increase inspection frequency for flagged runs. Think of it as giving your quality team a 24-hour heads-up rather than replacing them. The best implementations reduce defect rates by 60-70% while keeping human inspectors for final verification.

What data do I need for quality defect prediction?

At minimum: process parameter logs, material batch records, equipment maintenance history, and inspection results with defect classifications. Most manufacturers already have this data in MES, ERP, and QMS systems. The critical gap is usually linking these systems so you can trace which material batch ran on which equipment under which parameters. Plan 4-6 weeks for data integration.

How does defect prediction handle new materials or new products?

Graph-based models handle this better than traditional ML because they transfer knowledge from similar materials and products. A new resin grade inherits defect patterns from chemically similar grades that ran on the same equipment. A new product variant inherits patterns from its product family. Prediction accuracy for new materials typically reaches useful levels (75%+) within 2-3 production runs.

What is the business case for AI-powered quality prediction?

For a manufacturer producing $500M in goods with a 3% defect rate, reducing defects to 1% saves $10M in direct scrap and rework costs. Add $25M in prevented warranty claims downstream. Implementation costs are typically $500K-$1M, giving a 10-35x ROI in year one. The payback period is usually 2-4 months.
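The arithmetic behind these figures can be checked in a few lines. This sketch assumes, as the answer above implies, that the defect rate translates directly into a share of goods value lost; the variable names are illustrative.

```python
revenue = 500_000_000            # annual goods produced, USD
defect_rate_before_pct = 3
defect_rate_after_pct = 1
warranty_savings = 25_000_000    # prevented warranty claims, from the text

# Integer arithmetic avoids floating-point rounding on the percentage difference.
direct_savings = revenue * (defect_rate_before_pct - defect_rate_after_pct) // 100
total_savings = direct_savings + warranty_savings


def savings_multiple(savings, cost):
    """Savings expressed as a multiple of implementation cost."""
    return savings / cost


cost_low, cost_high = 500_000, 1_000_000

print(direct_savings, total_savings)
print(savings_multiple(direct_savings, cost_high), savings_multiple(total_savings, cost_high))
print(savings_multiple(direct_savings, cost_low), savings_multiple(total_savings, cost_low))
```

The 10-35x range quoted above matches direct and total savings divided by the $1M upper cost estimate; at the $500K lower estimate the same multiples double.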

Bottom line: A manufacturer producing $500M in goods saves $35M annually by predicting defects before production runs start. Kumo's manufacturing graph catches the material-equipment-parameter combinations that statistical process control misses because it monitors each variable in isolation.

Related use cases

Explore more manufacturing use cases

Use Case #1: Predictive Maintenance

Use Case #3: Yield Optimization

Use Case #5: Demand Planning


Topics covered

quality defect prediction AI, production defect ML, manufacturing quality model, process quality prediction, SPC machine learning, KumoRFM quality, defect rate forecasting, zero-defect manufacturing

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
