
Yield Optimization

What parameters maximize yield?


A real-world example

What parameters maximize yield?

Yield losses in process manufacturing average 3-8% of total output. Design of experiments (DOE) finds local optima but cannot explore the full parameter space across changing material batches and equipment conditions. For a semiconductor fab producing $2B in annual output, a 1% yield improvement is worth $20M. For a chemical plant at $500M output, it is worth $5M.

Quick answer

Yield optimization AI predicts the best process parameters for each combination of material batch, equipment state, and operating conditions. Unlike DOE, which finds optimal settings under controlled conditions, graph-based models continuously adapt to real production variability. For a semiconductor fab producing $2B annually, each 1% yield improvement is worth $20M. Graph-based approaches find parameter combinations that DOE and flat ML models miss because they account for material-equipment interactions.

Approaches compared

4 ways to solve this problem

1. Design of Experiments (DOE)

Systematically vary parameters in controlled experiments to map the yield surface. The gold standard for establishing process windows. Provides statistical rigor and clear confidence intervals.

Best for

New product development and initial process characterization where you need to understand the parameter space from scratch.

Watch out for

Assumes materials and equipment are constant. In reality, material batches vary and equipment drifts between calibrations. DOE results go stale within weeks of the original experiment. Re-running DOE for every material batch is impractical.
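To make the DOE workflow concrete, here is a minimal two-factor, three-level full factorial sketch. The yield response function is a hypothetical stand-in for a physical experiment, and the parameter levels are illustrative, not from any real process:

```python
from itertools import product

# Hypothetical yield response -- in practice each call is a physical experiment.
def run_experiment(temp_c, pressure_bar):
    # Quadratic surface with an optimum near 245 C / 12.5 bar (illustrative).
    return 94.0 - 0.002 * (temp_c - 245) ** 2 - 0.8 * (pressure_bar - 12.5) ** 2

# Two-factor, three-level full factorial: 9 runs.
temps = [235, 245, 255]
pressures = [11.5, 12.5, 13.5]
runs = [(t, p, run_experiment(t, p)) for t, p in product(temps, pressures)]

best = max(runs, key=lambda r: r[2])
print(f"Best of {len(runs)} runs: temp={best[0]} C, "
      f"pressure={best[1]} bar, yield={best[2]:.1f}%")
```

Nine runs map the surface once, but the fitted optimum only holds for the material batch and equipment state present during the experiment, which is exactly the staleness problem described above.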

2. Process Simulation (Digital Twin)

Build physics-based models of the manufacturing process and simulate yield under different parameter combinations. Grounded in first principles.

Best for

Well-understood processes where the physics are fully characterized, like semiconductor lithography or chemical reactions with known kinetics.

Watch out for

Calibration drift between the simulation and reality accumulates quickly. Real processes have unmodeled interactions (ambient conditions, operator behavior, material micro-variations) that physics models simplify away. Expensive to build and maintain.
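A toy first-order-kinetics model makes the calibration-drift point concrete. All rate constants below are assumed for illustration; the point is only that a furnace running 1.1% below its setpoint shifts the real yield by a couple of points, while a twin calibrated to the setpoint never sees it:

```python
import math

# Toy first-order conversion model: yield = 1 - exp(-k * t), with an
# Arrhenius rate k = A * exp(-Ea / (R * T)). All constants are assumed.
R = 8.314    # J/(mol*K)
A = 1.7e6    # 1/min (hypothetical pre-exponential factor)
Ea = 8.0e4   # J/mol (hypothetical activation energy)

def model_yield(temp_c, time_min):
    k = A * math.exp(-Ea / (R * (temp_c + 273.15)))
    return 100.0 * (1.0 - math.exp(-k * time_min))

simulated = model_yield(245.0, 180)       # the twin assumes the setpoint is met
actual = model_yield(245.0 * 0.989, 180)  # the furnace actually runs 1.1% low
gap = simulated - actual
print(f"simulated {simulated:.1f}%  actual {actual:.1f}%  gap {gap:.1f} pts")
```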

3. Single-Table ML (Regression on Flattened Features)

Train regression models on flattened tables of parameters, material properties, and equipment state to predict yield. Captures non-linear relationships that DOE's linear models miss.

Best for

Processes with limited relational complexity where most yield drivers are captured in process parameter logs.

Watch out for

Cannot represent the relational structure of manufacturing. Material batch effects, equipment calibration history, and recipe version interactions require manual feature engineering that is brittle and incomplete.
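A small sketch of why flat-table models depend on hand-built interaction features. The synthetic data below (hypothetical coefficients) makes yield depend on a temperature-purity cross term; least squares on the raw columns misses it until the cross term is engineered in by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic runs: yield depends on a temperature x purity interaction (assumed).
n = 400
temp = rng.uniform(240, 250, n)       # C
purity = rng.uniform(99.0, 100.0, n)  # %
true_yield = 90 + 0.3 * (temp - 245) * (purity - 99.5) + rng.normal(0, 0.1, n)

def fit_rmse(X, y):
    # Ordinary least squares with an intercept; returns root-mean-square error.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

flat = fit_rmse(np.column_stack([temp, purity]), true_yield)
crossed = fit_rmse(
    np.column_stack([temp, purity, (temp - 245) * (purity - 99.5)]), true_yield)
print(f"RMSE raw columns only: {flat:.2f}  with hand-built cross term: {crossed:.2f}")
```

With one known interaction the fix is a single engineered column; with dozens of material, equipment, and recipe interactions, that engineering becomes the brittle, incomplete part.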

4. Graph Neural Networks (Kumo's Approach)

Connect recipes, parameters, materials, equipment, and output quality into a manufacturing graph. GNNs learn the yield surface across the full parameter-material-equipment space and continuously update as conditions change.

Best for

Process manufacturing with significant material variation, equipment drift, and multi-step processes where yield depends on upstream-downstream interactions.

Watch out for

Requires structured relational data across recipes, materials, equipment, and quality outcomes. Less value-add for single-step processes with minimal variability.
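A minimal sketch of the manufacturing-graph idea, using a few illustrative rows: each run node links to its recipe, material batch, and equipment, and a message-passing layer would aggregate exactly this kind of 1-hop neighborhood when learning the yield surface:

```python
# Toy manufacturing graph: runs link to a recipe, a material batch, and
# a piece of equipment. Row values are illustrative.
recipes = {"REC02": {"product": "Alloy-Y", "target_yield": 91.0}}
materials = {"MAT302": {"purity_pct": 99.2, "particle_size_um": 52}}
equipment = {"EQ102": {"type": "Furnace", "drift_pct": 1.1}}
runs = {"RUN702": {"recipe": "REC02", "material": "MAT302",
                   "equipment": "EQ102", "temp_c": 1420, "actual_yield": 88.5}}

def run_context(run_id):
    """Gather the 1-hop relational neighborhood of a run -- the raw
    context a GNN message-passing layer would aggregate."""
    run = runs[run_id]
    return {
        "run": run,
        "recipe": recipes[run["recipe"]],
        "material": materials[run["material"]],
        "equipment": equipment[run["equipment"]],
    }

ctx = run_context("RUN702")
print(ctx["equipment"]["drift_pct"], ctx["material"]["purity_pct"])
```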

Key metric: Graph-based yield models find optimal parameter combinations that DOE misses, with SAP SALT showing 91% accuracy vs 63% for gradient-boosted trees. For a $2B semiconductor fab, each 1% yield improvement equals $20M in annual value.

Why relational data changes the answer

Yield optimization is inherently a relational problem. The optimal temperature for Recipe REC02 is not a fixed number. It depends on the purity of the current material batch (99.2% vs 99.7% purity requires different temperature profiles), the calibration state of Furnace EQ102 (1.1% drift means the setpoint differs from actual temperature), and even the hold time needed for the current particle size distribution (52 um particles sinter differently than 38 um). DOE captures these interactions in controlled experiments, but the real factory presents new material-equipment-condition combinations every day.

This is where graph-based yield models pull ahead. By connecting recipes to materials to equipment to output quality in a graph, the GNN learns a continuous yield surface that adapts to each new combination. SAP's SALT benchmark quantifies the advantage: 91% accuracy for graph-based models vs 75% for deep learning on flat data vs 63% for gradient-boosted trees. On the RelBench benchmark for relational prediction tasks, GNNs score 76.71 vs 62.44 for tree-based models. In yield optimization terms, that accuracy gap translates directly into percentage points of yield. For a semiconductor fab, each percentage point is $20M. The relational structure of manufacturing is not a nice-to-have for yield models. It is the primary source of prediction accuracy.

Yield optimization with flat data is like trying to bake the perfect sourdough using only a recipe card. The recipe says '450F for 25 minutes,' but your oven runs hot, today's flour has higher hydration, and the ambient humidity is 80%. A master baker adjusts every parameter based on the full context of today's ingredients, this oven's quirks, and current conditions. Graph-based yield optimization does the same: it adjusts parameters based on the full relational context of this material batch, on this equipment, under today's conditions.

How KumoRFM solves this

Graph-powered intelligence for manufacturing

Kumo connects recipes, process parameters, materials, equipment, and output quality into a manufacturing graph. The GNN learns the yield surface across the full parameter space, accounting for batch-to-batch material variation and equipment drift that DOE assumes away. PQL predicts yield for any parameter combination, letting operators find the optimal set point for the current material batch and equipment state.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Step 1

Your data

The relational tables Kumo learns from

RECIPES

| recipe_id | product | target_yield | version |
| --- | --- | --- | --- |
| REC01 | Compound-X | 94% | v3.2 |
| REC02 | Alloy-Y | 91% | v2.8 |
| REC03 | Film-Z | 88% | v4.1 |

PARAMETERS

| run_id | recipe_id | temp_c | pressure_bar | time_min |
| --- | --- | --- | --- | --- |
| RUN701 | REC01 | 245 | 12.5 | 180 |
| RUN702 | REC02 | 1,420 | 0.8 | 45 |
| RUN703 | REC03 | 185 | 3.2 | 22 |

MATERIALS

| material_id | batch | purity_pct | particle_size_um |
| --- | --- | --- | --- |
| MAT301 | B-2025-088 | 99.7% | 45 |
| MAT302 | B-2025-091 | 99.2% | 52 |
| MAT303 | B-2025-095 | 99.9% | 38 |

EQUIPMENT

| equipment_id | type | calibration_date | drift_pct |
| --- | --- | --- | --- |
| EQ101 | Reactor | 2025-02-15 | 0.3% |
| EQ102 | Furnace | 2025-01-20 | 1.1% |
| EQ103 | Coater | 2025-02-28 | 0.1% |

OUTPUT_QUALITY

| run_id | actual_yield | grade | timestamp |
| --- | --- | --- | --- |
| RUN701 | 93.2% | A | 2025-03-01 |
| RUN702 | 88.5% | B+ | 2025-03-01 |
| RUN703 | 89.1% | A- | 2025-03-01 |
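These tables connect through foreign keys (run_id, recipe_id, and so on). A small pandas sketch over a subset of the columns shows the joined view that relates each run's parameters to its recipe target and measured yield:

```python
import pandas as pd

# Illustrative rows matching the schema above.
parameters = pd.DataFrame({
    "run_id": ["RUN701", "RUN702", "RUN703"],
    "recipe_id": ["REC01", "REC02", "REC03"],
    "temp_c": [245, 1420, 185],
})
output_quality = pd.DataFrame({
    "run_id": ["RUN701", "RUN702", "RUN703"],
    "actual_yield": [93.2, 88.5, 89.1],
})
recipes = pd.DataFrame({
    "recipe_id": ["REC01", "REC02", "REC03"],
    "target_yield": [94.0, 91.0, 88.0],
})

# One run per row, joined to its recipe and measured yield via foreign keys.
view = (parameters
        .merge(recipes, on="recipe_id")
        .merge(output_quality, on="run_id"))
view["gap_pts"] = view["target_yield"] - view["actual_yield"]
print(view[["run_id", "recipe_id", "gap_pts"]])
```

The gap_pts column already surfaces the story in the example data: RUN702 (Alloy-Y) is running 2.5 points below target.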
Step 2

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

```
PREDICT AVG(OUTPUT_QUALITY.actual_yield, 0, 1, days)
FOR EACH RECIPES.recipe_id, PARAMETERS.run_id
```
Step 3

Prediction output

Every entity gets a score, updated continuously

| RECIPE_ID | OPTIMAL_TEMP | OPTIMAL_PRESSURE | PREDICTED_YIELD |
| --- | --- | --- | --- |
| REC01 | 248 C | 12.8 bar | 95.4% |
| REC02 | 1,415 C | 0.75 bar | 92.1% |
| REC03 | 182 C | 3.0 bar | 90.8% |
Step 4

Understand why

Every prediction includes feature attributions — no black boxes

Recipe REC02 (Alloy-Y) on Furnace EQ102

Predicted yield: 92.1% (vs current 88.5%, +3.6 pts)

Top contributing features

- Temperature adjustment from 1,420 to 1,415 C (-5 C): 30% attribution
- Material purity interaction with pressure (99.2% x 0.75 bar): 25% attribution
- Furnace calibration drift compensation (1.1% drift): 19% attribution
- Hold time extension from 45 to 48 min (+3 min): 15% attribution
- Particle size impact on sintering (52 um): 11% attribution

Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability

Frequently asked questions

Common questions about yield optimization

How much yield improvement can AI deliver in manufacturing?

Typical improvements range from 1-5 percentage points depending on process complexity and current optimization level. Highly optimized semiconductor fabs see 1-2% improvement (but each point is worth $20M at $2B output). Chemical and pharmaceutical processes with more material variability often see 3-5% improvement. The ROI is almost always positive within 3-6 months because even small yield gains compound across high-volume production.

Can yield optimization AI replace process engineers?

No. It augments them by exploring the parameter space faster and more completely than manual experimentation allows. Process engineers provide the domain knowledge to set parameter bounds, interpret recommendations, and override the model when plant conditions change in ways the model has not seen. The best implementations treat AI as a recommendation engine that suggests optimal parameters for engineering review, not autonomous control.

How does yield optimization handle material batch variation?

Graph-based models excel here because they represent the material-recipe-equipment relationship directly. When a new material batch arrives with slightly different purity or particle size, the model adjusts recommended parameters based on how similar batches performed on the same equipment with the same recipe. This is the core advantage over DOE, which assumes fixed material properties.

What is the difference between yield optimization and process control?

Process control maintains parameters at set points (PID loops keeping temperature at 245C). Yield optimization determines what those set points should be, given the current material batch, equipment state, and quality targets. They work together: optimization sets the target, control maintains it. Most plants have mature process control but leave the set point selection to operator experience and stale DOE results.
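The division of labor sketched above can be shown in a few lines: the optimizer supplies the set point per batch, and a proportional-integral loop holds the process there. Gains and plant dynamics below are illustrative, not from any real controller:

```python
# Sketch: optimization picks the set point, control holds it.
# Proportional-integral loop driving a simple lumped thermal model.

def hold_setpoint(setpoint_c, start_c=230.0, steps=200, kp=0.5, ki=0.05, dt=1.0):
    temp, integral = start_c, 0.0
    for _ in range(steps):
        error = setpoint_c - temp
        integral += error * dt
        heat = kp * error + ki * integral          # controller output
        temp += dt * (heat - 0.1 * (temp - 25.0))  # plant: heating minus losses
    return temp

# Yield optimization would supply setpoint_c per material batch;
# the control loop just tracks whatever it is given.
print(f"{hold_setpoint(245.0):.1f}")
```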

How long does it take to deploy AI-based yield optimization?

Typical timeline is 8-16 weeks from data access to first production recommendations. The first 4-6 weeks are data integration (connecting recipe, material, equipment, and quality data). The next 2-4 weeks are model training and validation against historical production. The final 2-4 weeks are shadow-mode deployment where the model makes recommendations alongside existing processes. Most plants see measurable yield improvement within the first month of production use.

Bottom line: A semiconductor fab producing $2B in annual output gains $20M per 1% yield improvement. Kumo's manufacturing graph finds optimal parameters for each material-batch and equipment-state combination, going beyond the local optima that DOE provides.

Topics covered

yield optimization AI · process optimization ML · manufacturing yield prediction · parameter optimization model · recipe optimization · KumoRFM yield · production efficiency AI · golden batch prediction

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.