Yield Optimization
“What parameters maximize yield?”


A real-world example
What parameters maximize yield?
Yield losses in process manufacturing average 3-8% of total output. Design of experiments (DOE) finds local optima but cannot explore the full parameter space across changing material batches and equipment conditions. For a semiconductor fab producing $2B in annual output, a 1% yield improvement is worth $20M. For a chemical plant at $500M output, it is worth $5M.
Quick answer
Yield optimization AI predicts the best process parameters for each combination of material batch, equipment state, and operating conditions. Unlike DOE, which finds optimal settings under controlled conditions, graph-based models continuously adapt to real production variability. For a semiconductor fab producing $2B annually, each 1% yield improvement is worth $20M. Graph-based approaches find parameter combinations that DOE and flat ML models miss because they account for material-equipment interactions.
Approaches compared
4 ways to solve this problem
1. Design of Experiments (DOE)
Systematically vary parameters in controlled experiments to map the yield surface. The gold standard for establishing process windows. Provides statistical rigor and clear confidence intervals.
Best for
New product development and initial process characterization where you need to understand the parameter space from scratch.
Watch out for
Assumes materials and equipment are constant. In reality, material batches vary and equipment drifts between calibrations. DOE results go stale within weeks of the original experiment. Re-running DOE for every material batch is impractical.
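To make the DOE workflow concrete, here is a minimal sketch of a two-factor full-factorial design with a first-order response surface fit. All levels and yield values are hypothetical, chosen to echo the REC01 parameter ranges shown later on this page; a real study would add replicates, randomization, and higher-order terms.

```python
import itertools
import numpy as np

# Hypothetical 2-factor full-factorial design: temperature and pressure levels
temps = [240, 245, 250]         # deg C
pressures = [12.0, 12.5, 13.0]  # bar
design = list(itertools.product(temps, pressures))  # 9 runs

# Illustrative yields from the 9 experimental runs (made-up numbers)
yields = np.array([91.8, 92.5, 92.1, 92.9, 93.4, 93.0, 92.2, 92.8, 92.4])

# Fit a first-order response surface: yield ~ b0 + b1*T + b2*P
X = np.column_stack([np.ones(len(design)),
                     [t for t, _ in design],
                     [p for _, p in design]])
coef, *_ = np.linalg.lstsq(X, yields, rcond=None)

# Pick the design point with the highest predicted yield
best = max(design, key=lambda tp: coef @ np.array([1.0, tp[0], tp[1]]))
print("best (temp, pressure):", best)
```

Note that the fitted surface is only valid for the material batch and equipment state present during the experiment, which is exactly the staleness problem described above.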
2. Process Simulation (Digital Twin)
Build physics-based models of the manufacturing process and simulate yield under different parameter combinations. Grounded in first principles.
Best for
Well-understood processes where the physics are fully characterized, like semiconductor lithography or chemical reactions with known kinetics.
Watch out for
Calibration drift between the simulation and reality accumulates quickly. Real processes have unmodeled interactions (ambient conditions, operator behavior, material micro-variations) that physics models simplify away. Expensive to build and maintain.
3. Single-Table ML (Regression on Flattened Features)
Train regression models on flattened tables of parameters, material properties, and equipment state to predict yield. Captures non-linear relationships that DOE linear models miss.
Best for
Processes with limited relational complexity where most yield drivers are captured in process parameter logs.
Watch out for
Cannot represent the relational structure of manufacturing. Material batch effects, equipment calibration history, and recipe version interactions require manual feature engineering that is brittle and incomplete.
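The flattening step looks deceptively simple in code. The sketch below joins toy versions of the tables shown later on this page into one wide frame, assuming for simplicity that each run links directly to one material batch and one machine, and then hand-crafts a single interaction feature. Every relational effect the model should know about needs another such column, which is where the brittleness comes from.

```python
import pandas as pd

# Toy versions of the tables shown on this page (simplified join keys)
params = pd.DataFrame({"run_id": ["RUN701", "RUN702"],
                       "recipe_id": ["REC01", "REC02"],
                       "temp_c": [245, 1420]})
materials = pd.DataFrame({"run_id": ["RUN701", "RUN702"],
                          "purity_pct": [99.7, 99.2]})
equipment = pd.DataFrame({"run_id": ["RUN701", "RUN702"],
                          "drift_pct": [0.3, 1.1]})
quality = pd.DataFrame({"run_id": ["RUN701", "RUN702"],
                        "actual_yield": [93.2, 88.5]})

# Flatten everything into one wide table: the single-table ML starting point
flat = (params.merge(materials, on="run_id")
              .merge(equipment, on="run_id")
              .merge(quality, on="run_id"))

# Relational effects must be hand-crafted as interaction features,
# one brittle column at a time (purity x drift shown as an example)
flat["purity_x_drift"] = flat["purity_pct"] * flat["drift_pct"]
print(flat[["run_id", "purity_x_drift", "actual_yield"]])
```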
4. Graph Neural Networks (Kumo's Approach)
Connect recipes, parameters, materials, equipment, and output quality into a manufacturing graph. GNNs learn the yield surface across the full parameter-material-equipment space and continuously update as conditions change.
Best for
Process manufacturing with significant material variation, equipment drift, and multi-step processes where yield depends on upstream-downstream interactions.
Watch out for
Requires structured relational data across recipes, materials, equipment, and quality outcomes. Less value-add for single-step processes with minimal variability.
Key metric: Graph-based yield models find optimal parameter combinations that DOE misses, with SAP SALT showing 91% accuracy vs 63% for gradient-boosted trees. For a $2B semiconductor fab, each 1% yield improvement equals $20M in annual value.
Why relational data changes the answer
Yield optimization is inherently a relational problem. The optimal temperature for Recipe REC02 is not a fixed number. It depends on the purity of the current material batch (99.2% vs 99.7% purity requires different temperature profiles), the calibration state of Furnace EQ102 (1.1% drift means the setpoint differs from actual temperature), and even the hold time needed for the current particle size distribution (52 um particles sinter differently than 38 um). DOE captures these interactions in controlled experiments, but the real factory presents new material-equipment-condition combinations every day.
This is where graph-based yield models pull ahead. By connecting recipes to materials to equipment to output quality in a graph, the GNN learns a continuous yield surface that adapts to each new combination. SAP's SALT benchmark quantifies the advantage: 91% accuracy for graph-based models vs 75% for deep learning on flat data vs 63% for gradient-boosted trees. On the RelBench benchmark for relational prediction tasks, GNNs score 76.71 vs 62.44 for tree-based models. In yield optimization terms, that accuracy gap translates directly into percentage points of yield. For a semiconductor fab, each percentage point is $20M. The relational structure of manufacturing is not a nice-to-have for yield models. It is the primary source of prediction accuracy.
Yield optimization with flat data is like trying to bake the perfect sourdough using only a recipe card. The recipe says '450F for 25 minutes,' but your oven runs hot, today's flour has higher hydration, and the ambient humidity is 80%. A master baker adjusts every parameter based on the full context of today's ingredients, this oven's quirks, and current conditions. Graph-based yield optimization does the same: it adjusts parameters based on the full relational context of this material batch, on this equipment, under today's conditions.
How KumoRFM solves this
Graph-powered intelligence for manufacturing
Kumo connects recipes, process parameters, materials, equipment, and output quality into a manufacturing graph. The GNN learns the yield surface across the full parameter space, accounting for material-batch-to-batch variation and equipment drift that DOE assumes away. PQL predicts yield for any parameter combination, enabling operators to find the optimal set point for the current material batch and equipment state.
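To illustrate the idea (this is not Kumo's internal representation), the sketch below builds a tiny manufacturing graph from the example rows on this page: primary keys become nodes, foreign keys become edges, and a single round of neighbor gathering shows how a run's prediction can see its material purity and equipment drift together.

```python
# Hypothetical manufacturing graph built from the relational tables:
# node features keyed by id, edges derived from foreign-key links
nodes = {
    "REC02":  {"kind": "recipe",    "target_yield": 91.0},
    "RUN702": {"kind": "run",       "temp_c": 1420, "pressure_bar": 0.8},
    "MAT302": {"kind": "material",  "purity_pct": 99.2, "particle_size_um": 52},
    "EQ102":  {"kind": "equipment", "drift_pct": 1.1},
}
edges = [
    ("RUN702", "REC02"),   # run executes recipe
    ("RUN702", "MAT302"),  # run consumes material batch
    ("RUN702", "EQ102"),   # run uses furnace
]

def neighbors(node):
    """Collect the other endpoint of every edge touching `node`."""
    return [b if a == node else a for a, b in edges if node in (a, b)]

# One hop of "message passing": the run node gathers its neighbors'
# features, which is how a GNN sees purity and drift in one prediction
context = {n: nodes[n] for n in neighbors("RUN702")}
print(context)
```

A GNN stacks several such hops with learned transformations, so a yield prediction for RUN702 can also be influenced by how other runs on EQ102 performed.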
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
RECIPES
| recipe_id | product | target_yield | version |
|---|---|---|---|
| REC01 | Compound-X | 94% | v3.2 |
| REC02 | Alloy-Y | 91% | v2.8 |
| REC03 | Film-Z | 88% | v4.1 |
PARAMETERS
| run_id | recipe_id | temp_c | pressure_bar | time_min |
|---|---|---|---|---|
| RUN701 | REC01 | 245 | 12.5 | 180 |
| RUN702 | REC02 | 1,420 | 0.8 | 45 |
| RUN703 | REC03 | 185 | 3.2 | 22 |
MATERIALS
| material_id | batch | purity_pct | particle_size_um |
|---|---|---|---|
| MAT301 | B-2025-088 | 99.7% | 45 |
| MAT302 | B-2025-091 | 99.2% | 52 |
| MAT303 | B-2025-095 | 99.9% | 38 |
EQUIPMENT
| equipment_id | type | calibration_date | drift_pct |
|---|---|---|---|
| EQ101 | Reactor | 2025-02-15 | 0.3% |
| EQ102 | Furnace | 2025-01-20 | 1.1% |
| EQ103 | Coater | 2025-02-28 | 0.1% |
OUTPUT_QUALITY
| run_id | actual_yield | grade | timestamp |
|---|---|---|---|
| RUN701 | 93.2% | A | 2025-03-01 |
| RUN702 | 88.5% | B+ | 2025-03-01 |
| RUN703 | 89.1% | A- | 2025-03-01 |
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT AVG(OUTPUT_QUALITY.actual_yield, 0, 1, days) FOR EACH RECIPES.recipe_id, PARAMETERS.run_id
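As a rough intuition for what this query asks (an illustrative pandas analogue, not how Kumo executes PQL): for each entity, average `actual_yield` over the next 1 day from the prediction anchor time.

```python
import pandas as pd

# Rows from the OUTPUT_QUALITY table above, plus run-to-recipe links
quality = pd.DataFrame({
    "run_id": ["RUN701", "RUN702", "RUN703"],
    "actual_yield": [93.2, 88.5, 89.1],
    "timestamp": pd.to_datetime(["2025-03-01"] * 3),
})
runs = pd.DataFrame({"run_id": ["RUN701", "RUN702", "RUN703"],
                     "recipe_id": ["REC01", "REC02", "REC03"]})

# AVG(..., 0, 1, days): average yield in the window [anchor, anchor + 1 day)
anchor = pd.Timestamp("2025-03-01")
window = quality[(quality.timestamp >= anchor) &
                 (quality.timestamp < anchor + pd.Timedelta(days=1))]
target = (window.merge(runs, on="run_id")
                .groupby("recipe_id")["actual_yield"].mean())
print(target)
```

The key difference is that Kumo predicts this target for future windows from the graph, rather than aggregating already-observed outcomes.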
Prediction output
Every entity gets a score, updated continuously
| RECIPE_ID | OPTIMAL_TEMP | OPTIMAL_PRESSURE | PREDICTED_YIELD |
|---|---|---|---|
| REC01 | 248 C | 12.8 bar | 95.4% |
| REC02 | 1,415 C | 0.75 bar | 92.1% |
| REC03 | 182 C | 3.0 bar | 90.8% |
Understand why
Every prediction includes feature attributions — no black boxes
Recipe REC02 -- Alloy-Y on Furnace EQ102
Predicted yield: 92.1% (vs current 88.5%, +3.6 points)
Top contributing features
Temperature adjustment from 1420 to 1415 C
-5 C
30% attribution
Material purity interaction with pressure
99.2% x 0.75 bar
25% attribution
Furnace calibration drift compensation
1.1% drift
19% attribution
Hold time extension to 48 min
+3 min
15% attribution
Particle size impact on sintering
52 um
11% attribution
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about yield optimization
How much yield improvement can AI deliver in manufacturing?
Typical improvements range from 1-5 percentage points depending on process complexity and current optimization level. Highly optimized semiconductor fabs see 1-2% improvement (but each point is worth $20M at $2B output). Chemical and pharmaceutical processes with more material variability often see 3-5% improvement. The ROI is almost always positive within 3-6 months because even small yield gains compound across high-volume production.
Can yield optimization AI replace process engineers?
No. It augments them by exploring the parameter space faster and more completely than manual experimentation allows. Process engineers provide the domain knowledge to set parameter bounds, interpret recommendations, and override the model when plant conditions change in ways the model has not seen. The best implementations treat AI as a recommendation engine that suggests optimal parameters for engineering review, not autonomous control.
How does yield optimization handle material batch variation?
Graph-based models excel here because they represent the material-recipe-equipment relationship directly. When a new material batch arrives with slightly different purity or particle size, the model adjusts recommended parameters based on how similar batches performed on the same equipment with the same recipe. This is the core advantage over DOE, which assumes fixed material properties.
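A crude analogue of this behavior is a nearest-neighbor lookup over past batches. The sketch below, with entirely hypothetical values loosely based on the materials table on this page, starts a new batch from the parameters that worked for the most similar historical batch. A graph model generalizes this by learning a continuous yield surface instead of copying the single closest neighbor.

```python
import math

# Illustrative history for one recipe on one furnace: material properties
# of past batches and the temperature that maximized yield for each
history = [
    {"purity": 99.7, "particle_um": 45, "best_temp_c": 1410},
    {"purity": 99.2, "particle_um": 52, "best_temp_c": 1415},
    {"purity": 99.9, "particle_um": 38, "best_temp_c": 1405},
]

def recommend_temp(purity, particle_um):
    """Nearest-neighbor sketch: start from the most similar past batch."""
    def dist(h):
        # Scale particle size so both features contribute comparably
        return math.hypot(h["purity"] - purity,
                          (h["particle_um"] - particle_um) / 10)
    return min(history, key=dist)["best_temp_c"]

# A new batch at 99.3% purity and 50 um lands nearest the second record
print(recommend_temp(purity=99.3, particle_um=50))
```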
What is the difference between yield optimization and process control?
Process control maintains parameters at set points (PID loops keeping temperature at 245 C). Yield optimization determines what those set points should be, given the current material batch, equipment state, and quality targets. They work together: optimization sets the target, control maintains it. Most plants have mature process control but leave the set point selection to operator experience and stale DOE results.
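The division of labor can be sketched in a few lines. The set point function below is a hypothetical stand-in for a yield model (the linear purity adjustment is made up), and the control loop is proportional-only rather than full PID, but the layering is the point: optimization chooses the target once per batch, control tracks it continuously.

```python
def choose_setpoint(batch_purity):
    # Stand-in for the yield model: hypothetical linear purity adjustment
    return 245.0 + (99.7 - batch_purity) * 4.0

def control_step(temp, setpoint, kp=0.5):
    # Proportional control: nudge temperature toward the set point
    return temp + kp * (setpoint - temp)

setpoint = choose_setpoint(batch_purity=99.2)  # optimization layer
temp = 240.0
for _ in range(20):
    temp = control_step(temp, setpoint)        # control layer
print(round(setpoint, 1), round(temp, 1))
```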
How long does it take to deploy AI-based yield optimization?
Typical timeline is 8-16 weeks from data access to first production recommendations. The first 4-6 weeks are data integration (connecting recipe, material, equipment, and quality data). The next 2-4 weeks are model training and validation against historical production. The final 2-4 weeks are shadow-mode deployment where the model makes recommendations alongside existing processes. Most plants see measurable yield improvement within the first month of production use.
Bottom line: A semiconductor fab producing $2B in annual output gains $20M per 1% yield improvement. Kumo's manufacturing graph finds optimal parameters for each material-batch and equipment-state combination, going beyond the local optima that DOE provides.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.