Human-machine collaboration for improving semiconductor process development

Engineering and Technology

K. J. Kanarik, W. T. Osowiecki, et al.

This research by Keren J. Kanarik and colleagues from Lam Research Corporation explores the application of Bayesian optimization algorithms to semiconductor chip fabrication. It shows how a hybrid strategy combining human expertise with computer efficiency significantly reduces development cost, while highlighting the cultural challenges such collaboration raises.
Introduction

Semiconductor chips underpin AI systems and are fabricated through complex multi-step processes, many involving plasma etch and deposition. Process development is typically done by expert engineers using intuition and trial-and-error because collecting wafer-based experimental data is expensive, yielding a ‘little data’ regime that hinders accurate predictive modeling. The authors pose the challenge of reducing cost-to-target relative to experienced human engineers. To benchmark AI against humans under controlled and comparable conditions, they created a virtual process game that maps tool parameter recipes to etch outcomes and allows systematic comparison of strategies while avoiding real-lab variability and cost. The goal is to determine whether AI, especially Bayesian optimization, can lower development cost and how best to combine human expertise with algorithms.

Literature Review

The paper situates its work in the context of AI outperforming humans in domains like chess and Go when large, inexpensive training data are available. In contrast, semiconductor process development suffers from expensive, scarce data and complex, poorly characterized physics (‘little data’). Prior literature discusses plasma processing and the pseudo–black-box nature of plasma etching, as well as strategies for applying machine learning to small datasets in materials science. Bayesian optimization is a common approach for optimizing expensive black-box functions and has been explored in semiconductor-related applications (e.g., sputtering, lithography tuning). The authors also draw analogies to early computer chess (algorithms assisting in endgames) and directed evolution in protein engineering needing a suitable starting point, motivating human-guided algorithm strategies. References further cover acquisition functions (e.g., expected improvement, lower confidence bound), Gaussian processes, priors in Bayesian learning, transfer learning, and physics-informed models.
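The acquisition functions mentioned above have simple closed forms under a Gaussian posterior. As an illustrative sketch (not the paper's implementation), for a minimization objective with posterior mean `mu` and standard deviation `sigma` at a candidate recipe:

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for minimization: the expected amount by which a candidate's
    outcome beats the best objective value observed so far, f_best."""
    if sigma <= 0.0:
        return max(f_best - mu - xi, 0.0)
    z = (f_best - mu - xi) / sigma
    return (f_best - mu - xi) * norm_cdf(z) + sigma * norm_pdf(z)

def lower_confidence_bound(mu, sigma, kappa=2.0):
    """LCB for minimization: an optimistic estimate; the candidate with
    the smallest LCB trades off exploitation (low mu) vs exploration
    (high sigma)."""
    return mu - kappa * sigma
```

With `sigma = 0` EI reduces to the plain improvement over `f_best`; larger `kappa` in the LCB pushes the search toward more exploratory recipes.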

Methodology

The study uses a virtual process game modeling a single-step plasma etch of a high-aspect-ratio hole in silicon dioxide. A proprietary feature profile simulator, parameterized and calibrated from existing data with physics-based and empirical relationships, maps input tool-parameter recipes (pressure, dual plasma powers, Ar/CF4/CHF3/O2 flows, pulsing duty/frequency, wafer temperature) to outputs (etch depth/rate, mask remaining, top CD, ACD, bow CD) and profile images. Participants submit batches (one or more recipes), receive outputs, and iterate until meeting target metrics (as defined in Extended Data). Costs are assigned as $1,000 per recipe plus $1,000 per batch overhead. The random chance of meeting target was estimated at 0.003% per recipe, based on 35,000 random samples.

Human benchmarking: six professional process engineers (three senior, with >7 years' experience; three junior, with <1 year) designed experiments via mechanistic hypotheses and domain knowledge, using univariate or bivariate parameter changes in 95% of recipes and an average batch size of four. Three inexperienced individuals also participated. Trajectories were characterized as rough tuning (rapid initial improvement) followed by fine tuning (slow progress toward meeting all metrics). Senior engineers required about half the cost of junior engineers to achieve similar progress. The winning human (senior engineer 1) reached target at a cost of $105,000, used as the expert benchmark.

Algorithm benchmarking: three Bayesian optimization variants were implemented: (1) Algo1, MCMC sampling over a multivariate linear surrogate with expected-improvement acquisition; (2) Algo2, a Tree-structured Parzen Estimator with expected improvement; (3) Algo3, a Gaussian process model with a lower-confidence-bound acquisition. All used a scaled Euclidean distance to target as the objective, non-informative priors, and no pretraining. Algorithms used output metrics only (profile images were ignored) and submitted one recipe per batch.
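The cost accounting above ($1,000 per recipe plus $1,000 overhead per batch) is simple enough to sketch directly; the batch counts below are illustrative, not the study's actual campaigns:

```python
def campaign_cost(batch_sizes):
    """Cost model from the study: $1,000 per recipe plus a $1,000
    overhead per batch, summed over the whole campaign."""
    return sum(1000 * n + 1000 for n in batch_sizes)

# A human-style campaign: batches of ~4 recipes each
human_cost = campaign_cost([4, 4, 4, 4])   # 4 * (4*1000 + 1000) = 20,000
# An algorithm-style campaign: one recipe per batch
algo_cost = campaign_cost([1] * 16)        # 16 * (1000 + 1000) = 32,000
```

Note how the per-batch overhead penalizes the algorithms' one-recipe-per-batch strategy: the same 16 recipes cost $32,000 in single-recipe batches but only $21,000 in four batches of four.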
Each condition was repeated 100 times; trajectories were truncated if they did not meet target before $105,000 (the expert benchmark). Success rate was defined as the percentage of trajectories with lower cost-to-target than the expert. Each algorithm started with a 32-recipe Latin hypercube seed, then generated a single recipe per batch.

Human first–computer last (HF-CL): to mitigate aimless exploration, the expert provided data collected up to transfer points A–E (covering progressively more of the human's trajectory) and a constrained search range. Random success within this constrained range was estimated at 0.27% per recipe (13% over 2,700 random samples). After transfer, the algorithm alone chose experiments; each condition was again repeated 100 times. Cost-to-target includes both human and algorithm costs.
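A Latin hypercube seed like the 32-recipe one used here stratifies each parameter range so that every stratum is sampled exactly once, giving even coverage with few experiments. A minimal sketch (the parameter names and bounds are hypothetical, not the study's):

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Latin hypercube sample: each parameter range is split into
    n_samples equal strata, and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    cols = []
    for lo, hi in bounds:
        # one random point inside each of the n_samples strata
        pts = [lo + (hi - lo) * (i + rng.random()) / n_samples
               for i in range(n_samples)]
        rng.shuffle(pts)  # decorrelate this column from the others
        cols.append(pts)
    # transpose: one recipe = one value per parameter
    return [tuple(col[i] for col in cols) for i in range(n_samples)]

# e.g. a 32-recipe seed over three illustrative knobs
bounds = [(5.0, 100.0),     # pressure (hypothetical units)
          (100.0, 2000.0),  # plasma power
          (10.0, 500.0)]    # CF4 flow
seed_recipes = latin_hypercube(32, bounds)
```

Compared with 32 purely random recipes, this guarantees no parameter range is left with unexplored gaps before the surrogate model takes over.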

Key Findings
  • Human performance: Senior engineers achieved roughly half the cost-to-target of junior engineers for comparable progress. The expert benchmark (senior engineer 1) reached target at $105,000.
  • Algorithms alone: Starting from scratch with a 32-recipe seed, success rates (beating the expert) were <1% (Algo1), 2% (Algo2), and 11% (Algo3). Across 300 attempts, only 13 (<5%) beat the expert. One Algo2 run allowed to continue beyond the truncation point met target at $739,000, far worse than the expert, confirming poor cost-efficiency under little data.
  • HF-CL strategy: Providing expert data and constrained ranges improved algorithm performance with a V-shaped dependence of total cost-to-target on the amount of expert data. At transfer point A (least data), success rates were 20% (Algo1), 43% (Algo2), and 42% (Algo3), still risky relative to expert alone. The optimal transfer was at point C for all algorithms. HF-CL with Algo3 set a new benchmark with a median cost-to-target of $52,000—about half the expert’s cost. Beyond point C, adding more human data increased cost without clear algorithmic benefit, forming the right side of the V.
  • Stage specialization: Humans excelled in rough tuning (early navigation via domain knowledge), while algorithms were more cost-efficient in fine tuning near tight tolerances.
  • Baseline probabilities: Random chance of meeting target was 0.003% per recipe; within expert-constrained ranges, estimated 0.27% per recipe (13% over 2,700 samples).
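The success-rate metric used throughout is just the fraction of repeated runs that beat the fixed expert benchmark, with truncated runs counted as failures. A minimal sketch (the run outcomes below are illustrative, not the paper's data):

```python
EXPERT_COST = 105_000  # senior engineer 1's cost-to-target

def success_rate(costs_to_target):
    """Percentage of repeated algorithm runs whose cost-to-target beats
    the expert benchmark; None marks a run truncated before target."""
    wins = sum(1 for c in costs_to_target
               if c is not None and c < EXPERT_COST)
    return 100.0 * wins / len(costs_to_target)

# hypothetical: 100 runs, 42 finishing at $52,000, the rest truncated
runs = [52_000] * 42 + [None] * 58
rate = success_rate(runs)  # → 42.0
```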

Discussion

The findings address the central question of whether AI can reduce process-development cost under little-data constraints. Algorithms without prior knowledge perform poorly due to expensive, sparse data and complex process physics. However, combining human domain knowledge (early-stage guidance) with algorithmic optimization (late-stage fine-tuning) yields substantial cost reductions. The observed V-shaped relationship between cost-to-target and the amount of human-provided data reflects a trade-off: initially, more data makes the algorithm competent (left side), but excessive human-driven experimentation adds cost with diminishing returns once algorithms become better at fine-tuning (right side). This suggests the fine-tuning stage should be algorithm-led. The phenomenon appears robust across different human/algorithm pairings and likely generalizes to other little-data manufacturing problems. Cultural challenges are anticipated: algorithms propose multivariate, sometimes counterintuitive recipes; they favor single-experiment batches and exploratory moves, which may conflict with human practices. Successful adoption will require trust in algorithmic strategies and adjustments in laboratory workflows. Identifying the optimal transfer point prospectively will depend on factors such as process dimensionality, noise/drift, tolerance tightness, batch size, constraints, and cost structure.

Conclusion

AI alone, given little data, could eventually reach targets but at unacceptably high cost relative to expert humans. A hybrid human first–computer last approach leverages human intuition for early exploration and algorithmic efficiency for fine tuning, halving the cost-to-target versus an expert alone (median $52,000 with a Gaussian process–LCB algorithm versus $105,000 expert benchmark). The work demonstrates a practical path to accelerate semiconductor process development using virtual platforms to de-risk strategies. Future directions include encoding domain knowledge into algorithms (priors, transfer learning), integrating mechanistic physics models, and methods to identify optimal transfer points a priori. As process nonlinearities intensify near tight targets, continued data collection will remain necessary, meaning process engineering will persist as a little-data challenge where human–AI collaboration is advantageous.

Limitations
  • Results are derived from a proprietary virtual process simulator rather than real laboratory experiments; translation to physical tools may involve additional variability (wafer incoming variation, metrology noise, equipment drift).
  • A relatively small number of human test cases were included, limiting generalizability despite consistent trends.
  • Algorithms were restricted to using numeric output metrics and ignored profile images, potentially omitting informative features.
  • Bayesian optimization trajectories were truncated at the expert cost ($105,000), which may bias comparisons against longer-running algorithm strategies.
  • Algorithms began with non-informative priors and no pretraining; alternative priors or transfer learning might change outcomes.
  • Batch strategy differences (algorithms using one recipe per batch vs humans averaging four) and constrained search ranges provided by experts could affect cost accounting and comparability.
  • The optimal transfer point depends on multiple context-specific factors (noise, drift, tolerance, dimensionality, costs), which were not exhaustively explored.