Neural étendue expander for ultra-wide-angle high-fidelity holographic display
E. Tseng, G. Kuo, et al.
Holography forms images by controlling interference patterns, and modern dynamic holographic displays use spatial light modulators (SLMs) to modulate coherent light. Practical LCOS-based SLMs have a limited pixel pitch, which constrains the maximum diffraction angle and fundamentally limits étendue (the product of FOV and eyebox for near-eye displays, or of viewing angle and display size for table-top displays). This forces a trade-off between FOV and display size. Immersive VR/AR calls for at least ~120° FOV and a >10 × 10 mm² eyebox, implying over a billion SLM pixels, far beyond what current fabrication and computation technology supports. Existing approaches to overcoming the étendue constraint (e.g., eye tracking with dynamic feedback, multiple SLMs, temporal integration with laser arrays or DMDs) increase complexity, footprint, timing constraints, and power. Alternatives such as rewritable photopolymers have low refresh rates; MEMS options have low pixel count and bit depth; other methods trade spatial for depth resolution or optimize coherent/incoherent interference but do not increase total étendue. Prior étendue expansion via randomized scattering elements places micron-scale static scattering masks in front of the SLM to increase the diffraction angle, but these masks are agnostic to the optics and to image statistics, leading to low-fidelity reconstructions and often requiring extensive calibration. Lens- and lenslet-based FOV expansion constrains pupil position and shrinks the effective eyebox; tilting cascades require multiple 4F relays with a large physical footprint. The research question is how to expand étendue to achieve an ultra-wide FOV while maintaining high-fidelity reconstructions and a compact form factor, without complex dynamic components.
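The "over a billion SLM pixels" estimate can be sanity-checked from the étendue relation G = 4A sin²(θ) with sin(θ) = λ/(2·pitch), which gives G = N·λ² for N pixels. The sketch below is illustrative (not from the paper); the 500 nm wavelength is an assumed mid-visible value.

```python
import math

# Back-of-the-envelope check of the billion-pixel claim.
# Etendue: G = 4 * A * sin^2(theta), with sin(theta) = lambda / (2 * pitch),
# so G = A * lambda^2 / pitch^2 = N * lambda^2 for N = A / pitch^2 pixels.
# Hence the pixel count needed for a target etendue is N = G_target / lambda^2.

wavelength = 500e-9            # assumed mid-visible wavelength (m)
eyebox_area = (10e-3) ** 2     # 10 x 10 mm^2 eyebox (m^2)
half_fov = math.radians(120 / 2)

g_target = 4 * eyebox_area * math.sin(half_fov) ** 2
n_pixels = g_target / wavelength ** 2
print(f"required SLM pixels: {n_pixels:.2e}")
```

The result lands on the order of 10⁹ pixels, consistent with the text.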
Several strategies have been explored to address étendue limits:
1) Dynamic feedback (e.g., eye tracking) to steer content within the limited étendue, at the cost of latency/timing issues and increased complexity and power.
2) Spatial integration using multiple SLMs, and temporal integration using laser arrays or high-speed DMDs, which add components and timing constraints.
3) Rewritable photopolymers, which offer static holograms but limited refresh rates.
4) MEMS devices with limited pixel counts and bit depth.
5) Trade-offs within a fixed étendue (e.g., spatial vs. depth resolution, coherent vs. incoherent optimization), which do not increase total étendue.
6) Randomized scattering optics (e.g., photon sieves, random masks) that increase the diffraction angle via finer static features, but are agnostic to image and system statistics, causing low-fidelity reconstructions, chromatic artifacts, and extensive calibration needs.
7) Lens/lenslet arrays that expand FOV at the cost of eyebox reduction due to pupil-alignment constraints.
8) Tilting cascades that enable exponential étendue growth but require multiple 4F relays, a large physical footprint, and added complexity.
Together, these highlight the need for static, compact, image-aware optics that cooperate with the SLM to expand étendue while preserving fidelity across wavelengths.
Core idea: Introduce a static, learned diffractive optical element (a neural étendue expander) with a smaller pixel pitch than the SLM, placed at the SLM's conjugate plane, to increase the maximum diffraction angle and thus the étendue. The expander's wavefront modulation is jointly optimized with per-image SLM patterns using a differentiable holographic image-formation model and a perceptual loss reflecting human retinal resolution. Key components:
1) Étendue model: G_e = 4A sin²(θ_e), with θ_e = arcsin(λ / (2 p_s)), where A is the SLM area and p_s the SLM pixel pitch. With a neural expander of pitch p_n < p_s, the maximum diffraction angle increases to θ_n = arcsin(λ / (2 p_n)), giving the expanded étendue G_n = 4A sin²(θ_n).
2) Image formation: I = |F(ε ⊙ U(S))|², where F is the 2D Fourier transform, ε the expander modulation, S the SLM modulation, U(·) upsamples S to the expander resolution, and ⊙ is the element-wise product.
3) Training objective: Jointly optimize ε (a single static element) and per-image SLM patterns {S_k} over a natural-image dataset {T_k} by minimizing the perceptually filtered intensity reconstruction error: minimize Σ_k || (|F(ε ⊙ U(S_k))|² − T_k) * f ||²_2, where * is convolution and f is a Butterworth low-pass filter modeling retinal resolution, f = F⁻¹((1 + (ω/c)²)⁻¹) with cutoff c = 2π/N, where N is the SLM pixel count. This biases optimization to preserve frequencies perceptible to the human eye while allowing high-frequency energy to be shifted outside the retinal passband.
4) Optimization: End-to-end differentiable training with stochastic gradient methods (e.g., Adam), akin to training a shallow neural network; one SLM pattern per training image, with ε shared across all images. Training data: 105 high-resolution natural images; testing on 20 natural images; grayscale for the monochrome design and RGB for the trichromatic design. No temporal multiplexing is used in training.
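The image-formation model and perceptually filtered loss above can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' code: array sizes and the cutoff value are arbitrary, and the Butterworth filter is applied multiplicatively in the frequency domain, which is equivalent to the spatial convolution by Parseval's theorem.

```python
import numpy as np

# Toy sketch of I = |F(eps ⊙ U(S))|^2 and the retinal-filtered L2 loss.
# Sizes are illustrative: a 16x16 SLM expanded 4x per axis to 64x64.

def upsample(slm_field, factor):
    # Nearest-neighbour upsampling U(.) of the SLM field to expander resolution.
    return np.kron(slm_field, np.ones((factor, factor)))

def render(expander, slm_phase, factor):
    # Far-field intensity: squared magnitude of the 2D Fourier transform of
    # the expander modulation times the upsampled SLM phase modulation.
    field = expander * upsample(np.exp(1j * slm_phase), factor)
    return np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2

def retinal_loss(image, target, cutoff):
    # Butterworth low-pass |H(w)| = 1 / (1 + (w/c)^2) applied in the
    # frequency domain before the L2 error, modelling finite retinal
    # resolution. The cutoff here is an illustrative value, not c = 2*pi/N.
    n = image.shape[0]
    fx = np.fft.fftshift(np.fft.fftfreq(n))
    wx, wy = np.meshgrid(fx, fx)
    h = 1.0 / (1.0 + (np.hypot(wx, wy) / cutoff) ** 2)
    diff = np.fft.fftshift(np.fft.fft2(image - target)) * h
    return np.mean(np.abs(diff) ** 2)

rng = np.random.default_rng(0)
factor, n_slm = 4, 16
expander = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_slm * factor,) * 2))
slm_phase = rng.uniform(0, 2 * np.pi, (n_slm, n_slm))
target = rng.uniform(size=(n_slm * factor,) * 2)

img = render(expander, slm_phase, factor)
loss = retinal_loss(img / img.max(), target, cutoff=0.1)
print(f"filtered reconstruction loss: {loss:.4f}")
```

In the actual method this forward model is written in a differentiable framework so that gradients flow to both the shared ε and the per-image SLM phases.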
5) Analysis: The virtual frequency modulation ε̃ is characterized by rendering with a zero-phase SLM and taking the spectrum; the learned ε̃ matches natural-image frequency statistics within the passband and pushes artifacts outside perceptual bands. A theoretical upper bound (via Parseval's theorem) links the optimal ε to the average natural-image spectrum under the retinal filter, explaining this frequency-shaping behavior.
6) Hardware prototype: Neural expanders are fabricated as diffractive optical elements with 2 μm pitch via laser-beam lithography to form a stamp, followed by resin stamping onto glass. The resin's refractive index is wavelength-dependent (n = 1.5081 at 660 nm, 1.5159 at 517 nm, 1.5223 at 450 nm) and is incorporated into the design. The expander is placed at the SLM conjugate plane, and a DC block removes undiffracted zero-order light. The prototype comprises an SLM (HOLOEYE PLUTO), a 4F system, an eyepiece/imaging lens, and a camera. Comparison expanders: a binary random phase mask (designed for 660 nm), a uniform random phase mask, and photon sieves.
7) Evaluation: Experimental captures under trichromatic illumination (450, 517, 660 nm); étendue-expanded holograms at 64× (8× per axis) are demonstrated. Temporal averaging over 20 frames is applied in the shown captures, with single-frame results provided in the supplements. Simulations further assess fidelity across étendue factors (4×, 16×, 36×, 64×) and across SLM resolutions (including 8K).
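The virtual-frequency-modulation probe described in the analysis can be illustrated with a small NumPy sketch: drive the SLM with a flat (zero-phase) pattern, render, and take the spectrum of the result. A random-phase expander stands in for the learned element here, and the shapes are illustrative.

```python
import numpy as np

# Probe the virtual frequency modulation of an expander: render with a
# zero-phase SLM and inspect the spectrum of the reconstructed intensity.

def virtual_spectrum(expander, factor):
    n_slm = expander.shape[0] // factor
    flat_slm = np.ones((n_slm, n_slm), dtype=complex)   # zero-phase SLM
    field = expander * np.kron(flat_slm, np.ones((factor, factor)))
    intensity = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    # For a learned expander, this spectrum should resemble average
    # natural-image statistics inside the retinal passband.
    return np.abs(np.fft.fftshift(np.fft.fft2(intensity)))

rng = np.random.default_rng(1)
expander = np.exp(1j * rng.uniform(0, 2 * np.pi, (64, 64)))
spec = virtual_spectrum(expander, factor=4)
print(spec.shape)
```

For a random expander this spectrum is roughly flat; the paper's point is that the learned ε̃ instead concentrates energy where natural images do.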
- Neural étendue expanders enable 64× étendue expansion (8× per axis) while maintaining high-fidelity reconstructions.
- The experimental prototype produced 64× étendue-expanded full-color holograms with high contrast, minimal speckle, and no chromatic aberrations, outperforming binary and uniform random expanders and photon sieves.
- Quantitatively, for 64× trichromatic holograms, neural expanders achieve over 14 dB PSNR improvement versus random expanders; for monochromatic holograms, over 10 dB. Overall reconstruction PSNR exceeds 29 dB on retinal-resolution images.
- Neural expanders produce consistent quality across wavelengths (450, 517, 660 nm), whereas binary random expanders (designed for 660 nm) suffer chromatic artifacts and reduced contrast at other wavelengths.
- Learned expanders shape the virtual frequency modulation to match natural-image statistics within the retinal passband and push reconstruction noise outside perceivable bands.
- Visualization of the learned phase patterns shows structured multi-scale features; the corresponding virtual frequency spectra align with natural-image spectra within the passband across étendue factors (4×, 16×, 36×, 64×).
- The experimental system with a 1K-pixel SLM yields an eyebox of ~1 mm and a horizontal/vertical FOV of ~77.4°, demonstrating the trade-off enabled by étendue expansion.
- Robustness to eye-pupil position and shape variations is observed when initializing with uniform random patterns, which distribute energy across the eyebox; methods relying on quadratic phase profiles are less robust.
- Extension to 3D: Neural expanders support higher-fidelity étendue-expanded 3D color holograms; competing methods typically handle only monochromatic 3D or sparse color points (photon sieves).
- Scalability: Simulations with an 8K-pixel SLM and 64× étendue expansion maintain the fidelity gains and indicate étendue sufficient to cover ~85% of the human stereo FOV with an 18.5 mm eyebox.
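The PSNR figures above use the standard peak-signal-to-noise ratio, evaluated per the paper on retinal-resolution (low-pass filtered) images. A minimal reference implementation, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(img, ref, peak=1.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference.
    mse = np.mean((np.asarray(img) - np.asarray(ref)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Illustrative use on a synthetic reference plus mild noise.
rng = np.random.default_rng(0)
ref = rng.uniform(size=(64, 64))
noisy = np.clip(ref + rng.normal(scale=0.02, size=ref.shape), 0.0, 1.0)
print(f"PSNR: {psnr(noisy, ref):.1f} dB")
```

A gain of 14 dB, as reported against random expanders, corresponds to roughly a 25× reduction in mean squared error.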
The work addresses the fundamental étendue bottleneck of dynamic SLM-based holographic displays by adding a static, learned diffractive element that cooperates with per-image SLM modulation. By explicitly modeling human retinal resolution in the loss and using a differentiable Fourier optics pipeline, the learned expander allocates spectral energy to preserve perceptually important content while shifting artifacts out of the visible passband. This differs from randomized expanders and photon sieves, which indiscriminately scatter light and degrade fidelity, and from lens/lenslet and tilting-cascade approaches with trade-offs in eyebox or form factor. The demonstrated 64× étendue expansion with high PSNR and multi-wavelength robustness shows that static, data-driven optics can substantially increase angular coverage without sacrificing image quality. Robustness to pupil movement and support for 3D holography further highlight practical relevance for near-eye and immersive displays. The method’s generalization across étendue factors and SLM resolutions suggests broad applicability, with simulations indicating feasibility for wide-FOV, large-eyebox VR/AR targets.
Neural étendue expanders are the first learned optical elements designed to expand holographic display étendue while preserving high fidelity, achieving 64× étendue expansion (8× per axis) with >29 dB PSNR and large gains over random expanders and photon sieves, including robust trichromatic and 3D performance. The approach jointly optimizes a static DOE and per-image SLM patterns using a differentiable, perceptually guided objective, producing structured phase elements that match natural image statistics in the visible passband. The concept is experimentally validated and scales in simulation to higher-resolution SLMs towards VR/AR-scale FOV and eyebox. Future work could integrate metasurface-based implementations to further increase diffraction angles, leverage polarization control, and reduce device footprint, as well as explore broader datasets, hardware co-design, and real-time pipelines.
- The paper does not enumerate explicit limitations; however, the experimental captures used temporal averaging over 20 frames to reduce speckle, with single-frame results available in the supplementary materials.
- The experimental demonstration uses a 1K-pixel SLM (yielding a ~1 mm eyebox and ~77.4° FOV); scaling to a larger eyebox and FOV is supported by simulations (e.g., an 8K SLM) rather than shown experimentally.
- Fabrication relies on precise DOE manufacturing and wavelength-dependent resin indices; multi-wavelength performance requires accurate modeling of dispersion.
- The optimization is data-driven; performance may depend on how representative the training dataset of natural images is, though generalization to unseen images was demonstrated.