Engineering and Technology
Fast topographic optical imaging using encoded search focal scan
N. Vilar, R. Artigas, et al.
Topographic optical imaging at the microscale underpins industrial inspection, metrology of additively manufactured parts, and 3D surface measurement of biomaterials. Conventional approaches reconstruct topography from z-stacks (confocal microscopy, interferometry). This becomes slow for bulky samples that require large focus shifts, and when high-NA objectives are used, because their short depth-of-field demands many z-planes. Consequently, standard instruments are poorly suited to fast-moving processes or rapidly changing biological systems. Prior efforts either accelerate z-stack acquisition (e.g., spatiotemporal multiplexing, multifocus microscopy, encoded illumination, light-field imaging, variable optics for fast focus control) or reduce the required information by exploiting sample sparsity (random-access scanning, compressed sensing, PSF engineering). These can be limited by axial range, usability, cost, sample compatibility, reconstruction fidelity, or heavy computation. A simple, real-time method for large-volume, high-resolution topography has been lacking. The authors introduce Encoded Search Focal Scan (ESFS), which acquires a reduced number of images, each during a complete axial sweep with synchronously modulated illumination. Properly designed modulation sequences enable unambiguous height recovery from far fewer images than plane-by-plane scanning, achieving order-of-magnitude speedups without extra computational complexity or loss of fidelity. ESFS is broadly implementable in confocal-like systems and demonstrates sub-micrometric precision over a 100 µm axial range, as well as real-time dynamic measurements.
Two main strategy classes address speed in topographic imaging. (1) Speeding z-stack acquisition: spatiotemporal multiplexing, multifocus microscopy, encoded illumination, light-field imaging, and variable optical elements for rapid focusing can yield real-time 3D, but may be constrained by axial range, complexity, cost, or sample types. (2) Reducing information by leveraging sample sparsity: selecting predefined regions (random-access scanning) or using computational methods like compressed sensing and PSF engineering to reconstruct topography from far fewer images (up to an order of magnitude reduction). However, these can reduce reconstruction fidelity and typically require offline, computationally intensive processing. The paper positions ESFS as addressing these limitations by combining sparse acquisition with simple, robust decoding and high fidelity over large axial ranges.
Working principle: ESFS reframes axial localization as a binary search that exploits sample sparsity (each lateral location has a single in-focus axial position). Instead of capturing N planes, the system acquires n images, each while sweeping focus continuously across the full range. During each exposure, illumination is temporally modulated with a unique binary sequence M_i(z) (on/off as a function of axial position), which effectively merges the selected axial planes into one image. The captured image is therefore an axial integral of the sample reflectivity and the illumination, convolved with the depth-dependent PSF. Because a static structured illumination pattern is projected, a focus-sensitive signal S_i(x,y) can be computed from each image without lateral scanning by extracting its high spatial-frequency content (e.g., the magnitude of the Laplacian followed by Gaussian smoothing). Defocus suppresses high-frequency content except near the true sample height, so thresholding S_i reveals whether the sample was illuminated during the M_i(z) window at its in-focus time. With a set of sequences implementing a binary (or Gray) code, the step index d(x,y), which identifies the axial bin containing the surface, is obtained from only n = ⌈log2 N⌉ images by thresholding and decoding. The axial step size is bounded below by the objective's depth-of-field.
Precision enhancement (second stage): Additional focal-sweep images are acquired with periodic intensity modulation whose period matches the axial step τ, e.g., sinusoidal modulation with N cycles across the range and known phase offsets. The focus-sensitive signal then varies approximately sinusoidally with the in-focus axial position, enabling phase retrieval (analogous to phase-shifting interferometry) from three or more phase-shifted images. With three images shifted by 2π/3, a standard three-step algorithm yields a wrapped axial estimate modulo τ.
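The focus-sensitive signal can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes S_i = G_σ2{|∇² G_σ1{I_i}|}, i.e., pre-smoothing with σ1 = 1.5 px, a 5-point discrete Laplacian, and smoothing of its magnitude with σ2 = 5 px. The assignment of σ1 and σ2 to these two steps, the edge padding, and all function names are assumptions.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing; edge padding keeps the output shape."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    padded = np.pad(img, r, mode="edge")

    def conv(v):
        # 'valid' trims the padding back off, so each pass returns the original length.
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # smooth along x
    return np.apply_along_axis(conv, 0, rows)    # smooth along y


def laplacian(img: np.ndarray) -> np.ndarray:
    """5-point discrete Laplacian with edge padding."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])


def focus_metric(img: np.ndarray, sigma1: float = 1.5, sigma2: float = 5.0) -> np.ndarray:
    """Hypothetical S_i(x, y): smoothed magnitude of the Laplacian of a pre-smoothed image."""
    pre = gaussian_blur(np.asarray(img, dtype=float), sigma1)
    return gaussian_blur(np.abs(laplacian(pre)), sigma2)


# Usage: an in-focus checkerboard should score higher than a defocused (blurred) copy.
yy, xx = np.indices((64, 64))
board = ((yy // 8 + xx // 8) % 2).astype(float)
sharp_score = focus_metric(board).mean()
blurred_score = focus_metric(gaussian_blur(board, 4.0)).mean()
```

Defocus is imitated here by Gaussian blur only to show that the metric ranks sharper structure higher; a real defocused image of the projected pattern would replace it.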
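The coarse binary-search stage, from thresholded focus signals to the step index d, can be sketched as below. This minimal sketch assumes a reflected Gray code: bin d is illuminated in image i when bit i of d XOR (d >> 1) is 1. Adjacent bins then differ in exactly one image, so a surface lying near a bin boundary is ambiguous in only one image, and misreading that image moves d by at most one step. The function names, the boolean table layout, and the bit ordering are illustrative assumptions, and the sketch omits the extra robustness image used in Prototype 1.

```python
import numpy as np


def num_coded_images(n_bins: int) -> int:
    """Smallest n with 2**n >= n_bins (binary search over n_bins axial steps)."""
    return max(1, (n_bins - 1).bit_length())


def gray_sequences(n_bins: int) -> np.ndarray:
    """Boolean table M[i, d]: is the illumination on during axial bin d in image i?"""
    n = num_coded_images(n_bins)
    d = np.arange(n_bins)
    gray = d ^ (d >> 1)  # reflected binary (Gray) code of each bin index
    return ((gray[None, :] >> np.arange(n)[:, None]) & 1).astype(bool)


def decode_step_index(signals: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """signals: (n, H, W) focus signals S_i; thresholds: (H, W) calibrated S_th.

    Returns the step index d(x, y) of the axial bin containing the surface.
    """
    bits = signals > thresholds  # bit i is set if the pixel was lit while in focus
    gray = np.zeros(signals.shape[1:], dtype=np.int64)
    for i, bit_plane in enumerate(bits):
        gray |= bit_plane.astype(np.int64) << i
    # Gray -> binary: d = g ^ (g >> 1) ^ (g >> 2) ^ ...
    d = gray.copy()
    shifted = gray >> 1
    while np.any(shifted):
        d ^= shifted
        shifted >>= 1
    return d


# Usage: N = 8 bins (as in Prototype 1) need n = 3 coded images.
M = gray_sequences(8)
d_true = np.array([[0, 3, 7], [5, 1, 6]])
# Idealized focus signals: high (1.0) if lit at the in-focus bin, low (0.1) otherwise.
S = np.where(M[:, d_true], 1.0, 0.1)
d_est = decode_step_index(S, np.full(d_true.shape, 0.5))
```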
The unwrapped height is recovered by combining the coarse step index d from stage one with the wrapped height z_wrapped ∈ [0, τ) from stage two: z = z_min + d·τ + z_wrapped.
Implementation details:
- Focus-sensitive metric: S_i(x,y) = G_{σ2}{ |∇² G_{σ1}{I_i(x,y)}| }, where G_σ denotes Gaussian smoothing with standard deviation σ to suppress noise; in practice σ1 = 1.5 and σ2 = 5 pixels. The thresholds S_th(x,y) are calibrated, and a Gray code can be used for robust decoding.
- Prototype 1 (motorized sweep): 20×/0.45 NA objective (DOF ≈ 2.6 µm); axial range Z_R = 100 µm, covered at a stage velocity of 1000 µm/s during each 100 ms exposure; N = 8 axial steps (τ = 12.5 µm). Stage 1 used 3 binary images plus 1 additional Gray-coded image (4 in total) to determine d. Stage 2 used 4 images with pulsed, sinusoid-equivalent modulation at a 12.5 ms period (one axial step τ at 1000 µm/s) and equally spaced phases, for 8 images overall. Structured illumination was provided by a chrome-on-glass checkerboard (13 µm pitch), and synchronization was handled by custom software and an Arduino.
- Prototype 2 (fast TAG lens sweep): Implemented on a commercial microscope with a tunable acoustic gradient (TAG) lens in a 4f relay, which sweeps focus sinusoidally with a period on the order of microseconds. A chrome-on-glass Ronchi ruling (50 µm pitch) provided structured illumination, with LED pulsing synchronized to the TAG drive and the camera. The scale factor relating phase to height was calibrated within the linear region of the axial scan using certified step-height samples. Step and image counts matched Prototype 1 (4 + 4 images), giving an axial range of ~54 µm. The acquisition rate was limited by the camera's 539 fps, i.e., one topography every 15 ms (67 topographies/s).
- Reconstruction and computation: Field curvature was calibrated using a flat mirror. Topographies were reconstructed in Python/Matlab; the most time-consuming steps are computing the focus metric and the arctangent. A CPU C++ implementation processed 512×512 images in <30 ms per reconstruction; GPU optimization is expected to bring this below the 15 ms acquisition time.
Key results:
- ESFS achieves an order-of-magnitude reduction in the number of required images: e.g., a 400 µm range at 1.6 µm steps would drop from 250 images (plane-by-plane) to 8; experimentally, 100 µm range reconstructions used only 8 images.
- Precision and accuracy: Sub-micrometric precision, reported below 50 nm over a 100 µm range. A certified step height was measured at 21.704 µm versus the certified 21.702 µm ± 0.022 µm, well within the stated uncertainty.
- Surface roughness: On NPL AIRB40, measured Sa = 0.77 µm and Sq = 0.99 µm, matching certified Sa = 0.79 µm (±0.03 µm) and Sq = 1.00 µm (±0.02 µm).
- Speed: With TAG lens implementation, achieved 67 topographies per second (one topography every 15 ms), limited by 539 fps camera frame rate; axial range ~54 µm in this setup.
- Dynamic measurements: Real-time topographic imaging of a MEMS gas sensor’s suspended membrane under 6 Hz heating modulation; periodic height changes of ~2 µm clearly resolved; operable up to at least 20 Hz modulation.
- Trade-offs: Compared with conventional plane-by-plane scanning, ESFS shows increased system noise but only a slight reduction in lateral resolution; unmeasured regions can occur where local slopes exceed the acceptance angle set by the objective's NA.
- Robust phase unwrapping: The stage-one binary/Gray code provides unambiguous unwrapping for the phase-shifting stage, enabling large axial ranges with high precision.
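The second stage and the unwrapping step z = z_min + d·τ + z_wrapped can be sketched as follows. This minimal sketch assumes that, for K ≥ 3 images with equally spaced phase offsets φ_k = 2πk/K, the focus signal follows S_k ≈ A + B·cos(2π(z − z_min)/τ − φ_k). The wrapped height is then recovered with the generic K-step phase-shifting estimator, of which the three-step algorithm is the K = 3 case. The signal model and function names are assumptions, and the sketch ignores the off-by-one errors that noise can cause near bin boundaries.

```python
import numpy as np


def wrapped_height(signals: np.ndarray, tau: float) -> np.ndarray:
    """signals: (K, H, W) focus signals for phase offsets 2*pi*k/K, K >= 3.

    Returns the in-bin height z_wrapped in [0, tau).
    """
    K = signals.shape[0]
    phi = 2 * np.pi * np.arange(K) / K
    # Project the K samples onto sin/cos of the offsets (K-step phase shifting);
    # the constant offset A cancels because the sin/cos sums vanish.
    num = np.tensordot(np.sin(phi), signals, axes=1)
    den = np.tensordot(np.cos(phi), signals, axes=1)
    theta = np.mod(np.arctan2(num, den), 2 * np.pi)
    return tau * theta / (2 * np.pi)


def unwrap_height(d: np.ndarray, z_wrapped: np.ndarray, z_min: float, tau: float) -> np.ndarray:
    """Combine the stage-one step index with the stage-two wrapped height."""
    return z_min + d * tau + z_wrapped


# Usage with Prototype 1 numbers: 100 um range, N = 8 steps, tau = 12.5 um.
z_min, tau = 0.0, 12.5
z_true = np.array([[3.0, 17.2], [40.1, 96.0]])
d = np.floor((z_true - z_min) / tau).astype(int)  # ideal stage-one result
results = {}
for K in (3, 4):
    phi = 2 * np.pi * np.arange(K) / K
    # Simulated focus signals with arbitrary offset A = 2.0 and contrast B = 0.5.
    S = 2.0 + 0.5 * np.cos(2 * np.pi * (z_true - z_min) / tau - phi[:, None, None])
    results[K] = unwrap_height(d, wrapped_height(S, tau), z_min, tau)
```

Using arctan2 rather than a plain arctangent resolves the quadrant, so the wrapped height covers the full step τ rather than half of it.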
ESFS combines binary-search-based coarse localization with phase-shifting precision refinement to deliver rapid, robust 3D topography. It leverages the sparsity of typical surfaces (a single axial height per lateral pixel) to reduce the number of images dramatically while retaining fidelity. The approach is conceptually akin to phase-shifting interferometry (for precision) and to the two-step coding of fringe projection profilometry (FPP), but without FPP's severe lateral resolution penalties from defocus and triangulation geometry; ESFS uses the full imaging NA and extends the axial range arbitrarily via axial scanning. Motion artifacts are possible if sample dynamics exceed the sampling rate, yet the reduced image count and camera-limited acquisition enable real-time capture of moving samples where traditional z-stacks fail. A key practical consideration is SNR: because each image integrates light over the entire sweep, a larger axial range adds more out-of-focus background and weakens the optical-sectioning signal computed post-acquisition. This can be mitigated by implementing ESFS within a confocal detection scheme (rejecting out-of-focus light) or by adaptively restricting illumination or scanning to the axial region containing the sample (informed by stage one), preserving SNR even over large ranges. Higher-base coding (ternary or beyond) is feasible if SNR permits, further reducing the number of images needed. Computationally, reconstruction is lightweight and amenable to real-time, parallelized processing.
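The image-count and frame-rate arithmetic quoted in this summary can be checked with a short sketch. It assumes ideal base-b coding of the coarse stage alone (no phase-shifting images and no extra robustness image), so it matches the 250-plane, 8-image example but gives 3 rather than Prototype 1's 4 coarse images for N = 8. The function names are hypothetical.

```python
def planes_needed(z_range_um: float, step_um: float) -> int:
    """Plane-by-plane z-stack size; assumes the range is a whole number of steps."""
    return round(z_range_um / step_um)


def coded_images_needed(n_planes: int, base: int = 2) -> int:
    """Smallest n with base**n >= n_planes (integer loop avoids log rounding)."""
    n, capacity = 0, 1
    while capacity < n_planes:
        capacity *= base
        n += 1
    return n


def topographies_per_second(images_per_topography: int, camera_fps: float) -> float:
    """Camera-limited topography rate."""
    return camera_fps / images_per_topography


n_planes = planes_needed(400, 1.6)               # 250 planes
binary = coded_images_needed(n_planes, base=2)   # 8 images
ternary = coded_images_needed(n_planes, base=3)  # 6 images
rate = topographies_per_second(8, 539)           # 4 + 4 images at 539 fps
```

The rate evaluates to 67.375 topographies per second, i.e., one every ~14.8 ms, consistent with the reported 67 topographies/s. The ternary count shows how higher-base coding trims the image budget if SNR allows the extra intensity levels to be distinguished.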
The paper introduces ESFS, a two-stage, temporally encoded focal-sweep method for fast topographic optical imaging. By replacing exhaustive plane-by-plane acquisition with a binary (or Gray) coded search plus phase-shifting refinement, ESFS reconstructs accurate, unambiguously unwrapped height maps from far fewer images. Experiments demonstrate sub-micrometric precision across 100 µm axial ranges, validated against certified standards, and real-time operation at 67 topographies per second for dynamic samples such as MEMS devices. ESFS is simple to implement on existing microscopes with minor additions (structured illumination, illumination modulation, and axial sweep control) and enables high-resolution measurements of moving or large-range samples that were previously impractical. Future directions include confocal implementations to maintain high SNR over extended ranges, higher-base codes to cut image counts further, adaptive illumination/scanning guided by stage-one results, and optimized GPU processing so that reconstruction keeps pace with acquisition.
Limitations:
- SNR decreases as the axial measurement range increases when the focus-sensitive signal is computed from integrated images, potentially reducing precision; confocal implementations could mitigate this.
- Minimum practical axial step size is limited by the objective’s depth-of-field; smaller steps do not improve axial resolution.
- Increased system noise and a slight reduction in lateral resolution compared to conventional plane-by-plane scanning were observed.
- Unmeasured regions occur where local sample slopes exceed the acceptance angle set by the objective's NA (specular reflections are not collected).
- Motion artifacts can arise if sample dynamics are faster than the achievable sampling rate; speed is ultimately limited by camera frame rate and sweep speed.
- TAG lens calibration is valid in the linear region of axial scanning; deviations may require recalibration or restrict usable range.