Autonomous adaptive data acquisition for scanning hyperspectral imaging

E. A. Holman, Y. Fang, et al.

A grid-less autonomous adaptive sampling method for Fourier transform infrared spectromicroscopy, developed by Elizabeth A. Holman and colleagues, substantially reduces image acquisition time while increasing sampling density in critical regions.

Introduction

Advances in optical microscopy, particularly fluorescence microscopy, enable observation of multiplexed cellular events at high spatial and temporal resolution, but they are limited to a few targeted features identified a priori. Complementary, label-free spatiochemical mapping is needed to guide fluorescence imaging and to improve interpretation of omics and transmitted-light microscopy data. Scanning synchrotron radiation-based Fourier transform infrared (SR-FTIR) spectromicroscopy can map spatial heterogeneity in chemical composition that is invisible to visible-light imaging. However, acquisition can take minutes to hours because spectral images are high-dimensional and are acquired by uniform grid (UG) sampling. The research question is whether a grid-less autonomous adaptive data acquisition (AADA) strategy can reduce acquisition time while preserving or improving information content, by increasing sampling density in regions of higher spatiochemical complexity. Enabled by modern computing, AADA is proposed as a systematic, reproducible alternative to UG sampling that captures spectral and spatial heterogeneity more efficiently. The study develops and evaluates an AADA approach driven by leave-one-out cross-validation (LOOCV) and a hybrid surrogate model (linear interpolation with Voronoi weighting) that prioritizes sampling in information-rich regions. It demonstrates the approach on an abiotic two-component system and on living Caenorhabditis elegans.

Literature Review

The work builds on prior advances in hyperspectral and spectromicroscopy-based imaging for biological systems and on adaptive sampling and surrogate modeling literature. Background includes the use of fluorescence microscopy for dynamic, multiplexed imaging and its integration with omics analyses. For label-free techniques, SR-FTIR can reveal spatiochemical heterogeneity but is hindered by time-consuming uniform grid sampling. Methodologically, the adaptive strategy leverages LOOCV-based error estimation and surrogate modeling strategies related to LOLA-Voronoi and CV-Voronoi, combining two-dimensional barycentric linear interpolation with Voronoi tessellation to weight errors and guide sequential sampling. Related sequential design and error-pursuing sampling approaches for global metamodeling are cited as foundations for the proposed adaptive framework.

Methodology

Autonomous adaptive sampling: The workflow begins with an initial scan of randomly distributed points. Each infrared spectrum is preprocessed in three steps: the spectral (wavenumber) range is restricted, a rubber-band baseline correction is applied, and dimensionality is reduced by PCA. Only the first five principal components are retained, which keeps computation fast during acquisition. A surrogate model U is then built by two-dimensional barycentric linear interpolation, implemented with scipy.interpolate.griddata, with each PCA component interpolated independently. This interpolant is the basis of the LIV surrogate. Leave-one-out cross-validation (LOOCV) quantifies the contribution of each sampled point X. The model is rebuilt without X, and the LOO error is the L2 norm of the difference between the full and the leave-one-out models evaluated at X. Linear interpolation does not capture non-uniform spatial sampling or the associated uncertainty. To account for both, a Voronoi tessellation is computed over the sampled points, and the Voronoi area of each point serves as an ad hoc regularizer. The LOO error and the Voronoi area are each normalized to [0, 1] and combined into a Voronoi-weighted LOO metric, e_LOO, which selects the next sampling region. The algorithm repeatedly samples a random point within the region associated with the point that has the highest e_LOO, until a stopping criterion is reached (e.g., 500 total sampled points). The mean Voronoi-weighted LOO error over all sampled points, (ε_LOO)_y, monitors model self-consistency and accuracy. This works because LOOCV error is guaranteed to converge to the generalization error.
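One adaptive step can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' software. The sketch makes four assumptions:

- Voronoi cell areas are approximated on a dense pixel grid, with each pixel assigned to its nearest sample via scipy.spatial.cKDTree, rather than computed by an exact bounded tessellation.
- The two normalized quantities are multiplied, because the summary does not give the combination rule.
- When a left-out point lies outside the convex hull of the others, the sketch falls back to the nearest remaining sample.
- The helper names (minmax, loo_errors, next_point, field) and the synthetic field are hypothetical.

Interpolation uses scipy's LinearNDInterpolator, which is what griddata(method="linear") calls for 2-D data.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import cKDTree


def minmax(v):
    """Rescale a vector to [0, 1]; a constant vector maps to zeros."""
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v, dtype=float)
    return (v - v.min()) / span


def loo_errors(xy, scores):
    """L2 leave-one-out error of a linear interpolant at every sampled point.

    The full interpolant U reproduces the measured scores at x_i, so the error
    is ||scores_i - U_{-i}(x_i)||, with U_{-i} rebuilt without x_i. All PC
    channels are interpolated at once; linear interpolation treats each
    channel independently anyway.
    """
    n = len(xy)
    err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred = LinearNDInterpolator(xy[keep], scores[keep])(xy[i:i + 1])[0]
        if np.isnan(pred).any():
            # x_i lies outside the convex hull of the other samples; fall back
            # to the nearest remaining sample (an assumption of this sketch).
            d = np.linalg.norm(xy[keep] - xy[i], axis=1)
            pred = scores[keep][np.argmin(d)]
        err[i] = np.linalg.norm(pred - scores[i])
    return err


def next_point(xy, scores, pixels, h, rng):
    """Return the next position to measure and the current mean e_LOO."""
    e = minmax(loo_errors(xy, scores))
    # Pixel-based Voronoi tessellation: each pixel centre belongs to its
    # nearest sample, so pixel counts approximate the bounded cell areas.
    owner = cKDTree(xy).query(pixels)[1]
    counts = np.bincount(owner, minlength=len(xy)).astype(float)
    e_loo = e * minmax(counts)  # assumed rule: product of normalized terms
    best = int(np.argmax(np.where(counts > 0, e_loo, -np.inf)))
    # Random continuous position in the winning cell: pick one of its pixels
    # and jitter uniformly within that pixel.
    centre = pixels[rng.choice(np.flatnonzero(owner == best))]
    return centre + rng.uniform(-h / 2, h / 2, size=2), float(e_loo.mean())


def field(p):
    """Hypothetical 5-channel 'PC score' map with a sharp circular edge."""
    r = np.hypot(p[:, 0] - 30.0, p[:, 1] - 20.0)
    edge = np.tanh((r - 8.0) / 1.5)
    return np.stack([edge * (c + 1) for c in range(5)], axis=1)


rng = np.random.default_rng(0)
h = 1.0  # pixel size of the discretized 60 x 40 mapping domain
gx, gy = np.meshgrid(np.arange(60) + h / 2, np.arange(40) + h / 2)
pixels = np.column_stack([gx.ravel(), gy.ravel()])
xy = rng.uniform([0.0, 0.0], [60.0, 40.0], size=(40, 2))  # initial random scan
scores = field(xy)
for _ in range(60):  # stop at 100 points (the summary cites e.g. 500)
    new, mean_e = next_point(xy, scores, pixels, h, rng)
    xy = np.vstack([xy, new])
    scores = np.vstack([scores, field(new[None, :])])
```

Each iteration rebuilds one interpolant per sampled point. That cost is modest for a few hundred points but grows quickly beyond that.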

Surrogate modeling: Linear interpolation is computationally efficient but provides no uncertainty quantification. Uncertainty is therefore approximated by Voronoi area, which exploits the tendency of interpolation error to grow with distance from sampled points. Both quantities are normalized, and the resulting regularized LIV-LOO error guides sampling.
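In symbols, one way to write the regularized metric is shown below. The notation is introduced here for illustration:

- x_i are the n sampled positions and y_i their five-component PC-score vectors.
- U is the linear interpolant over all samples, and U_{-i} is the interpolant rebuilt without x_i.
- A_i is the Voronoi area of x_i.

The summary states only that both quantities are normalized to [0, 1] and combined. The product in the last line is an assumption.

```latex
\begin{aligned}
e_i &= \lVert U(\mathbf{x}_i) - U_{-i}(\mathbf{x}_i) \rVert_2
     = \lVert \mathbf{y}_i - U_{-i}(\mathbf{x}_i) \rVert_2, \\
\hat{e}_i &= \frac{e_i - \min_j e_j}{\max_j e_j - \min_j e_j}, \qquad
\hat{A}_i = \frac{A_i - \min_j A_j}{\max_j A_j - \min_j A_j}, \\
e^{\mathrm{LOO}}_i &= \hat{e}_i \, \hat{A}_i \quad \text{(assumed combination)}, \qquad
(\varepsilon_{\mathrm{LOO}})_y = \frac{1}{n} \sum_{i=1}^{n} e^{\mathrm{LOO}}_i .
\end{aligned}
```

Under this choice, the highest scores go to points that are both poorly predicted by their neighbours and surrounded by large unsampled areas.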

Simulations: Adaptive LIV was benchmarked against three non-adaptive methods: uniform grid (UG), uniform random (UR), and least unexplored region (LUR). The benchmark used 11 high-resolution SR-FTIR maps of C. elegans (step size 1–5 μm). Each map was subsampled in 1000 simulations (11,000 per method in total). Each interpolated reconstruction was compared with the full-resolution map, which was treated as ground truth. The methods subsampled as follows:

- UG selected every k-th point along each axis to form lower-resolution grids, covering all k² phase offsets.
- UR sampled k positions uniformly at random.
- LUR iteratively sampled the least explored (most sparsely sampled) region.
- LIV followed the adaptive procedure above.

Performance was quantified by the ground-truth error ε_GT, the RMSE between the reconstructed model and the full map.
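The UG and UR baselines and the ε_GT metric can be sketched in reduced form as follows. None of the following comes from the paper's code; all of it is assumed here:

- Maps are arrays of shape (rows, cols, channels).
- Reconstruction interpolates each channel linearly with scipy.interpolate.griddata, using a nearest-neighbour fill outside the convex hull of the samples.
- UR draws the same number of pixels as a UG grid.
- The helper names and the synthetic map are hypothetical.

LUR and the adaptive LIV loop are omitted.

```python
import numpy as np
from scipy.interpolate import griddata


def reconstruct_rmse(full, rows, cols):
    """Ground-truth error: RMSE between `full` (rows x cols x channels) and a
    linear reconstruction from the pixels (rows[j], cols[j])."""
    H, W, C = full.shape
    pts = np.column_stack([rows, cols]).astype(float)
    gr, gc = np.mgrid[0:H, 0:W]
    targets = np.column_stack([gr.ravel(), gc.ravel()]).astype(float)
    recon = np.empty((H * W, C))
    for ch in range(C):  # each channel is interpolated independently
        vals = full[rows, cols, ch]
        lin = griddata(pts, vals, targets, method="linear")
        near = griddata(pts, vals, targets, method="nearest")
        # Linear gives NaN outside the convex hull of the samples: use nearest.
        recon[:, ch] = np.where(np.isnan(lin), near, lin)
    return float(np.sqrt(np.mean((recon - full.reshape(H * W, C)) ** 2)))


def ug_errors(full, k):
    """UG: keep every k-th pixel along both axes, for all k**2 phase offsets."""
    H, W, _ = full.shape
    errs = []
    for di in range(k):
        for dj in range(k):
            r, c = np.meshgrid(np.arange(di, H, k), np.arange(dj, W, k),
                               indexing="ij")
            errs.append(reconstruct_rmse(full, r.ravel(), c.ravel()))
    return np.array(errs)


def ur_error(full, n_points, rng):
    """UR: n_points distinct pixels drawn uniformly at random."""
    H, W, _ = full.shape
    idx = rng.choice(H * W, size=n_points, replace=False)
    return reconstruct_rmse(full, idx // W, idx % W)


# Hypothetical 30 x 30 map with three smooth channels.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:30, 0:30]
full = np.stack([np.sin(xx / 5.0), np.cos(yy / 4.0),
                 np.exp(-((xx - 15) ** 2 + (yy - 15) ** 2) / 40.0)], axis=-1)
ug = ug_errors(full, k=3)  # 9 UG trials with 100 points each
ur = np.array([ur_error(full, 100, rng) for _ in range(9)])  # matched budget
```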

Experiments: An abiotic two-component system of high-vacuum grease and blue permanent marker, mounted on ZnSe substrates, was mapped by scanning FTIR. Spectral standards were acquired for both components, and a characteristic peak was identified for each:

- Grease: ν(Si–O–Si) at 798 cm⁻¹, attributed to silica.
- Marker: aromatic ν(−C=C−) at 1580 cm⁻¹, corroborated by ν(=C−H) at 3105–3000 cm⁻¹.

An on-target ratio (OTR) metric was defined as the fraction of spectra that show either component's characteristic peak above a noise-filtered mean threshold. Multivariate analysis included PCA-LDA on baseline-corrected, vector-normalized data to assess chemical discriminability.
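The OTR metric can be sketched as follows. The summary does not define the "noise-filtered mean threshold" precisely, so two readings are assumed here:

- Noise filtering is approximated by Savitzky-Golay smoothing (scipy.signal.savgol_filter).
- Each band's threshold is the mean plus n_sd standard deviations of that band over substrate-only "blank" spectra.

The 798 and 1580 cm⁻¹ bands come from the text. The helper names, n_sd, and the synthetic spectra are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

# Characteristic bands named in the text (cm^-1).
PEAKS = {"grease": 798.0, "marker": 1580.0}


def band_values(wn, spectra, peak, window=11, poly=3):
    """Smoothed absorbance at the wavenumber sample closest to `peak`.
    'Noise filtering' is approximated by Savitzky-Golay smoothing (assumption)."""
    smooth = savgol_filter(spectra, window, poly, axis=-1)
    return smooth[..., np.argmin(np.abs(wn - peak))]


def on_target_ratio(wn, spectra, blanks, n_sd=5.0):
    """Fraction of spectra with either characteristic band above threshold.
    Hypothetical threshold: mean + n_sd * SD of the band over blank spectra."""
    hit = np.zeros(len(spectra), dtype=bool)
    for peak in PEAKS.values():
        ref = band_values(wn, blanks, peak)
        hit |= band_values(wn, spectra, peak) > ref.mean() + n_sd * ref.std()
    return float(hit.mean())


# Synthetic demo: 10 "grease", 5 "marker" and 5 substrate-only spectra.
rng = np.random.default_rng(2)
wn = np.linspace(650.0, 4000.0, 1676)  # 2 cm^-1 spacing


def gauss(center, amp=0.5, width=10.0):
    return amp * np.exp(-0.5 * ((wn - center) / width) ** 2)


def noise(n):
    return rng.normal(0.0, 0.002, size=(n, wn.size))


spectra = np.vstack([gauss(798.0) + noise(10), gauss(1580.0) + noise(5), noise(5)])
blanks = noise(30)
otr = on_target_ratio(wn, spectra, blanks)  # about 15/20 for this toy set
```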

Biological imaging: Scanning SR-FTIR spectromicroscopy was applied to living C. elegans (late L1 and young L2 stages), which were immobilized with levamisole and rinsed before mounting on ZnSe. Based on domain knowledge, spatial mapping was restricted to the pharynx, nerve ring, and intestine. For adaptive sampling, the spectral domain was restricted to 900–3700 cm⁻¹ to avoid noise and baseline effects, and spectra were preprocessed as above and reduced to five PCs. Chemical assignments were validated qualitatively with two post hoc analyses over 3500–2800 cm⁻¹: multivariate curve resolution (MCR) with five components, explaining 99.82% of the variance, and Fourier self-deconvolution. Two components were assigned as follows:

- Component 1, hydrated proteins: amino acid ν(N–H) stretching and polyglycine CH₂ at ~2925 cm⁻¹.
- Component 4, hydrated lipid assemblies: broad N–H/O–H at 3400–3100 cm⁻¹, antisymmetric ν(-(CH₂)-) at ~2932 cm⁻¹, and methyl stretching at 2963 and 2873 cm⁻¹.
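The preprocessing chain (spectral range restriction, rubber-band baseline correction, PCA to five components) can be sketched with NumPy as follows. Both techniques are common textbook choices assumed for illustration:

- The rubber-band baseline is taken as the lower convex hull of each spectrum, computed with Andrew's monotone-chain algorithm and linearly interpolated.
- PCA is computed by SVD of the mean-centred spectra.

The helper names and synthetic bands are hypothetical. The summary also does not say whether the PCA model is refitted as new spectra arrive.

```python
import numpy as np


def rubberband(wn, y):
    """Rubber-band baseline: lower convex hull of the points (wn, y), linearly
    interpolated back onto wn. Assumes wn is strictly increasing."""
    hull = []  # indices of lower-hull vertices (Andrew's monotone chain)
    for i in range(len(wn)):
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = (wn[a] - wn[o]) * (y[i] - y[o]) - (y[a] - y[o]) * (wn[i] - wn[o])
            if cross > 0:  # counter-clockwise turn: keep vertex a
                break
            hull.pop()
        hull.append(i)
    return np.interp(wn, wn[hull], y[hull])


def preprocess(wn, spectra, lo=900.0, hi=3700.0, n_pc=5):
    """Range restriction, rubber-band correction, then PCA scores via SVD."""
    keep = (wn >= lo) & (wn <= hi)
    w, X = wn[keep], spectra[:, keep]
    X = np.vstack([s - rubberband(w, s) for s in X])
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pc] * S[:n_pc]  # coordinates of each spectrum on the PCs
    explained = S ** 2 / np.sum(S ** 2)  # variance fraction per component
    return scores, Vt[:n_pc], explained


# Hypothetical spectra: sloped baselines plus two bands of varying strength.
rng = np.random.default_rng(3)
wn = np.linspace(650.0, 4000.0, 1676)
n = 40
baseline = rng.uniform(-1e-4, 1e-4, size=(n, 1)) * (wn - 650.0)
band_a = rng.uniform(0.2, 1.0, size=(n, 1)) * np.exp(-0.5 * ((wn - 1650.0) / 15.0) ** 2)
band_b = rng.uniform(0.1, 0.6, size=(n, 1)) * np.exp(-0.5 * ((wn - 2925.0) / 10.0) ** 2)
spectra = baseline + band_a + band_b + rng.normal(0.0, 0.002, size=(n, wn.size))
scores, loadings, explained = preprocess(wn, spectra)
```

The baseline-corrected spectra are non-negative by construction, because the lower hull never rises above the data points.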

Instrumentation and acquisition parameters: Measurements were made at ALS Beamline 1.4.3 with a Nicolet Nic-Plan IR microscope (32×, NA 0.65 objective). The microscope was coupled to a Thermo Scientific Nicolet i550 spectrometer with a KBr beamsplitter and an MCT detector. Both IR sources covered 650–4000 cm⁻¹ at 4 cm⁻¹ resolution:

- Internal globar source: aperture-limited to 75 μm × 75 μm; 16 co-adds; mirror velocity 1.83 cm/s.
- Synchrotron source: diffraction-limited spot of 2–10 μm; eight co-adds; mirror velocity 6.3 cm/s.

The adaptive sampling software, which has a PyQt GUI, interfaced with Thermo OMNIC 9.8 via DDE on the beamline workstation. For the globar assessments, the total number of sampled points was capped at 500, compared with 840 points for a full-resolution map.

Key Findings

Simulation benchmarking on 11 high-resolution C. elegans maps showed that adaptive LIV needed only 66% of the sampled points that UG required to reach the same ground-truth error (ε_GT). Across 11,000 trials per method, 92.6% of LIV trials outperformed the corresponding UG trial, that is, their ε_GT ratio relative to UG was below 1. The corresponding figures were 27.1% for LUR and 1.6% for UR.
Abiotic two-component experiment: Adaptive LIV achieved a lower mean Voronoi-weighted LOOCV error than the non-adaptive methods. On the spectral on-target ratio (OTR), adaptive LIV reached OTR = 0.95. For UG, the Results report OTR = 0.19, whereas the calculation in the Methods gives OTR_UG = 0.05; both values were computed from the characteristic component peaks with noise-filtered thresholds. PCA-LDA on the acquired spectra separated pure grease, pure marker, and mixed regions. The mean spectra corroborated these assignments: the marker showed imine ν(C=N–H) at 3400–3300 cm⁻¹ and hydrogen-bonded ν(O–H) at 3550 and 3230 cm⁻¹, while the grease was vibrationally silent above 3000 cm⁻¹. The spectra also suggested that marker alcohols evaporated less beneath the grease in mixed regions.
Living C. elegans imaging: Adaptive LIV increased sampling density in regions corresponding to known anatomical and chemical heterogeneity (transitional/overlapping regions among pharynx, head/neck/body wall muscle, nerve ring, lipid-rich intestine). MCR components 1 (hydrated proteins) and 4 (hydrated lipid assemblies) co-localized in densely sampled regions, indicating resolution of spatiochemical gradients.
Time efficiency: In a young L2 C. elegans case, the head region was mapped in 45 minutes using LIV-based AADA versus approximately 4.9 hours using commercial software (standard UG approach).
At matched time intervals, LIV-based AADA provided more comprehensive spatiochemical representation of the mapping domain than UG (e.g., Fig. 2e principal component false-color composites).
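For scale, the timing reported for the young L2 head region corresponds to roughly a 6.5-fold reduction in acquisition time. This is simple arithmetic on the two reported values, and the UG time is itself approximate:

```latex
t_{\mathrm{UG}} \approx 4.9\ \mathrm{h} = 294\ \mathrm{min}, \qquad
\frac{t_{\mathrm{UG}}}{t_{\mathrm{AADA}}} \approx \frac{294\ \mathrm{min}}{45\ \mathrm{min}} \approx 6.5, \qquad
\frac{t_{\mathrm{AADA}}}{t_{\mathrm{UG}}} \approx 0.15
```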

Discussion

The study demonstrates that a grid-less, LOOCV- and Voronoi-weighted, linear-interpolation-based adaptive sampling strategy can reduce hyperspectral image acquisition time while increasing sampling density in chemically complex regions. This improves the ability to detect and resolve spatiochemical gradients, as shown in living C. elegans where adaptively denser sampling aligned with known anatomical features and chemistries. Implemented on standard commercial hardware, LIV-based AADA is computationally efficient and broadly accessible for sequential scanning modalities. It operates effectively in unconstrained discovery-oriented mapping as well as in domain-constrained regions, benefiting studies with time-sensitive dynamics. Notably, for SR-FTIR mapping of C. elegans, LIV-based AADA reduced instrument time from hours to under an hour for a targeted region. These results suggest potential for further modular development toward real-time, non-invasive, label-free adaptive hyperspectral imaging, offering an orthogonal window into dynamic physicochemical architectures and guiding follow-up targeted fluorescence imaging or omics analyses.

Conclusion

This work introduces and validates an autonomous adaptive data acquisition framework for scanning hyperspectral imaging that leverages LOOCV, linear interpolation, and Voronoi-based regularization to prioritize sampling in information-rich regions. Across simulations and experiments, the method outperformed standard uniform grid sampling by achieving equivalent reconstruction accuracy with substantially fewer points, improving on-target acquisitions, and accelerating biological imaging while preserving interpretability via multivariate analyses. The approach enables more comprehensive spatiochemical assessments at any time point and can guide downstream omics or targeted imaging. Future directions include modular enhancements toward real-time adaptive control, broader application to diverse sequential scanning platforms, and extension beyond biology to remote sensing and space exploration for rapid detection and characterization of dynamic chemical events.

Limitations

Uncertainty quantification is approximated via Voronoi area as an ad hoc regularizer for linear interpolation; the surrogate does not provide formal probabilistic uncertainty estimates.
Performance was demonstrated on specific systems (an abiotic two-component sample and C. elegans) and instruments; generalizability to other sample types, modalities, and noise conditions requires further validation.
Spectral domain restrictions (e.g., 900–3700 cm⁻¹ for adaptive sampling; 3500–2800 cm⁻¹ for MCR) were used to mitigate noise and baseline effects, which may limit analysis of other spectral features.
The adaptive sampling software is currently specific to ALS Beamline 1.4.3, potentially limiting immediate reproducibility outside that environment.
Simulation benchmarking relied on high-resolution maps treated as ground truth and approximated spectra at subsampled positions by nearest-neighbor grid points, which may introduce approximation bias when subsampling density increases.
