
Coincidence imaging for Jones matrix with a deep-learning approach
J. Xi, T. K. Yung, et al.
This summary covers a deep-learning technique for Jones matrix imaging from photon arrival (coincidence) data, developed by Jiawei Xi, Tsz Kit Yung, Hong Liang, Tan Li, Wing Yim Tam, and Jensen Li. The approach improves on conventional low-light measurement schemes, recovering the Jones matrix more accurately while requiring fewer detected photons.
Introduction
The study addresses the challenge of extracting polarization-resolved object information (Jones matrix parameters) from coincidence measurements in low-light conditions. Traditional two-photon imaging and Hong–Ou–Mandel (HOM)-type interference approaches rely on accurate estimates of the second-order coherence g²(τ) to extract amplitude and phase profiles, which requires many detected photons and tailored analytical models. The authors aim to develop a deep-learning framework that directly uses photon arrival data to recover the Jones matrix at each pixel, reducing photon requirements and eliminating the need for hand-crafted inversion algorithms. In the broader context, coincidence measurements and heralded photons enhance signal-to-noise ratios for quantum imaging, and metasurfaces offer precise manipulation of light’s degrees of freedom. The proposed approach seeks to combine these advantages with unsupervised learning to automatically find minimal, physically meaningful data representations and assess whether a given experimental design contains sufficient information for accurate imaging.
Literature Review
The paper surveys prior advances in coincidence-based quantum imaging and HOM interference for applications including holography, super-resolution, and polarization-resolved imaging. It highlights the role of metasurfaces in controlling phase, polarization, wavelength, and orbital angular momentum for classical and quantum applications, including quantum state control and tomography. For Jones matrix imaging, previous methods include full-characterization techniques requiring many measurements, as well as more efficient approaches such as vectorial Fourier ptychography and Fourier space sharing. Recent metasurface-driven techniques have enabled polarization and Jones matrix imaging of unknown objects via HOM-type interference by comparing unknown object pixels to reference panels with known responses. In parallel, machine learning and deep learning have improved quantum-optical tasks such as classifying light sources, extracting spatial modes under low light, and accelerating antibunching-based super-resolution by predicting g²(0). These advances suggest potential for DL to directly extract sample features from photon arrival data without relying on precise g² estimation, especially in low-photon regimes.
Methodology
Experimental concept and baseline: A metasurface sample contains a reference region with four panels (H, D, V, A) acting as polarizers of identical transmission amplitude but different axis angles, and an object region (24×24 pixels) with unknown Jones matrices parameterized by three degrees of freedom (DOFs). Photon pairs with an optical delay τ and orthogonal circular polarizations illuminate the sample. After analysis by a quarter-wave plate and a polarizer, a SPAD camera records binary photon arrivals per pixel per time frame. Coincidence events between each object pixel and each reference panel are accumulated versus τ to form g²(τ) curves; a semi-analytic fitting approach uses the visibility model Vj(θ) to estimate the Jones matrix DOFs as a baseline.
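To make the baseline concrete, below is a minimal Python/NumPy sketch of how a single point of a g²(τ) curve could be accumulated from binary SPAD frames for one object pixel and one reference panel. The data layout and the singles-based normalization are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def coincidence_rate(obj_frames, ref_frames):
    """One point of a g2(tau) curve from binary SPAD arrival data (sketch).

    obj_frames, ref_frames: 0/1 arrays of length N_frames recorded at a single
    optical delay tau for the object pixel and one reference panel (H, D, V, or A).
    """
    obj_frames = np.asarray(obj_frames, dtype=float)
    ref_frames = np.asarray(ref_frames, dtype=float)
    n_frames = obj_frames.size
    coincidences = np.sum(obj_frames * ref_frames)        # both pixels fired in the same frame
    # Accidental-coincidence estimate from the singles rates (assumed normalization).
    accidentals = obj_frames.mean() * ref_frames.mean() * n_frames
    return coincidences / max(accidentals, 1e-12)
```

Repeating this for each delay τ and each reference panel yields the g²(τ) curves that the semi-analytic baseline fits with the visibility model.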
Deep-learning pipeline: The proposed approach replaces explicit g² estimation and fitting with a two-stage neural pipeline: (1) an unsupervised β-VAE that ingests raw, flattened correlation matrices C (rows: reference panels; columns: time frames; entries indicate frame-by-frame coincidences between the object pixel and each reference panel) to learn minimal, disentangled latent variables; and (2) a supervised regression network mapping the meaningful latent variables to the three Jones DOFs.
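The inference path of this two-stage pipeline could look like the following PyTorch sketch. The function and variable names (predict_jones_dofs, encoder, regressor) are illustrative, the encoder is assumed to return latent means and log-variances, and treating the first three latents as the informative ones is an assumption made for clarity.

```python
import torch

def predict_jones_dofs(C, encoder, regressor):
    """Map one pixel's correlation matrix to its Jones DOFs (sketch).

    C: (4, N_frames) matrix of frame-by-frame coincidences between the object
       pixel and the four reference panels (H, D, V, A).
    encoder: trained beta-VAE encoder, assumed to return (mu, logvar).
    regressor: trained network mapping the informative latents to the DOFs.
    """
    x = torch.as_tensor(C, dtype=torch.float32).flatten().unsqueeze(0)  # (1, 4*N_frames)
    mu, _logvar = encoder(x)            # use latent means as the learned representation
    z = mu[:, :3]                       # keep the three informative latents (assumed ordering)
    out = regressor(z)                  # -> (sin theta, cos theta, phi, t_j)
    sin_t, cos_t, phi, tj = out.unbind(dim=1)
    theta = torch.atan2(sin_t, cos_t)   # recover the cyclic angle from its sin/cos encoding
    return theta, phi, tj
```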
Simulation and data generation: To train the networks, 50,000 synthetic datasets are generated, each corresponding to a random object pixel with DOFs (θ, φ, tj) sampled from uniform ranges U(0, 2π), U(0, π), and U(0.1, 0.5), respectively, along with four reference pixels (H, D, V, A) with known Jones matrices. Photon-pair amplitudes include random phase φ∼U(0, 2π) to emulate weakly coherent photon-pair sources. For each time frame, detected photon counts per set of five pixels (four reference + one object) are sampled from a Poisson distribution with mean reflecting incident power, transmissions, and detection efficiency; photons are assigned to pixels proportionally to their amplitudes. Coincidences per frame between the object pixel and each reference panel form the correlation matrix C (size 4×Nframes). Each dataset contains 2000 frames for generality; to enforce time-translation symmetry, frame orders are randomly permuted, expanding the dataset to 200,000 instances, split into 81% training, 9% validation, and 10% testing.
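A simplified counting-statistics skeleton of this data generation is sketched below. The interference physics that makes the per-pixel amplitudes depend on (θ, φ, tj), the photon-pair phase, and the delay τ is omitted; the photon budget, pixel model, and augmentation factor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_correlation_matrix(amplitudes, n_frames=2000, mean_photons=0.5):
    """Counting-statistics skeleton of the synthetic data generation (sketch).

    amplitudes: effective detection amplitudes of the 5 pixels (4 reference
    panels H, D, V, A plus 1 object pixel) after the analyzer; in the full
    model these follow from (theta, phi, t_j), the photon-pair phase, and tau.
    """
    p = np.asarray(amplitudes, dtype=float) ** 2
    p = p / p.sum()                                   # photon assigned to a pixel ~ amplitude^2
    C = np.zeros((4, n_frames), dtype=np.int64)
    for f in range(n_frames):
        n = rng.poisson(mean_photons)                 # photons reaching the 5-pixel set this frame
        hits = rng.multinomial(n, p)                  # distribute photons over the pixels
        C[:, f] = (hits[:4] > 0) & (hits[4] > 0)      # coincidence of each panel with the object pixel
    return C

def permute_frames(C, n_copies=3):
    """Augmentation enforcing time-translation symmetry: shuffle the frame order."""
    return [C[:, rng.permutation(C.shape[1])] for _ in range(n_copies)]
```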
β-VAE architecture and training: Input size is 8000 (4×2000). The encoder and decoder each use three fully connected hidden layers. The latent space allows up to five variables to test DOF discovery. The loss is MSE reconstruction plus β times the KL divergence to independent N(0,1) latents, with β≈1 (ADAM, learning rate 3e-4). Post-training, three latent variables exhibit high variance in μ and low σ², indicating three meaningful DOFs; two latents carry no information. The trained encoder can process C with fewer frames (≤2000) via padding or replication to match input size.
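The architecture and loss could be implemented along the following lines in PyTorch. The hidden-layer widths and ReLU activations are assumptions, since the summary only specifies three fully connected hidden layers, up to five latents, the MSE-plus-β·KL loss, and Adam with learning rate 3e-4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    """Sketch of the beta-VAE; hidden widths (1024, 256, 64) are assumptions."""
    def __init__(self, in_dim=8000, n_latent=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
        )
        self.mu = nn.Linear(64, n_latent)
        self.logvar = nn.Linear(64, n_latent)
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, 1024), nn.ReLU(),
            nn.Linear(1024, in_dim),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return self.decoder(z), mu, logvar

def beta_vae_loss(x, x_rec, mu, logvar, beta=1.0):
    recon = F.mse_loss(x_rec, x)                                   # MSE reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence to N(0, 1)
    return recon + beta * kl

model = BetaVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```

After training, informative latents are identified by inspecting the statistics of μ and σ² over the dataset: high variance in μ with small σ² signals a latent that actually encodes a DOF.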
Regression network: A fully connected network (3-50-50-50-50-4) maps the three meaningful latent variables to the Jones DOFs; since θ is cyclic, outputs include sinθ and cosθ. Training uses MSE loss on simulated data. Test correlations between targets and predictions average 0.93, indicating minimal information loss through the β-VAE.
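A direct transcription of the stated 3-50-50-50-50-4 layout might look like this; the choice of ReLU activations is an assumption.

```python
import torch.nn as nn

# Three informative latents -> four hidden layers of 50 units -> four outputs
# (sin theta, cos theta, phi, t_j); theta is cyclic, hence the sin/cos encoding.
regressor = nn.Sequential(
    nn.Linear(3, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 4),
)
```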
Experimental application and evaluation: The trained β-VAE and regression network are applied to experimental SPAD data. Performance is benchmarked against the semi-analytic fitting method for (i) a one-DOF case with a designed phase pattern (apple-shaped θt) and (ii) a full three-DOF case. Errors are quantified in hue (errorH) for the one-DOF case and via an HSB-space distance (errorHSB), decomposed into errorHS (hue and saturation) and errorB (brightness), for the three-DOF case. Convergence behavior versus the number of time frames is studied and converted to average photons per pixel.
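The exact definitions of errorH and errorHSB are not reproduced in this summary; as an illustration of the cyclic handling such a hue metric requires (θ wraps around), a plausible per-pixel hue error could be computed as follows.

```python
import numpy as np

def error_hue(h_pred, h_true):
    """Mean cyclic distance between predicted and target hue maps,
    with hue normalized to [0, 1) (illustrative metric, not the paper's)."""
    d = np.abs(np.asarray(h_pred) - np.asarray(h_true)) % 1.0
    return float(np.mean(np.minimum(d, 1.0 - d)))
```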
Key Findings
- The β-VAE correctly discovers three meaningful latent variables corresponding to the three Jones matrix DOFs; two additional latents are non-informative.
- The regression network accurately maps latent variables to DOFs with average Pearson correlation ≈0.93 on test data, indicating little information loss in the learned representation.
- One-DOF case (phase-only θt): With limited time frames, the DL method yields clearer images with substantially lower error than the fitting baseline, reproducing the target pattern with about 30 time frames (≈14 photons per pixel) at roughly 1.5× the converged error; convergence around 200 frames gives an error of ≈0.1, roughly half the error of fitting under low-frame conditions.
- Three-DOF case: The DL approach produces accurate Jones matrix images with markedly less noise and faster convergence than fitting. The overall HSB error (errorHSB) from DL is significantly smaller and converges to within 1.5× of its converged value at ~200 frames, corresponding to ~88 photons per pixel.
- Error decomposition shows DL especially reduces errorHS (H,S) and nearly eliminates errorB (brightness channel related to the third DOF) compared to fitting, indicating robustness in estimating transmission amplitude-related parameters.
- Compared to semi-analytic algorithms derived from g²(0), the DL approach is both more efficient (fewer frames/photons) and more accurate under low-light conditions.
- The approach can assess information sufficiency of the experimental design by the number of meaningful latent variables identified by the β-VAE, guiding experimental upgrades if fewer than three are found.
Discussion
The results demonstrate that a deep-learning strategy operating directly on photon arrival coincidences can recover pixel-wise Jones matrices under low-photon conditions, addressing the core challenge of accurate g² estimation and inversion with limited data. By discovering three disentangled latent variables, the β-VAE confirms that the designed measurements contain sufficient information, and the subsequent regression accurately maps these latents back to physical DOFs. The DL method substantially reduces the number of required photons per pixel while improving image quality and stability relative to the semi-analytic fitting baseline, particularly in the brightness/amplitude-related channel that was a principal source of noise for fitting. This indicates that the approximations in prior analytic methods sacrifice attainable performance, and that learned models can serve as an upper-bound benchmark, informing hardware design (e.g., choice of reference panels, frame budgets) and algorithm development. Beyond quantum imaging, the approach offers a general framework for extracting interpretable physical parameters from high-dimensional, noisy measurements in other imaging modalities.
Conclusion
The paper introduces a DL-assisted coincidence imaging framework for Jones matrix extraction that combines an unsupervised β-VAE for minimal, interpretable representation learning with a lightweight regression network to recover physical DOFs. Trained on simulated photon arrival data, the method discovers the correct number of DOFs, generalizes to variable numbers of time frames, and significantly outperforms a semi-analytic g²-based fitting approach in low-light conditions. Experimentally, it achieves accurate three-DOF imaging with as few as ~88 photons per pixel and one-DOF phase imaging with ~14 photons per pixel. The approach automates algorithm formulation, assesses information sufficiency of measurement designs, and can serve as an upper bound benchmark for analytic methods. Future directions include training directly on experimental datasets to mitigate simulation–experiment mismatches (e.g., dead pixels, setup latencies), extending to other quantum and classical imaging modalities (medical ultrasound, low-light microscopy, depth imaging), and exploring architecture/measurement co-design for further photon-efficiency gains.
Limitations
- The networks are trained on simulated data; discrepancies with real experiments (e.g., dead pixels, non-ideal detector behavior, varying setup times) can introduce domain mismatch. The authors suggest using experimental data for training to improve robustness.
- The approach relies on the specific measurement configuration (reference panels, analyzed polarization, and timing), so sufficiency and performance depend on experimental design; suboptimal designs may yield fewer meaningful latents and require redesign.
- The baseline semi-analytic model involves approximations; while DL outperforms it, a refined analytic model might close part of the gap. The current results thus reflect comparison to an approximate baseline.
- Minor bias in hue for background regions was observed due to the cyclic nature of the phase variable.