
An optical neural network using less than 1 photon per multiplication

T. Wang, S. Ma, et al.

Tianyu Wang, Shi-Yuan Ma, Logan G. Wright, Tatsuhiro Onodera, Brian C. Richard, and Peter L. McMahon demonstrate an optical neural network that reaches ~99% accuracy on handwritten-digit classification at about 3 photons per multiplication, and ~90% accuracy with less than one photon per multiplication. The study shows that optical neural networks can deliver high accuracy with extremely low optical energy.

Introduction

The study addresses the escalating energy consumption associated with deep neural network (DNN) inference on conventional digital processors. As 80–90% of the cost of large-scale DNN deployments is due to inference, there is strong motivation for specialized, energy-efficient hardware. Optical processors have been proposed to deliver higher energy efficiency and lower latency, particularly for matrix–vector multiplications, the dominant computational workload in DNNs. Theory predicts that optical matrix–vector multiplication can, for sufficiently large vector sizes and at the shot-noise limit, consume less than one photon of optical energy per scalar multiplication, implying orders-of-magnitude advantage over electronic digital multipliers. The purpose of this work is to experimentally validate optical neural network (ONN) operation in the sub-photon-per-multiplication regime and to quantify the achievable accuracy and energy, thereby testing the hypothesis that ONNs can approach shot-noise-limited performance with extremely low optical energies.

Literature Review

The paper surveys proposals and prior demonstrations of optical processors for deep learning accelerators, including implementations using wavelength and spatial multiplexing in photonic integrated circuits and 3D free-space systems. Prior theoretical and simulation studies suggest ONNs based on optical matrix–vector multipliers can surpass energy limits of irreversible digital computing, potentially achieving sub-photon-per-multiplication energy costs at the standard quantum (shot-noise) limit. Various architectures compute matrix–vector operations via parallel vector–vector products with concatenated outputs. Despite rapid progress, many ONN systems have not fully exploited operation near the shot-noise limit. The literature indicates DNNs can be trained for resilience to noise (including photon shot noise), motivating experimental exploration of low-photon-count regimes.

Methodology

The authors built a free-space optical dot-product engine to study ONNs with large vector sizes in the low-photon regime. The system computes vector–vector dot products y_i = x · w_i in two steps: (1) element-wise multiplication and (2) optical fan-in. Input vector elements x_j are encoded as the intensities of individual spatial modes from an OLED source, and the corresponding weights w_ij are encoded as the transmissivities of spatial light modulator (SLM) pixels. Each OLED pixel is imaged onto an SLM pixel to perform the scalar multiplication w_ij x_j. The modulated light from a block of pixels is then focused onto a single photodetector, implementing an incoherent sum proportional to the dot product y_i. Because elements are encoded in intensity, the system natively supports only non-negative vectors; negative elements are handled via a conversion procedure (Supplementary Note 11).

The apparatus aligns 711×711 OLED pixels to 711×711 SLM pixels, enabling up to 711×711 = 505,521 scalar multiplications and additions in parallel and single dot products with vector sizes up to ~0.5 million. At the shot-noise limit, summing N incoherent spatial modes makes the signal grow as N while the noise grows only as √N, so the SNR at the detector scales as √N. This allows accurate readout even when each mode carries much less than one photon on average.

Dot-product accuracy was characterized using random vector pairs. The optical signal encoding the dot-product result was measured on a sensitive photodetector, and the number of photons per multiplication was controlled via the detector integration time and neutral-density filters. Photons per multiplication were computed as the total measured optical energy divided by the number of scalar multiplications. RMS error was calculated by comparing measured dot products to the digital ground truth over repeated trials.
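The two steps described above (element-wise multiplication, then fan-in onto one detector) can be illustrated with a small shot-noise simulation. This is a minimal sketch, not the authors' code: the function names, the calibration constant `photons_per_unit`, and the Poisson sampler (Knuth's method for small means, a Gaussian approximation for large ones) are assumptions made for illustration. It relies on the fact that a sum of independent Poisson counts is itself Poisson, so a single draw can model the detector reading.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson count: Knuth's method for small means,
    Gaussian approximation (an assumption) for large means."""
    if mean < 30.0:
        limit = math.exp(-mean)
        count, product = 0, rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        return count
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))


def optical_dot_product(x, w, photons_per_unit, rng):
    """Simulate one shot-noise-limited dot product of non-negative vectors.

    x: intensities (OLED pixels); w: transmissivities in [0, 1] (SLM pixels).
    photons_per_unit: hypothetical calibration constant, detected photons
    per unit of dot-product value.
    Returns (estimate, photons_per_multiplication).
    """
    # Step 1: element-wise multiplication, one product per spatial mode.
    products = [xj * wj for xj, wj in zip(x, w)]
    # Step 2: optical fan-in, all modes summed on a single detector.
    mean_photons = photons_per_unit * sum(products)
    detected = poisson_sample(mean_photons, rng)
    estimate = detected / photons_per_unit
    return estimate, detected / len(x)


rng = random.Random(0)
n = 10_000
x = [rng.random() for _ in range(n)]
w = [rng.random() for _ in range(n)]
exact = sum(a * b for a, b in zip(x, w))
estimate, photons_per_mult = optical_dot_product(x, w, 0.4, rng)
print(f"exact={exact:.1f} estimate={estimate:.1f} "
      f"photons/multiplication={photons_per_mult:.3f}")
```

With these illustrative settings each multiplication receives on the order of 0.1 detected photons, yet the estimate typically lands within a few percent of the exact value because about a thousand photons reach the detector in total.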
For ONN inference, a 4-layer fully connected MLP for MNIST (784-100-100-10 with ReLU) was trained in PyTorch using quantization-aware training (QAT), quantizing activations to 4 bits and weights to 5 bits, with data augmentation to improve robustness. At inference, each layer’s matrix–vector multiplication was executed optically; biases and nonlinearities were applied digitally between layers. Calibration mapped detector photon counts to dot-product values using initial MNIST samples, and the number of photons per multiplication was set by the detector’s integration time. The detector operated as a single-pixel readout for optical fan-in. Experimental accuracies were obtained in single-shot execution without repetition.
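The division of labor described above (matrix–vector products done optically, biases and nonlinearities done digitally) can be sketched as a plain forward pass with a pluggable multiplier. This is an illustrative sketch under stated assumptions: `quantize_unsigned` is a generic uniform quantizer rather than the authors' QAT scheme, the clipping range `act_max` is hypothetical, and `exact_matvec` is a digital stand-in for the optical hardware. Signed weights are used directly here; the conversion needed for the intensity-encoded hardware is not modeled.

```python
def quantize_unsigned(value, bits, max_value):
    """Clip to [0, max_value] and round to one of 2**bits uniform levels."""
    levels = 2 ** bits - 1
    clipped = min(max(value, 0.0), max_value)
    return round(clipped / max_value * levels) / levels * max_value


def exact_matvec(weights, x):
    """Digital stand-in for the optical matrix-vector multiplier."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def forward(x, layers, matvec=exact_matvec, act_bits=4, act_max=1.0):
    """Run an MLP: one matvec per layer (optical in the experiment),
    with bias and ReLU applied digitally between layers."""
    activations = [quantize_unsigned(v, act_bits, act_max) for v in x]
    for index, (weights, bias) in enumerate(layers):
        pre = [y + b for y, b in zip(matvec(weights, activations), bias)]
        if index == len(layers) - 1:
            return pre  # output layer: raw scores, no ReLU
        activations = [quantize_unsigned(v, act_bits, act_max)
                       for v in relu(pre)]
    return activations


# Tiny made-up 2-2-1 network, just to exercise the pipeline.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, -0.5]),
    ([[1.0, 1.0]], [0.1]),
]
scores = forward([1.0, 0.2], layers)
print(scores)
```

Passing a noisy multiplier as `matvec` instead of `exact_matvec` would let the same pipeline be used to study how shot noise affects the output scores.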

Key Findings

- Large-scale optical dot products: The setup aligned 711×711 OLED and SLM pixels to perform up to 505,521 scalar multiplications in parallel and compute dot products for vectors of size ~0.5 million.
- Shot-noise scaling: Incoherent optical fan-in yields detector SNR ∝ √N at the shot-noise limit, enabling accurate dot products even when the average number of photons per multiplication is far below 1.
- Dot-product precision vs photon budget: For N ≈ 0.5 million, with ~0.001 photons per scalar multiplication, RMS error was ~6%. Error decreased with photon count and reached ~0.2% RMS at ≥2 photons per multiplication (corresponding to ~9 noise-equivalent bits); at low photon counts, performance was dominated by photon shot noise.
- Vector size effect: For a given photon budget (0.001 to 10 photons per multiplication), larger vector sizes produced lower variance, consistent with increased averaging in dot-product sums.
- MNIST classification with an ONN: A 4-layer MLP achieved ~99% accuracy at ~3 photons per multiplication, matching the accuracy of the same model run digitally without noise. In the sub-photon regime, ~0.6 photons per multiplication yielded ~90% accuracy. Experimental results agreed with digital simulations that included only photon shot noise, indicating shot-noise-limited performance at low photon budgets.
- Energy per inference: To achieve ~99% accuracy, accounting for SLM transmission (~46%), the total optical energy required for all matrix–vector multiplications per inference was ~230 fJ. The 784-100-100-10 model requires 784×100 + 100×100 + 100×10 = 89,400 scalar multiplications per inference.
- Sub-photon interpretation: Because the detector after optical fan-in measures only the total photon number across modes, many individual modes may contribute zero detected photons while the summed signal remains precise, enabling sub-photon-per-multiplication operation.
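The energy-per-inference figure can be roughly cross-checked from the numbers in the list above. The sketch below is a back-of-the-envelope calculation, not the authors' accounting: the wavelength of about 530 nm (green light) is an assumption that does not appear in this summary, and the function names are hypothetical.

```python
PLANCK = 6.62607015e-34       # Planck constant, J*s
LIGHT_SPEED = 2.99792458e8    # speed of light, m/s


def mlp_multiplications(layer_sizes):
    """Scalar multiplications in a fully connected network (biases excluded)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def optical_energy_per_inference(layer_sizes, photons_per_mult,
                                 wavelength_m, slm_transmission):
    """Optical energy before the SLM needed to detect the given photon budget."""
    photon_energy = PLANCK * LIGHT_SPEED / wavelength_m
    detected = (mlp_multiplications(layer_sizes)
                * photons_per_mult * photon_energy)
    return detected / slm_transmission


sizes = [784, 100, 100, 10]
count = mlp_multiplications(sizes)
# 530e-9 m is an assumed wavelength; 3.0 and 0.46 come from the summary.
energy = optical_energy_per_inference(sizes, 3.0, 530e-9, 0.46)
print(count, f"{energy * 1e15:.0f} fJ")
```

Under these assumptions the result comes out at roughly 220 fJ, the same scale as the reported ~230 fJ; the gap is within what rounding of the photon count and the assumed wavelength would explain.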

Discussion

The experiments validate that optical neural networks can operate accurately in regimes where photon shot noise sets the fundamental limit, reaching ~90% MNIST accuracy with less than one detected photon per scalar multiplication. The key enabling principle is optical fan-in, which incoherently sums many low-photon-count spatial modes, yielding a detector SNR that scales with the square root of the number of terms in the dot product. This accumulation reduces the effective noise and permits precise readout even when individual multiplications are extremely photon-starved. The close agreement between experimental accuracy and simulations with only shot noise indicates that, at low photon budgets, performance is primarily limited by the standard quantum limit of detection rather than technical noise. These findings directly address the research goal by demonstrating an ONN that performs useful inference tasks with extremely low optical energy, supporting the prospect of significant energy advantages over electronic implementations on a per-operation and per-inference basis. Additionally, the measured energy per inference indicates that optical energy could constitute a small fraction of total system energy in practical ONN designs, with potential for orders-of-magnitude improvements as hardware is optimized.
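The fan-in argument can be made concrete with the idealized Poisson relation: if the detector collects a mean of M photons in total, its SNR is M / √M = √M. The sketch below tabulates this for a fixed per-multiplication budget and growing vector size. It is an idealization under the assumption of pure shot noise and equal mean photon number per mode; the function name is hypothetical, and the resulting figures need not match the reported RMS errors, which depend on how the error is normalized and on the vector values.

```python
import math


def shot_noise_snr(num_modes, photons_per_mode):
    """SNR of a Poisson-distributed total count: mean / std = sqrt(mean)."""
    total_photons = num_modes * photons_per_mode
    return math.sqrt(total_photons)


# Same per-multiplication budget, increasing vector size.
for n in (100, 10_000, 505_521):
    snr = shot_noise_snr(n, 0.001)
    print(f"N={n:>7}  total photons={n * 0.001:8.1f}  "
          f"SNR={snr:6.2f}  relative noise={1 / snr:.1%}")
```

Quadrupling the vector size at a fixed budget doubles the SNR, which is why a detector can give a usable reading at 0.001 photons per multiplication only when N is very large.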

Conclusion

This work experimentally demonstrates an optical neural network that performs matrix–vector multiplications with less than one detected photon per scalar multiplication while maintaining useful accuracy (~90% on MNIST at ~0.6 photons per multiplication), and that reaches ~99% accuracy at ~3 photons per multiplication. The study establishes shot-noise-limited operation and highlights the advantage of optical fan-in for reducing effective noise in large dot products. At the ~99% accuracy operating point, the apparatus used ~230 fJ of optical energy per inference for the full set of matrix–vector multiplications in the MNIST MLP. Future research directions include developing dedicated matrix–vector multipliers with high optical efficiency and large fan-in, integrating fast and efficient modulators, translating the free-space 2D-block approach to scalable integrated-photonics platforms, increasing data throughput, and extending applications to other machine-learning algorithms and combinatorial-optimization heuristics. System-level studies suggest the optical energy could be a small fraction of total energy, motivating co-design of optics and electronics for whole-system efficiency gains.

Limitations

- The experimental apparatus was not optimized for speed or electrical energy consumption; throughput was limited by input hardware refresh rates (OLED and SLM at ~10 Hz), though the detector itself supported ~100 ns integration times.
- The architecture natively supports non-negative vector elements; handling negative values requires additional encoding steps.
- At higher photon counts, accuracy was limited by imperfect imaging/alignment between OLED and SLM pixels rather than shot noise.
- Experimental MNIST accuracy was reported on the first 100 test images, limiting statistical certainty relative to full test sets.
- The 2D-block free-space architecture is not the most suitable for near-term integrated-photonics incorporation; it serves primarily as a platform to study energy limits rather than as a practical accelerator.
- Calibration procedures (mapping photon counts to dot products) introduce potential systematic errors and may not generalize across configurations without re-calibration.
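To make the calibration limitation concrete, the sketch below fits a mapping from detector counts to dot-product values. It assumes the mapping is linear and fits it by ordinary least squares; this summary does not state the form of the calibration the authors used, and the function name and sample numbers are made up for illustration.

```python
def fit_linear_calibration(counts, values):
    """Least-squares fit of values ~ slope * counts + offset."""
    n = len(counts)
    mean_c = sum(counts) / n
    mean_v = sum(values) / n
    var_c = sum((c - mean_c) ** 2 for c in counts)
    cov = sum((c - mean_c) * (v - mean_v) for c, v in zip(counts, values))
    slope = cov / var_c
    return slope, mean_v - slope * mean_c


# Made-up calibration samples: detector counts and known dot products.
counts = [120, 260, 410, 530]
values = [1.0, 2.4, 3.9, 5.1]
slope, offset = fit_linear_calibration(counts, values)


def counts_to_value(count):
    """Convert a new detector reading using the fitted calibration."""
    return slope * count + offset


print(f"slope={slope:.4f} offset={offset:.3f} "
      f"reading 300 -> {counts_to_value(300):.2f}")
```

A fit like this is only valid for the optical configuration it was measured on, so a change in alignment, integration time, or attenuation would call for new calibration samples.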