On-chip bacterial foraging training in silicon photonic circuits for projection-enabled nonlinear classification

G. Cong, N. Yamamoto, et al.

This work introduces a projection-based classification principle that enables on-chip training of photonic devices for machine learning, demonstrated by Guangwei Cong, Noritsugu Yamamoto, Takashi Inoue, Yuriko Maegami, Morifumi Ohno, Shota Kita, Shu Namiki, and Koji Yamada. The authors report high accuracy on reconfigurable Boolean logics and Iris classification, achieving nonlinear classification without conventional activation functions and with favorable scalability.

Introduction
The study addresses the challenge of achieving on-chip training for photonic implementations of machine learning, which have largely focused on inference with offline-trained optical neural networks. Gradient-based backpropagation is difficult to implement directly on photonic hardware because efficient optical-to-electrical gradient-update mechanisms are lacking. While approaches such as in situ backpropagation and calibration-assisted methods have been proposed, experimental on-chip training remains limited. The authors introduce a projection-based classification principle inspired by support vector machines (SVMs), in which input data are nonlinearly mapped via sinusoidal phase-to-amplitude transformations in Mach-Zehnder interferometer (MZI) networks, enabling linear separability in a higher-dimensional complex feature space. They propose and experimentally validate on-chip training using bacterial foraging optimization (BFO), a global optimization algorithm robust to local minima and gradient issues, to realize nonlinear classification without explicit activation functions. The work aims to demonstrate standalone on-chip training and inference for Boolean logics and the Iris dataset, and to compare performance with artificial neural networks (ANNs).
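As a minimal illustration of the projection idea (a simplified example, not the authors' exact mapping): if two input bits are encoded as phases x1, x2 ∈ {0, π} and interfere on a single output, the detected power is |e^{i x1} + e^{i x2}|^2 = 2 + 2 cos(x1 − x2), which equals 4 when the bits agree and 0 when they differ. Power detection of a linear combination of phase-encoded amplitudes is thus already a nonlinear (quadratic) function of the inputs, which is why XOR-like problems can become separable in the complex amplitude space without an explicit activation function.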
Literature Review
Prior optical implementations of machine learning include free-space optics, fiber-based systems, and integrated photonics, predominantly performing inference using weights trained offline. In situ optical backpropagation and calibration methods have been proposed for training but involve additional hardware such as embedded monitors and feedback circuits. Experimental on-chip training using global optimization (e.g., genetic algorithms) has been shown. Bacterial foraging optimization has previously been demonstrated for on-chip reconfiguration, failure recovery, and XOR separation in silicon photonics. SVM provides a foundation for nonlinear classification via projection into higher-dimensional feature spaces using kernels; however, explicit hardware implementations of such nonlinear mappings in photonics are sparse, with related ideas in random projection through scattering media and reservoir computing in time-delay systems. The work builds on Clements’ topology for interferometer-based vector-matrix multiplication and contrasts the proposed projection-based photonic classifier (PPC) with optical neural networks (ONNs), emphasizing elimination of activation functions and potential advantages in scalability and energy efficiency.
Methodology
Principle: Inspired by SVM, the authors construct explicit nonlinear mapping functions G(x) using MZI-based silicon photonic circuits. Electrical inputs encode data as optical phases; the optical amplitude space serves as the higher-dimensional complex feature space. Sinusoidal phase-amplitude responses of MZIs implement mappings of the form G_j = w_j e^{i x_i} + b_j or their quadratic forms (w_j e^{i x_i} + b_j)^2, determined by phase biases. The projected vector G(x) is linearly separated by a vector-matrix multiplication (VMM) implemented as an MZI mesh in Clements' topology, with optional phase shifters for phase-error tuning. The device equation is y = A G(x), trained with a mean-squared-error loss between detected optical powers and one-hot targets.

Device topology: The PPC comprises input MZI stages for constructing G(x), a column of phase shifters for phase tuning/rotation, and a 7-layer (experiment) or 8-layer (simulation for nonlinear datasets) MZI mesh for the VMM. All MZIs use 50:50 directional couplers. The fabricated device includes 46 heaters (32 for MZIs, 8 for phase biases, 6 for phase-error compensation). Data-input schemes are specified for XOR, Iris (four-dimensional), and the nonlinear datasets (Circle, Moon, Spiral), each mapping x to an 8-component G(x) via derived equations.

Training algorithms: Two on-chip training methods are used: (1) bacterial foraging optimization (BFO), a stochastic global optimizer emulating chemotaxis and reproduction, robust to local minima and gradient noise; and (2) forward-propagation-based RMSprop (gradient descent) with finite-difference gradient evaluation using small voltage steps. The loss is the MSE of the normalized output powers y_i = p_i/Σ p_i. BFO parameters (experiment): 10 bacteria, 5 chemotaxis loops, up to 20 swimming steps, and an adaptive voltage step ΔV = 0.044 + 0.264·L, where L is the MSE at each epoch. The RMSprop gradient is evaluated with ΔV = 0.088 V (or 0.044 V for comparison), with the learning rate tuned to 30 (DAC-dependent).

Experimental setup: The device was fabricated on a 220 nm silicon-on-insulator (SOI) platform with silicon wire waveguides (430 nm width), TiN thermo-optic heaters on the upper MZI arms, 3 dB directional couplers, AlCu metallization, and inverse tapers for fiber coupling. The chip was wire-bonded and packaged; fiber-to-fiber loss was ~4.5 dB. A 1.53 µm TE-polarized laser was coupled to the chip, and outputs were measured by an 8-channel photodetector. Two 40-channel DC sources drove the heaters; control and data acquisition ran on a PC (Python, PyTorch). No special thermal management was used; experiments were performed at room temperature.

Data encoding: Inputs are converted to phases in units of π and then to voltages by interpolation from measured power-voltage curves of the thermo-optic phase shifters (R·V_on^2 = P + R·V_off^2). For XOR, bit 0 uses a learnable V_off and bit 1 uses V_on. For Iris, features are normalized (reported as normalized to 2^7 with x = 2(x − X_min)/(X_max − X_min)) and then converted to voltages via R·V(x)^2 = x·P + R·V_off^2. Port assignments encode class labels; e.g., XOR uses ports 1 and 5 for 0 and 1, respectively, and Iris uses ports 1, 3, and 5 for Setosa, Versicolor, and Virginica.

Model interpretation: The trained device implements a linear separator in an n-dimensional complex space after nonlinear projection; maximizing output power corresponds to maximizing the distance to the separating hyperplane (a maximum-margin interpretation). A kernel-matrix perspective is provided by the Hermitian inner products ⟨G(x), G(v)⟩, which satisfy positive definiteness.
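The projection principle and the BFO training loop can be sketched in a few dozen lines of NumPy. The code below is an illustrative software analogue, not the authors' implementation: parameters are abstract phases rather than heater voltages, the trainable complex matrix is a crude stand-in for a Clements MZI mesh, and the function names and reproduction count are assumptions. It reuses the reported BFO settings (10 bacteria, 5 chemotaxis loops, up to 20 swim steps, adaptive step 0.044 + 0.264·L) on the XOR task.

```python
# Minimal NumPy sketch of the projection-based classifier and BFO-style training.
# Illustrative only: phases stand in for heater voltages, and A is a simplified
# stand-in for a Clements MZI mesh; names and the reproduction count are assumptions.
import numpy as np

rng = np.random.default_rng(0)
D, M, C = 2, 8, 2            # input features, projected components, output classes
P = 2 * M + 2 * C * M        # total number of trainable phases

def unpack(phi):
    """Split the flat phase vector into projection weights w, biases b, and matrix A."""
    w = np.exp(1j * phi[:M])                      # unit-modulus weights set by phases
    b = np.exp(1j * phi[M:2 * M])                 # phase-bias terms
    a = phi[2 * M:].reshape(2, C, M)
    A = np.exp(1j * a[0]) * np.cos(a[1])          # simplified complex "VMM" matrix
    return w, b, A

def forward(x, phi):
    """Projection G(x)_j = w_j exp(i x) + b_j, linear VMM, then power detection."""
    w, b, A = unpack(phi)
    g = w * np.exp(1j * x[np.arange(M) % D]) + b  # 8-component complex feature vector
    p = np.abs(A @ g) ** 2                        # detected output powers
    return p / p.sum()                            # normalised powers

def loss(phi, X, Y):
    """MSE between normalised output powers and one-hot targets."""
    return float(np.mean([(forward(x, phi) - y) ** 2 for x, y in zip(X, Y)]))

# XOR encoded as phases (bit 1 -> pi, bit 0 -> 0) with one-hot class targets.
X = np.pi * np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.eye(2)[[0, 1, 1, 0]]

def bfo_train(X, Y, n_bact=10, n_repro=20, n_chemo=5, n_swim=20):
    """Bacterial-foraging-style optimisation: each 'bacterium' (parameter vector)
    tumbles in a random direction and keeps swimming while the loss improves;
    the best half of the population is duplicated at each reproduction step."""
    bact = [2 * np.pi * rng.random(P) for _ in range(n_bact)]
    best = min(bact, key=lambda p: loss(p, X, Y))
    for _ in range(n_repro):
        for i, phi in enumerate(bact):
            for _ in range(n_chemo):
                L = loss(phi, X, Y)
                step = 0.044 + 0.264 * L                 # adaptive step size
                d = rng.standard_normal(P)
                d /= np.linalg.norm(d)                   # tumble: random direction
                for _ in range(n_swim):                  # swim while improving
                    trial = phi + step * d
                    Lt = loss(trial, X, Y)
                    if Lt >= L:
                        break
                    phi, L = trial, Lt
            bact[i] = phi
        bact.sort(key=lambda p: loss(p, X, Y))           # reproduction: keep best half
        bact = bact[:n_bact // 2] + [p.copy() for p in bact[:n_bact // 2]]
        best = min(bact + [best], key=lambda p: loss(p, X, Y))
    return best

phi_star = bfo_train(X, Y)
print("final MSE:", round(loss(phi_star, X, Y), 4))
print(np.round(np.array([forward(x, phi_star) for x in X]), 2))
```

On hardware, the same loop evaluates the loss by setting heater voltages and reading photodetector powers instead of calling a software forward model.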
Simulation studies: The PPC and ANNs were compared on synthetic nonlinear datasets (Circle, Moon, Spiral). The PPC used the projection in cascading MZIs (Eqs. 17–24) and an 8-layer MZI mesh (43 parameters in total, no external phase shifters). The ANN baselines were multilayer perceptrons with ReLU hidden layers and softmax outputs (2×10×2/3 and 2×50×2/3). Both were trained with RMSprop. MNIST scalability simulations examined architectures with the quadratic mapping, varying the number of input components and parameter counts (e.g., 815 and 1663 parameters).

Performance and engineering analyses: Speed was estimated from the optical latency (~30 ps) and the limitations of the current apparatus (ms-scale channel setting, USB readout). With integrated high-speed modulators/detectors and >100 MHz drivers, a per-loop update of ~50 μs is projected, giving ~5 s for 10^5 loops. Power consumption was calculated from the learned voltage/phase distributions; robustness was assessed against control bias errors and directional-coupler deviations; stability was evaluated by reloading the learned settings over one week. Dropout of output ports was used to approximate arbitrary m×n transforms and improve classification contrast.
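For reference, the ANN baselines described above can be reproduced conceptually in a few lines of PyTorch (the library the authors used for control and data processing). The sketch below is an assumption-laden stand-in: the dataset generator (scikit-learn's make_moons), learning rate, epoch count, and use of cross-entropy on the softmax output are illustrative choices, not the paper's exact settings.

```python
# Sketch of a 2 -> 10 -> 2 MLP baseline (ReLU hidden layer, softmax output)
# trained with RMSprop on a Moon-style dataset. Hyperparameters are assumptions.
import torch
import torch.nn as nn
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 2))
opt = torch.optim.RMSprop(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()          # softmax + negative log-likelihood

for epoch in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy: {acc:.3f}")
```

Such a baseline has 40–50 times more parameters than the 43-parameter PPC used for the same two-dimensional datasets, which is the comparison the simulation study draws.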
Key Findings
- On-chip training feasibility: Both BFO and RMSprop successfully trained the PPC device; BFO achieved lower final MSE and showed higher robustness and success rate under experimental conditions (e.g., gradient-step sensitivity hindered RMSprop when ΔV was reduced).
- XOR logic: Post-training power maps exhibited correct XOR behavior. BFO and RMSprop both produced XOR-like patterns; convergence curves showed BFO reaching lower MSE within 50 epochs. The device can be reconfigured to implement single Boolean logics (AND, OR, NAND, XOR) and combinational logics (XOR-AND half-adder, OR-NAND) simultaneously through port assignment and weight reconfiguration.
- Iris classification: Using ports 1, 3, and 5 for the labels (Setosa, Versicolor, Virginica), BFO achieved 94.44% training and 96.67% test accuracy; with output dropout (three ports), accuracies reached 98.89% (train) and 98.33% (test), ~98.67% overall. RMSprop yielded similar accuracies but slightly higher MSE. These results are comparable to a 4×5×3 ANN with ReLU/softmax (maximum 96.67% verification accuracy).
- Nonlinear datasets (simulation): The PPC (43 parameters, no activation functions) achieved 100% on Circle and Moon and ~90% on the 3-class Spiral, comparable to an ANN with 50 hidden nodes (252/303 parameters). The PPC converged faster than the ANN on Moon and Spiral, and at comparable speed on Circle.
- MNIST scalability (simulation): With the quadratic projection, ~90–91% test accuracy was achieved with 815 parameters and 16 input components, and ~94% with 1663 parameters and 32 inputs. The projection contributed ~4–5% accuracy improvement over a linear mapping. Accuracy scaled roughly as log(N^0.13) (train) and log(N^0.09) (test) versus parameter count N.
- Speed and latency: Optical path latency is ~30 ps; overall throughput is currently limited by laboratory instrumentation. With integrated high-speed drivers (>100 MHz), per-loop updates of ~50 μs imply ~5 s for 10^5 loops.
- Power: Experimental total heater power was ~364.5 mW (BFO) and ~358.7 mW (RMSprop). Scaling to 1663 thermo-optic parameters (Pπ ~15 mW) implies ~5 W; carrier-injection phase shifters (2–3 mW Pπ) could reduce this to ~1 W; PCM/MEMS options offer low-loss or power-free standby.
- Robustness and stability: Accuracy remained nearly unchanged for <3% random bias errors on the learned voltages and degraded beyond 3%. With dropout, the bias tolerance tightened (errors above ~1% caused noticeable degradation). Over one week, stability showed ±1% accuracy and ±0.02 MSE variations, with reproducible results across two chips. Classification performance was resilient to directional-coupler deviations up to δ ~10%.
Discussion
The results demonstrate that explicit nonlinear projection implemented via MZI-induced sinusoidal mappings, followed by linear interferometric VMM, enables nonlinear classification on silicon photonic hardware without activation functions. On-chip training using BFO effectively optimizes the device phases, circumventing the gradient-vanishing/exploding and local-minima issues that affect gradient-based methods in noisy experimental settings. The PPC achieves accuracies comparable to ANNs on benchmarks (XOR, Iris, Circle, Moon, Spiral) while using far fewer parameters and converging faster in several cases, highlighting scalability advantages. The approach supports multiple logic functions simultaneously and general-purpose classification within the same device through reconfiguration, with sub-watt power consumption in experiments and projected watt-level consumption at larger scales, which remains below typical CPU/GPU consumption for similar tasks. Analyses of robustness to control errors and fabrication imperfections, together with week-long stability, suggest practical viability. Compared with ONNs, the PPC avoids the need for on-chip nonlinear activation elements or OEO inter-layer conversions, simplifying integration while retaining high-speed, low-latency operation. Overall, the findings validate an SVM-like projection principle realized physically in photonics and show that natural-intelligence-inspired training can make photonic classifiers standalone and reconfigurable.
Conclusion
The paper introduces and experimentally validates a projection-based photonic classifier (PPC) that realizes SVM-like nonlinear mapping using MZI networks on silicon photonics, trained on-chip via bacterial foraging optimization. The device performs single and combinational Boolean logics and classifies the Iris dataset with up to ~98.3% test accuracy using output dropout, comparable to ANN baselines, and achieves strong performance on synthetic nonlinear datasets with far fewer parameters and without activation functions. The methodology provides advantages in scalability, energy efficiency, and speed, and reduces system complexity by eliminating the need for optical nonlinear activations and OEO conversions. Future directions include integrating high-speed modulators and photodetectors with on-chip drivers for faster training and inference, adopting more energy-efficient and faster phase shifters (carrier-injection, PCM, MEMS), scaling architectures to larger datasets (e.g., MNIST-class and beyond), refining training algorithms for noisy hardware, and exploring broader classes of kernel-inspired projections within photonic circuits.
Limitations
- Experimental throughput was limited by the laboratory current sources (2–3 ms per channel update) and USB readout, restricting dataset size and training speed; thermo-optic heaters also impose a microsecond-scale intrinsic response, though integrable driver solutions exist.
- Thermo-optic phase shifters consume static power; scaling to thousands of parameters may reach multi-watt levels unless more efficient shifters (carrier-injection, PCM, MEMS) are used.
- Classification accuracy is sensitive to control bias errors; tolerances are ~<3% without dropout and ~<1% with dropout before noticeable degradation.
- RMSprop-based on-chip gradient estimation is sensitive to voltage step size and noise, sometimes failing to converge with reduced steps; BFO, while robust, may converge more slowly initially.
- Demonstrations used modest-size datasets (e.g., Iris), and port assignments were configured manually; fully integrated high-speed I/O and on-chip electronics were not included in the prototype.
- The PPC relies on precise phase control and calibration of phase errors, though robustness to directional-coupler deviations was observed; environmental fluctuations (e.g., polarization) can cause small variations over time.