Introduction
Implementing machine learning algorithms in optical systems is attracting significant interest because of the potential gains in energy efficiency and speed over electronic hardware. Artificial neural networks (ANNs) are a popular choice, but most photonic implementations perform only inference with pre-trained models, which limits their standalone capabilities. On-chip training remains challenging because gradient-based backpropagation, the standard method for training ANNs, is difficult to map onto photonic devices. Theoretical proposals exist, but they typically require additional, complex components; global optimization algorithms and forward-propagation methods offer alternatives, yet experimental demonstrations remain limited. This work explores a different route inspired by support vector machines (SVMs), constructing nonlinear mapping functions directly in silicon photonic circuits to enable projection-based classification. The authors aim to demonstrate on-chip training of this classifier with bacterial foraging optimization, achieving standalone training and inference without pre-calibration or reliance on an external computer.
Literature Review
Existing research in optical and photonic machine learning spans free-space optics, fiber-based systems, and integrated photonics. These implementations often realize artificial neural networks and demonstrate inference through vector-matrix multiplication, random projection, and Fourier transformation. However, most rely on models trained offline, which precludes on-chip training and real-time reconfiguration. Recent work has explored on-chip training with genetic algorithms, but the search for efficient and robust training methods is ongoing. While support vector machines (SVMs) are widely used in machine learning, their realization on photonic platforms remains relatively unexplored. Existing methods such as random projection and reservoir computing rely on implicit projection, whereas this paper aims to implement the nonlinear projection explicitly within a photonic system.
Methodology
This research proposes a projection-based photonic classifier (PPC) built from Mach-Zehnder interferometer (MZI) networks and phase shifters (PSs). The nonlinear mapping function G(x) is realized by the data-input scheme of the MZI network: input features, applied as phases in the electrical domain, are projected into the optical amplitude domain. A vector-matrix multiplication (VMM) module, implemented as an interferometer mesh with Clements' topology, then performs linear separation in the projected space. The paper derives the mapping functions for several datasets: XOR, Iris, and the nonlinear Circle, Moon, and Spiral datasets. Training minimizes a mean squared error (MSE) loss between the output vector and the one-hot encoded target vector, and the authors compare bacterial foraging optimization (BFO) with RMSprop for on-chip training.

The experimental setup comprises a fabricated silicon photonic chip with fiber-array coupling, enabling direct on-chip training and inference. Data are loaded by converting normalized features into drive voltages, interpolated from a measured power-voltage curve, which set the phases of the MZIs. Experimental validation covers XOR logic, other Boolean logic operations (AND, OR, NAND), and Iris classification, while simulations compare the PPC against artificial neural networks (ANNs) on the nonlinear datasets.
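The following is a minimal NumPy sketch of the forward path just described, not the paper's actual circuit model: the MZI parameterisation, the 4-mode mesh size, and the feature-to-phase encoding are illustrative assumptions. It shows how a data-driven projection G(x), a trainable Clements-style mesh, and an MSE loss against one-hot targets fit together.

```python
import numpy as np

def mzi(theta, phi):
    """2x2 MZI transfer matrix (one common convention: internal phase theta,
    external phase shifter phi on the upper input)."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, c],
         [np.exp(1j * phi) * c, -s]])

def embed(block, i, n):
    """Embed a 2x2 block acting on modes (i, i+1) into an n-mode identity."""
    U = np.eye(n, dtype=complex)
    U[i:i + 2, i:i + 2] = block
    return U

def project(x, n_modes=4):
    """Nonlinear mapping G(x): encode input features as phases of the
    input-stage MZIs, so a point in the electrical phase space becomes a
    complex optical amplitude vector."""
    state = np.zeros(n_modes, dtype=complex)
    state[0] = 1.0                                   # single coherent input
    for k, xk in enumerate(x):                       # one data-driven MZI per feature
        state = embed(mzi(np.pi * xk, 0.0), k % (n_modes - 1), n_modes) @ state
    return state

def vmm(params, n_modes=4):
    """Trainable vector-matrix multiplication: a rectangular (Clements-style)
    mesh of MZIs whose phases are the weights being trained."""
    U = np.eye(n_modes, dtype=complex)
    idx = 0
    for layer in range(n_modes):                     # mesh depth
        for i in range(layer % 2, n_modes - 1, 2):   # alternating MZI columns
            U = embed(mzi(params[idx], params[idx + 1]), i, n_modes) @ U
            idx += 2
    return U

def forward(x, params, n_classes=2):
    """Detected intensities at the first n_classes output ports."""
    amps = vmm(params) @ project(x)
    return np.abs(amps[:n_classes]) ** 2

def mse_loss(params, X, y_onehot):
    """Mean squared error between detector readings and one-hot targets."""
    preds = np.array([forward(x, params, y_onehot.shape[1]) for x in X])
    return np.mean((preds - y_onehot) ** 2)
```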
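Because the loss can be evaluated directly from detector readings, a gradient-free search can drive the phase weights. Below is a minimal, generic bacterial foraging optimization loop (chemotaxis, reproduction, elimination-dispersal) reusing `mse_loss()` from the sketch above; the population size, step size, and schedule are illustrative guesses rather than the paper's settings.

```python
import numpy as np

def bfo_train(X, y_onehot, n_params, pop=20, steps=50, swim=4,
              step_size=0.1, p_disperse=0.1, seed=0):
    rng = np.random.default_rng(seed)
    bacteria = rng.uniform(0.0, 2 * np.pi, size=(pop, n_params))  # candidate phase settings
    cost = np.array([mse_loss(b, X, y_onehot) for b in bacteria])
    for _ in range(steps):
        for i in range(pop):                        # chemotaxis: tumble, then swim
            direction = rng.standard_normal(n_params)
            direction /= np.linalg.norm(direction)
            for _ in range(swim):
                trial = bacteria[i] + step_size * direction
                trial_cost = mse_loss(trial, X, y_onehot)
                if trial_cost >= cost[i]:           # stop swimming when no improvement
                    break
                bacteria[i], cost[i] = trial, trial_cost
        keep = np.argsort(cost)[:pop // 2]          # reproduction: best half splits
        bacteria = np.concatenate([bacteria[keep], bacteria[keep]])
        cost = np.concatenate([cost[keep], cost[keep]])
        for i in range(pop):                        # elimination-dispersal
            if rng.random() < p_disperse:
                bacteria[i] = rng.uniform(0.0, 2 * np.pi, n_params)
                cost[i] = mse_loss(bacteria[i], X, y_onehot)
    best = int(np.argmin(cost))
    return bacteria[best], cost[best]
```

For the XOR task, `X` would hold the four 2-bit inputs, `y_onehot` the 4x2 target matrix, and `n_params=12` matches the 4-mode mesh of the previous sketch; on hardware the same loop would apply candidate voltages and read the loss from the photodetectors rather than call a software model.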
Key Findings
The experimental results demonstrate successful on-chip training with both BFO and RMSprop, which achieve XOR separation with similar MSE in the early stages of training; BFO, however, converges more reliably and remains robust across a range of parameter settings. The PPC performed single Boolean logic operations (XOR, AND, OR, NAND) and combinational ones (XOR-AND, OR-NAND) on a single device. Iris classification reached high accuracy (94.44% training, 96.67% testing), rising to 98.33% when a dropout technique was employed. Simulations on the nonlinear datasets (Circle, Moon, Spiral) show the PPC matching or exceeding ANN accuracy with far fewer parameters (43 vs. 252-303) and without nonlinear activation functions. Scalability is examined through MNIST digit-classification simulations, which reach roughly 94% accuracy with approximately 1663 parameters. The PPC's power consumption is far lower than that of conventional CPUs or GPUs, demonstrating high energy efficiency.
Discussion
The findings demonstrate that projection-based classification in silicon photonic circuits is a viable and efficient way to implement machine learning on-chip. Using BFO as a robust training algorithm sidesteps the difficulties of gradient-based methods and enables standalone training and inference. The high accuracy on nontrivial tasks such as Iris classification and the MNIST simulation points to the scalability and potential of the approach, while the much smaller parameter count compared with ANNs implies improved efficiency and reduced hardware complexity. The experimental validation shows that the approach is robust and reproducible despite fabrication imperfections and environmental fluctuations. This work suggests that photonic machine learning could be a practical alternative for specific applications, offering advantages in speed, energy efficiency, and compactness.
Conclusion
This research successfully demonstrates on-chip training and inference of a projection-based photonic classifier for nonlinear classification tasks. Bacterial foraging optimization enables robust and efficient training, yielding high accuracy on benchmarks including Iris classification and Boolean logic operations. The significantly smaller parameter count compared to ANNs, combined with the energy efficiency of the platform, makes this approach attractive for future photonic machine learning systems. Future work could integrate high-speed modulators and photodetectors to raise speed and data throughput, and apply the approach to larger and more complex datasets.
Limitations
The current experimental setup is limited by the speed of the 40-channel current sources used for controlling the phase shifters, affecting the overall training time. The accuracy is sensitive to bias errors in the control of the voltage weights, though it maintains robustness within a certain error tolerance. While the study demonstrates the feasibility and advantages of the approach, the generalization to much larger datasets requires further investigation and optimization. The specific choice of the MZI configuration might influence the performance and scalability.
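The stated tolerance to bias errors can be probed in simulation with a simple Monte Carlo test that perturbs the trained phases and re-measures accuracy; the additive Gaussian phase-offset error model below is an assumption, and the sketch reuses `forward()` from the methodology section.

```python
import numpy as np

def accuracy_under_bias(params, X, labels, bias_std, n_classes=2, trials=200, seed=1):
    """Average classification accuracy when each trained phase is perturbed by a
    zero-mean Gaussian bias of standard deviation bias_std (radians)."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(trials):
        noisy = params + rng.normal(0.0, bias_std, size=params.shape)
        preds = [int(np.argmax(forward(x, noisy, n_classes))) for x in X]
        accs.append(np.mean(np.array(preds) == np.asarray(labels)))
    return float(np.mean(accs)), float(np.std(accs))
```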