Dynamic machine vision with retinomorphic photomemristor-reservoir computing

H. Tan and S. van Dijken

Hongwei Tan and Sebastiaan van Dijken present a dynamic machine vision system that recognizes and predicts motion in real time through in-sensor processing, with potential applications in robotics and autonomous driving.

Introduction
Dynamic machine vision (DMV) is critical for applications requiring the recognition of past motion and prediction of future trajectories from present visual data. Current DMV systems typically rely on processing numerous image frames and complex algorithms, leading to high energy consumption and data redundancy. Inspired by the biological vision system's efficient motion processing capabilities, which leverage short-term memory, this research explores the use of retinomorphic image sensors with inherent memory capabilities. Recent advancements in memristors, particularly photomemristors, have shown promise in in-sensor computing and adaptive imaging. However, the application of photomemristors for motion recognition and prediction (MRP) within a compact, dynamic sensing system has remained unrealized. This study aims to address this gap by introducing a retinomorphic photomemristor-reservoir computing (RP-RC) system capable of performing MRP.
Literature Review
The existing literature highlights several approaches to dynamic machine vision, primarily focusing on frame-by-frame analysis using sophisticated algorithms. These methods often suffer from high computational complexity and energy consumption. The biological vision system serves as a strong inspiration, emphasizing the role of short-term memory in efficient motion perception. Research on retinomorphic sensors, such as switchable photovoltaic sensors, non-volatile phototransistors, and memristors, has demonstrated their potential in in-sensor computing and motion detection. In-sensor reservoir computing has also been shown effective for tasks like language learning and image classification. However, a compact dynamic sensing system capable of motion recognition and prediction has been lacking. This research bridges this gap by integrating photomemristors and reservoir computing.
Methodology
The core of the proposed system is a retinomorphic photomemristor array (PMA) functioning as a dynamic vision reservoir. The PMA, with its inherent dynamic memory, stores spatiotemporal information from consecutive frames as hidden states. This information is then processed by readout networks to perform motion recognition and prediction. A 5x5 PMA with an ITO/ZnO/Nb-doped SrTiO3 structure was fabricated using atomic layer deposition (ALD), photolithography, etching, and magnetron sputtering. The photomemristive switching behavior relies on optically and electrically controlled charging and migration of oxygen vacancies, changing the Schottky barrier at the ZnO/NSTO interface. The system's performance was evaluated through several experiments. For language learning demonstration, videos of words ending in 'E' were used as input, with the PMA's memory allowing classification based on the entire spatiotemporal sequence, not just the final frame. For motion recognition and prediction, three frames representing object motion at different speeds were input. A simple readout artificial neural network (ANN) was trained to classify the motion speed based on the hidden states in the final frame. For motion prediction, an autoencoder network was trained to predict future frames based on learned spatiotemporal sequences. Finally, an intelligent traffic simulation demonstrated the system's applicability in a real-world scenario. This involved a PMA with 48x48 photomemristors, a convolutional neural network (CNN) for speed recognition, and a convolutional autoencoder (CAE) for trajectory prediction. The performance of individual components was assessed, and various neural network architectures (ANN, CNN, CAE, DNN) were implemented using TensorFlow.
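The role of dynamic memory described above can be illustrated with a toy model. The sketch below is not the authors' device model: it assumes each pixel holds a state that decays by a fixed `retention` factor per frame and increases with incoming light, and the names `run_reservoir`, `moving_dot`, `retention`, and `gain` are hypothetical. It shows that two three-frame sequences with an identical final frame leave different hidden states, which is what lets a readout separate slow from fast motion.

```python
import numpy as np


def run_reservoir(frames, retention=0.6, gain=1.0):
    """Fold a frame sequence into one decaying per-pixel state.

    Assumption: a simple leaky accumulator stands in for the
    photomemristor response; the real device physics is more complex.
    """
    state = np.zeros_like(np.asarray(frames[0], dtype=float))
    for frame in frames:
        # Older imprints fade by `retention`; the new frame adds to the state.
        state = retention * state + gain * np.asarray(frame, dtype=float)
    return state


def moving_dot(columns, size=5):
    """Build frames of a single bright pixel moving along the middle row."""
    frames = []
    for col in columns:
        frame = np.zeros((size, size))
        frame[size // 2, col] = 1.0
        frames.append(frame)
    return frames


slow = moving_dot([2, 3, 4])  # one pixel per frame
fast = moving_dot([0, 2, 4])  # two pixels per frame

# Conventional sensing sees only the last frame, which is the same for both.
same_last_frame = np.array_equal(slow[-1], fast[-1])

# The hidden states keep faded imprints of earlier positions and so differ.
slow_state = run_reservoir(slow)
fast_state = run_reservoir(fast)
states_differ = not np.allclose(slow_state, fast_state)

print(same_last_frame, states_differ)
print(np.round(slow_state[2], 2))
print(np.round(fast_state[2], 2))
```

In this toy model the spacing of the faded imprints along the middle row encodes the speed, so a readout network only needs the final hidden state rather than the full frame sequence.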
Key Findings
The RP-RC system demonstrated high accuracy in various tasks. In the word recognition experiment, accuracies of 97.3% and 91.3% were achieved with 15% and 30% Gaussian noise, respectively. The key role of dynamic memory was confirmed by a comparison with a conventional sensing mode, which yielded significantly lower accuracy (36.2%). The motion speed recognition showed 100% and 97% accuracies for 15% and 30% noise, respectively, highlighting the system's ability to leverage the memory imprints of previous frames to distinguish speeds even for a symmetrical object. The autoencoder successfully predicted future motion frames, and the intelligent traffic simulation showcased the system's ability to dynamically adapt decisions based on predicted trajectories, achieving 90% average test accuracy in speed recognition. The crossmodal learning experiment demonstrated successful audio-to-motion prediction with approximately 90% accuracy in recognizing the first frame from audio input, further enhancing the system's versatility. The bias voltage across the Schottky junctions was shown to tune the system's memory capacity and recognition accuracy.
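A rough sketch of how accuracy at a given noise level might be evaluated is shown below. Several things here are assumptions rather than details from the paper: "15% noise" is interpreted as Gaussian noise with a standard deviation of 0.15 times the largest state value, the two template vectors are made-up hidden states for a slow and a fast object, and a nearest-template rule is swapped in for the trained readout ANN to keep the example dependency-free. The resulting numbers are illustrative only and are not expected to match the reported accuracies.

```python
import numpy as np


def add_gaussian_noise(states, level, rng):
    """Add zero-mean Gaussian noise with std = level * max |state| (assumed convention)."""
    sigma = level * np.max(np.abs(states))
    return states + rng.normal(0.0, sigma, size=states.shape)


def nearest_template_accuracy(templates, level, n_per_class=1000, seed=0):
    """Classify noisy copies of each template by the closest clean template."""
    rng = np.random.default_rng(seed)
    correct = 0
    for label, template in enumerate(templates):
        clean = np.tile(template, (n_per_class, 1))
        noisy = add_gaussian_noise(clean, level, rng)
        # Squared distance from every noisy sample to every template.
        dists = ((noisy[:, None, :] - templates[None, :, :]) ** 2).sum(axis=2)
        correct += int((dists.argmin(axis=1) == label).sum())
    return correct / (n_per_class * len(templates))


# Hypothetical hidden states (one row of pixels) for two motion speeds.
templates = np.array([
    [0.00, 0.0, 0.36, 0.6, 1.0],  # slow object
    [0.36, 0.0, 0.60, 0.0, 1.0],  # fast object
])

acc_15 = nearest_template_accuracy(templates, level=0.15)
acc_30 = nearest_template_accuracy(templates, level=0.30)
print(f"15% noise: {acc_15:.3f}, 30% noise: {acc_30:.3f}")
```

As in the reported experiments, accuracy in this sketch drops as the noise level rises, because the faded imprints that distinguish the two speeds become harder to tell apart.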
Discussion
The results demonstrate the effectiveness of the RP-RC system in performing complex DMV tasks. The integration of photomemristors and reservoir computing provides a compact, energy-efficient solution for in-sensor processing, eliminating the need for separate sensing, memory, and processing modules. The inherent dynamic memory of the PMA enables the system to capture and utilize spatiotemporal information efficiently. The high accuracy achieved in various experiments showcases the system's potential for practical applications. The tunable memory capacity via bias voltage adjustment adds a layer of adaptability, allowing for optimization for specific tasks and conditions. The success of crossmodal learning further broadens the applicability of the system in multimodal environments.
Conclusion
This research successfully demonstrates a novel retinomorphic photomemristor-reservoir computing system for dynamic machine vision. The system's compact design, high accuracy, and adaptability offer significant advantages over conventional DMV approaches. Future research could focus on expanding the photomemristor's spectral response, exploring more complex network architectures for improved performance, and integrating the system into real-world applications such as autonomous vehicles and robotic systems. Further investigation into more sophisticated crossmodal learning capabilities will also enhance the system's potential in complex dynamic environments.
Limitations
The current system uses relatively small PMAs (5x5 and 48x48). Scaling to larger arrays might introduce challenges in uniformity and fabrication. The spectral response of the ZnO-based photomemristor is limited to the UV-blue range; extending it across the rest of the visible spectrum would broaden the system's applicability. The training data for the various neural networks were generated by adding noise to experimental data, which might not fully represent real-world conditions. The simulations, while informative, do not capture all real-world complexities. Finally, the intelligent traffic simulation uses simplified rules and does not consider all aspects of real-world traffic interactions.