The limitations of CMOS technology in meeting the increasing demands of artificial intelligence (AI) and machine learning (ML) algorithms, particularly at the edge, are driving the search for alternative architectures. Traditional von Neumann architectures suffer from processor-memory communication bottlenecks and memory limitations. While analog CMOS designs offer improvements, they are susceptible to device mismatch and non-linear effects. Emerging memory devices, particularly those based on two-dimensional (2D) materials, offer the potential to overcome these limitations through in-situ processing: 2D-material-based resistive RAM (RRAM) exhibits resistive switching (RS), enabling data to be processed within the memory itself. This paper proposes a hybrid architecture that integrates CMOS circuits with 2D-material-based memristors and runs the Extreme Learning Machine (ELM) algorithm, aiming to address the challenges of edge computing with improved performance, higher energy efficiency, and reduced latency.
Literature Review
The authors review existing literature on the limitations of CMOS for AI/ML applications, particularly regarding power consumption and memory bottlenecks. They discuss analog CMOS design techniques and their challenges, including sensitivity to device mismatch, and highlight the limitations of traditional memory technologies (RAM and Flash) to motivate emerging memory technologies. Research on 2D-material-based RRAM is discussed, focusing on its potential advantages over traditional transition-metal-oxide (TMO) RRAM, such as overcoming vertical scaling limitations while offering faster operating speeds and lower power consumption. The suitability of the ELM algorithm and its local receptive field variant (LRF-ELM) for hardware implementation is implicitly justified by reference to prior work demonstrating its computational efficiency.
Methodology
The proposed hybrid architecture comprises a CMOS encoder chip and a memristor decoder chip. The CMOS encoder implements the LRF-ELM algorithm, a variant of ELM in which the weights between the input and hidden layers are local, random, and fixed, with only the output weights being trainable (a minimal sketch of this stage follows below). The encoder chip includes an ELM encoder, a row-select encode unit, a bias generator, and a control unit; the Gaussian kernel function of the LRF-ELM hidden nodes is implemented with subthreshold MOS circuits, exploiting inherent device mismatch to generate the random weights. The decoder chip features a memristor crossbar array, fabricated with CVD-grown multilayer h-BN as the resistive switching medium, that implements the output weights, together with a row-select decode unit and a mixed-signal interface unit built around differential pair integrator (DPI) circuits. The output weights are programmed into the crossbar as quantized conductance states, and a modified quantization-aware stochastic gradient descent (SGD) algorithm trains them while accounting for device-to-device variability (see the second sketch below). During operation, the hidden-node outputs are time-multiplexed onto the crossbar rows, the array performs the multiply-accumulate (MAC) operations, and the DPI circuits integrate the column outputs for classification (third sketch below). The paper details the fabrication process of the h-BN memristor crossbar array along with the device characterization methods, and post-layout CMOS circuit simulations were performed using Cadence IC 6.18 with Spectre 18.
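As a rough illustration of the LRF-ELM encoding stage, the sketch below implements Gaussian-kernel hidden nodes over random local receptive fields in NumPy. All sizes, the receptive-field layout, and the kernel width are hypothetical, not taken from the paper; on the actual chip these kernels are realized by subthreshold MOS circuits whose mismatch supplies the randomness.

```python
# Minimal LRF-ELM encoder sketch: Gaussian-kernel hidden nodes over random
# local receptive fields. Shapes and parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 64, 128                       # hypothetical layer sizes
patch = 8                                          # hypothetical receptive-field width
centers = rng.standard_normal((n_hidden, patch))   # random, fixed (untrained) weights
starts = rng.integers(0, n_inputs - patch, size=n_hidden)
sigma = 1.0                                        # kernel width, fixed and untrained

def hidden_layer(x):
    """Evaluate each Gaussian-kernel node on its local slice of the input."""
    h = np.empty(n_hidden)
    for j in range(n_hidden):
        rf = x[starts[j]:starts[j] + patch]        # local receptive field
        h[j] = np.exp(-np.sum((rf - centers[j])**2) / (2 * sigma**2))
    return h

x = rng.standard_normal(n_inputs)                  # placeholder input feature vector
h = hidden_layer(x)                                # only the weights after this layer are trained
```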
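The quantization-aware training step can be pictured as follows, assuming a straight-through estimator and uniformly spaced conductance levels. The paper's modified SGD algorithm is not reproduced here; the level count (echoing the reported number of stable states), learning rate, and loss are illustrative choices.

```python
# Hedged sketch of quantization-aware SGD for the output weights: the forward
# pass sees quantized weights, the update is applied to a full-precision copy.
import numpy as np

rng = np.random.default_rng(1)
n_hidden, n_classes, levels = 128, 10, 26    # 26 echoes the reported stable states

def quantize(w, levels, w_max=1.0):
    """Clip weights and snap them to uniformly spaced conductance states."""
    step = 2 * w_max / (levels - 1)
    return np.clip(np.round(w / step) * step, -w_max, w_max)

W = rng.standard_normal((n_hidden, n_classes)) * 0.01
lr = 0.05
for _ in range(100):                          # toy loop on random stand-in data
    h = rng.random(n_hidden)                  # stand-in hidden-layer activations
    y = np.eye(n_classes)[rng.integers(n_classes)]
    Wq = quantize(W, levels)                  # forward pass uses quantized weights
    logits = h @ Wq
    p = np.exp(logits - logits.max()); p /= p.sum()
    grad = np.outer(h, p - y)                 # softmax cross-entropy gradient
    W -= lr * grad                            # straight-through update of W
```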
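Finally, the time-multiplexed read-out can be sketched as below: one crossbar row is driven per time step, the column currents accumulate the MAC result much as the DPI integrators would, and the largest integrated output selects the class. The conductance range and the Gaussian variability model are stand-ins, not the paper's device model.

```python
# Time-multiplexed crossbar MAC sketch with illustrative conductance spread.
import numpy as np

rng = np.random.default_rng(2)
n_hidden, n_classes = 128, 10
G = rng.random((n_hidden, n_classes)) * 1e-6             # programmed conductances (S), hypothetical
G_noisy = G * (1 + 0.05 * rng.standard_normal(G.shape))  # device-to-device variability

h = rng.random(n_hidden)                     # hidden-node output voltages (V), stand-ins
I_out = np.zeros(n_classes)
for row in range(n_hidden):                  # one row selected per time step
    I_out += h[row] * G_noisy[row]           # Ohm's-law MAC on the active row
predicted_class = int(np.argmax(I_out))      # integrated column current -> class
```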
Key Findings
The fabricated h-BN memristor crossbar array exhibits analog bipolar resistive switching in more than 96% of devices, with over 26 stable conductance states per device. The coefficient of variation (Cv, the ratio of standard deviation to mean) of the switching voltages is low: 6.2% for VSET and 12.4% for VRESET. The hybrid LRF-ELM system achieves high classification accuracy across several datasets: audio classification on the ESC-50 and FSDD datasets (Table 1), image classification on the Semeion digit recognition dataset (Table 2), and lower-dimensional datasets (ARM, Breast Cancer, Ionosphere, Haberman) evaluated with a standard ELM network (Table 3). The CMOS power consumption per computation for a single hidden node is 7.8 μW at a 0.95 V supply voltage.
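For reference, the coefficient of variation is simply the sample standard deviation divided by the mean. The snippet below computes it over synthetic set-voltage samples whose spread is chosen to land near the reported 6.2%; the values are stand-ins, not measured data.

```python
# Cv = standard deviation / mean, computed over synthetic V_SET samples.
import numpy as np

rng = np.random.default_rng(3)
v_set = rng.normal(1.0, 0.062, size=500)     # hypothetical V_SET samples (V)
cv = v_set.std(ddof=1) / v_set.mean()
print(f"Cv(V_SET) = {cv:.1%}")               # ~6.2% for this synthetic spread
```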
Discussion
The results demonstrate the feasibility and effectiveness of the proposed hybrid architecture for edge computing applications. The system successfully leverages the advantages of both CMOS and 2D memristor technologies. The use of inherent device mismatch for random weight generation simplifies the architecture and reduces energy consumption. The high classification accuracy across diverse datasets, including high-dimensional audio and image data, highlights the versatility and robustness of the approach. The modular design allows for easy replacement of the decoder chip with other emerging memory technologies. The low power consumption makes the system suitable for resource-constrained edge devices.
Conclusion
This research designed and demonstrated a hybrid edge computing system that combines CMOS circuits with a 2D h-BN memristor crossbar array. The system achieves high classification accuracy across diverse datasets while exhibiting low power consumption, and its modular design allows other emerging memory technologies to be substituted for the decoder chip. Future work could explore further miniaturization of the memristor devices and investigate alternative 2D materials and learning algorithms.
Limitations
The current implementation relies on off-chip training of the LRF-ELM network. While the system demonstrates high accuracy with quantized weights, there is a slight accuracy reduction compared to floating-point precision. The study focused on specific datasets, so evaluation across a broader range of datasets is needed to fully assess generalizability. Finally, the energy-efficiency analysis considers only the CMOS power consumption and does not include the energy consumed by the memristor array.