Introduction
Silent speech interfaces (SSIs) decode speech from non-vocal signals and are crucial where verbal communication is difficult, such as in noisy environments or for individuals with speech impairments (stroke, cerebral palsy, Parkinson's disease, post-laryngeal surgery). A key challenge is building a wearable system that is comfortable and durable enough for everyday use, operates with high precision across scenarios and users, and decodes speech efficiently. Recent research has therefore focused on improving both signal-capture devices and algorithmic models. Human speech begins as neural impulses from the central nervous system that travel to the vocal cords and drive facial movements, and these movements provide the non-acoustic signals an SSI can capture. This study addresses the limitations of current SSI technology by developing a highly sensitive and computationally efficient system.
Literature Review
Existing SSI systems based on electromyography (EMG) or strain sensors often struggle with noise: flicker noise from sensor imperfections, environmental sound noise, and physiological noise from body movements. Previous approaches typically convert 1D time-series signals into 2D images via feature extraction (e.g., the Fourier transform) before applying 2D neural networks, which increases computational complexity and makes them poorly suited to wearable devices. Multi-channel arrays benefit naturally from 2D algorithms that exploit spatial resolution, and single-channel devices with low sensitivity have relied on 2D transformations to enrich weak features. This research removes that constraint by developing a highly sensitive sensor whose high-information-density signals allow efficient 1D methods to be used directly.
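To make the computational contrast concrete, here is a minimal sketch (not from the paper) of the two pipelines: the conventional route transforms a 1D sensor trace into a 2D spectrogram before a 2D network, while the 1D route consumes the raw trace with no transform stage. The sampling rate and window length are arbitrary assumptions for illustration.

```python
import numpy as np
from scipy import signal

fs = 1000                          # assumed sampling rate, Hz
x = np.random.randn(2 * fs)        # stand-in for a 2 s strain-sensor trace

# Conventional route: short-time Fourier transform -> 2D "image" input.
f, t, Zxx = signal.stft(x, fs=fs, nperseg=128)
spectrogram = np.abs(Zxx)          # shape: (freq_bins, time_frames)

# 1D route: the raw trace itself is the model input; no transform at all.
raw_input = x[np.newaxis, np.newaxis, :]   # shape: (batch, channel, samples)

print(spectrogram.shape, raw_input.shape)
```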
Methodology
This research developed an ultrasensitive textile strain sensor with ordered cracks in a structured graphene layer on a textile substrate (95% bamboo fibers, 5% elastane). The ordered cracks, created through a one-step printing process followed by prestretching, significantly enhance sensitivity, yielding a gauge factor of 317 within 5% strain, a 420% improvement over existing technologies. Graphene nanoplatelets, well suited to piezoresistive sensing, were formulated into a DI-water-based ink via high-pressure homogenization (HPH), and screen printing provided customizable patterns, compatibility with the flexible substrate, and scalability. Sensor performance was evaluated by monitoring relative resistance changes, which showed a linear, low-hysteresis, and highly repeatable response within a small strain range, insensitivity to the frequency of applied strain, an ultralow detection limit of 0.05% strain, and durability over more than 10,000 stretching-releasing cycles.

For speech recognition, a lightweight end-to-end 1D convolutional neural network was developed. The model uses residual blocks, batch normalization, ReLU activation, and max-pooling for efficient feature extraction and downsampling. To improve noise immunity while preserving energy efficiency, a 'random noise window' data augmentation technique was applied during training instead of explicit signal filtering.
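As a quick sanity check on what the sensor figures above imply, the gauge factor relates relative resistance change to applied strain. A sketch of the arithmetic, using only the numbers reported above:

```python
# Gauge factor definition: GF = (dR / R0) / strain
gauge_factor = 317
strain = 0.05                                 # 5% strain, the stated linear range
relative_resistance_change = gauge_factor * strain
print(f"dR/R0 at 5% strain: {relative_resistance_change:.2f}")   # -> 15.85

# Assuming the gauge factor holds down at the detection limit (an assumption;
# GF can vary with strain range), the smallest resolvable signal would be:
detection_limit = 0.0005                      # 0.05% strain
print(f"dR/R0 at detection limit: {gauge_factor * detection_limit:.3f}")  # -> 0.159
```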
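The paper's exact architecture is not reproduced here; the following is a minimal PyTorch-style sketch of a 1D CNN of the kind described, combining residual blocks, batch normalization, ReLU, and max-pooling. Channel widths, kernel sizes, input length, and the 20-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """Two 1D convolutions with batch norm, joined by a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)      # residual connection

class SilentSpeechNet(nn.Module):
    def __init__(self, n_classes=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3),
            nn.BatchNorm1d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool1d(2),                   # downsample along time
            ResBlock1D(32),
            nn.MaxPool1d(2),
            ResBlock1D(32),
            nn.AdaptiveAvgPool1d(1),           # global pooling -> fixed-size vector
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, 1, samples)
        return self.fc(self.net(x).squeeze(-1))

logits = SilentSpeechNet()(torch.randn(4, 1, 1000))   # -> shape (4, 20)
```

Operating directly on the 1D trace keeps the parameter count and multiply-accumulate cost far below that of a 2D CNN fed with spectrogram images, which is the efficiency argument made above.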
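The precise form of the 'random noise window' augmentation is not detailed here; a plausible reading is that Gaussian noise is injected into a randomly placed window of each training trace so the model learns to ignore localized disturbances. Window length, placement, and noise level below are assumptions for illustration.

```python
import numpy as np

def random_noise_window(x, max_frac=0.3, noise_std=0.05, rng=None):
    """Add Gaussian noise to one randomly placed window of a 1D trace.

    A hedged sketch of 'random noise window' augmentation; all parameter
    choices are illustrative, not taken from the paper.
    """
    rng = rng or np.random.default_rng()
    x = x.copy()
    win = rng.integers(1, int(len(x) * max_frac) + 1)   # random window length
    start = rng.integers(0, len(x) - win + 1)           # random window position
    x[start:start + win] += rng.normal(0.0, noise_std, size=win)
    return x

augmented = random_noise_window(np.sin(np.linspace(0, 6.28, 1000)))
```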
Key Findings
The ultrasensitive textile strain sensor achieved a gauge factor of 317 within 5% strain, a significant improvement over existing textile strain sensors. The 1D convolutional neural network reduced the computational load by 90% while maintaining 95.25% accuracy in decoding a set of 20 frequently used English words. The system distinguished easily confusable word pairs differing by a single phonetic element with 93% accuracy and recognized longer words spoken at varying speeds with 96% accuracy. Robustness was demonstrated against environmental sound noise (the sensor was unresponsive to 100 dB ambient sound) and against variations in choker tightness and placement. Relevance-Class Activation Mapping (R-CAM) showed that the model focuses on key micromovements during classification. Generalizability was demonstrated by reaching 80% accuracy on new users and words with minimal fine-tuning and 90% accuracy with additional fine-tuning, showcasing the system's potential for practical applications.
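The paper's exact adaptation protocol for new users is not specified here; a common recipe consistent with "minimal fine-tuning" is to freeze the convolutional backbone and retrain only the classification head on a small calibration set. A hypothetical sketch, reusing the SilentSpeechNet class from the Methodology sketch above:

```python
import torch

model = SilentSpeechNet()                    # pretrained weights would be loaded here
for p in model.net.parameters():             # freeze the 1D-CNN feature extractor
    p.requires_grad = False

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# Stand-in calibration batch from the new user (real data would replace this).
x, y = torch.randn(8, 1, 1000), torch.randint(0, 20, (8,))
for _ in range(10):                          # a few quick adaptation steps
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```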
Discussion
This research successfully addressed the challenges of creating a practical and efficient silent speech interface. The high sensitivity of the textile strain sensor, combined with the computationally efficient 1D neural network, allows for accurate and fast speech decoding in real-world conditions. The system's robustness to noise and variations in wear demonstrates its potential for widespread use. The findings significantly advance the field of SSI technology, paving the way for seamless and natural silent communication in various settings. The model's ability to generalize to new users and words with minimal fine-tuning highlights its adaptability and potential for personalization.
Conclusion
This study presents a novel silent speech interface based on an ultrasensitive textile strain sensor and a lightweight 1D convolutional neural network. The system demonstrates high accuracy, computational efficiency, and robustness in real-world scenarios. Future work could focus on expanding the vocabulary, improving the system's performance in diverse acoustic environments, and investigating its applicability to other forms of silent communication.
Limitations
The study primarily focused on a limited English vocabulary. Although the model demonstrated good generalizability, further testing with a larger, more diverse vocabulary and user population is needed, and the system's performance under intense physical activity or in highly variable environmental conditions remains to be evaluated.