All-weather, natural silent speech recognition via machine-learning-assisted tattoo-like electronics

Y. Wang, T. Tang, et al.

This study by Youhua Wang, Tianyi Tang, Yin Xu, Yunzhao Bai, Lang Yin, Guang Li, Hongmiao Zhang, Huicong Liu, and YongAn Huang presents a silent speech recognition system that pairs tattoo-like epidermal electrodes for signal capture with machine-learning-based classification, achieving 92.64% average accuracy in real-world settings, including noisy and dark environments.
Introduction

The study addresses the need for reliable, natural silent speech interfaces for people with aphasia and for robust human–machine interactions in all-weather conditions (obscured, dynamic, quiet, dark, and noisy). Silent speech using facial surface electromyography (sEMG) promises high information throughput with minimal specialized training compared to alternatives such as typing or sign language. However, challenges include the spatiotemporal variability of sEMG signals, the need for high-fidelity acquisition, accurate classification, and comfortable, imperceptible wearables. Existing rigid or gel electrodes often fail to conform to facial skin, degrading signal quality and comfort, particularly under large facial deformations (~45%). This work proposes a strategy integrating ultrathin, tattoo-like epidermal electrodes, a lightweight wireless data acquisition module, and a cloud-deployed machine-learning algorithm to achieve accurate, natural silent speech recognition in real-world scenarios.

Literature Review

Silent speech recognition using EMG dates back to the mid-1980s, with early demonstrations classifying a handful of vowels or words. Over the past two decades, vocabularies and accuracies have steadily improved: six words at ~92% (2003), 60 words at 87.07% with HMMs (2008), >1200 phrases from a 2200-word vocabulary at 91.1% (2018), and recent deep-learning approaches achieving ~90% for 10 words (2020). Prior systems typically relied on non-flexible electrodes and high-rate, high-precision laboratory equipment, with limited evaluation of long-term performance and real-world applicability. Deep learning requires large labeled datasets, which is burdensome for users; classical machine learning often performs better with small samples and offers faster processing suitable for real-time use. Flexible, skin-like tattoo electronics offer improved conformability and comfort, potentially enhancing sEMG acquisition quality. Few studies have applied tattoo-like electrodes to silent speech; previous attempts recorded only several words, limiting practical deployment. Cloud-based algorithms can reduce complexity on the wearable side, an important trend for practical systems.

Methodology

System design: The SSRS integrates four-channel tattoo-like epidermal electrodes, an ear-mounted wireless data acquisition (DAQ) module, a cloud-based machine-learning algorithm, and a mobile terminal for display and audio feedback.

Electrodes: Four bipolar channels target muscles with strong activity during speech: levator anguli oris (LAO), depressor anguli oris (DAO), buccinator (BUC), and the anterior belly of the digastric (ABD). Each channel has a working and a reference electrode; the posterior mastoid serves as the reference site. Electrodes are 1.2 μm thick, integrated into a 3M Tegaderm patch (~47 μm thick, modulus ~7 kPa), and patterned as filamentary serpentines (width 500 μm, width-to-arc-radius ratio 0.32, arc angle 20°) for stretchability; overall electrode size is ~18 mm × 32 mm. Fabrication uses a cut-and-paste method: deposit 10 nm Cr/100 nm Au on 1.1-μm-thick PET, pattern with a programmable cutter, and transfer via thermally released tape onto Tegaderm. Lead wires are attached by low-temperature alloy welding and reinforced with Tegaderm.

Wireless DAQ: Each of the four channels is amplified 1000× (AD8220 instrumentation amplifier plus OPA171), band-pass filtered at 10–500 Hz, digitized by a 10-bit ADC (ATmega328P) at 500 Hz, and transmitted over Bluetooth 5.0 (CC2540F256) at up to 256 kb/s. The mobile terminal receives and displays recognition results via Bluetooth.

Signal acquisition protocol: After the skin is cleaned, two electrodes per muscle are placed 2 cm apart over LAO, DAO, BUC, and ABD, with the reference at the posterior mastoid. Subjects silently read each of 110 words ten times while avoiding non-speech facial movements.

Preprocessing and feature extraction: Active-segment interception detects silent-speech-related sEMG using two criteria: absolute amplitude ≥50 μV with at least two muscles activated. Each segment spans 800 ms before to 1200 ms after the threshold crossing.
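A minimal sketch of this active-segment interception, assuming the stated parameters (500 Hz sampling, 50 μV amplitude threshold, at least two active channels, and an 800 ms/1200 ms window around the crossing); the function name and structure are illustrative, not the authors' code:

```python
import numpy as np

FS = 500           # sampling rate (Hz), per the paper
THRESH_UV = 50     # activation threshold (μV)
MIN_CHANNELS = 2   # muscles that must be simultaneously active
PRE_S, POST_S = 0.8, 1.2  # window before/after the threshold crossing (s)

def intercept_active_segment(emg_uv):
    """emg_uv: (n_channels, n_samples) sEMG in μV.
    Returns the active segment, or None if no silent-speech activity is found."""
    active = np.abs(emg_uv) >= THRESH_UV       # per-sample, per-channel activation
    n_active = active.sum(axis=0)              # how many channels are active at each sample
    idx = np.flatnonzero(n_active >= MIN_CHANNELS)
    if idx.size == 0:
        return None
    onset = idx[0]                             # first sample with >=2 active muscles
    start = max(0, onset - int(PRE_S * FS))    # 800 ms of context before onset
    stop = min(emg_uv.shape[1], onset + int(POST_S * FS))  # 1200 ms after
    return emg_uv[:, start:stop]
```

Keeping context before onset preserves the muscle pre-activation that precedes audible (or here, silent) articulation, which is part of what makes the segments discriminative.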
Baseline wander and artifacts are reduced with a 4-level wavelet packet transform and soft thresholding; reconstruction uses nodes 2–16 of level 4, which acts as a ≥15 Hz high-pass that suppresses EOG. From the denoised signals, 15 relative wavelet-packet-energy features (frequency domain) and 10 time-domain features are extracted per channel after full-wave rectification (feature definitions per Supplementary Table 3); with four channels, each word yields a 100-dimensional feature vector.

Model training and inference: Users provide training data for 110 words × 10 repetitions (1100 samples). The resulting 1100 × 101 feature matrix (100 features + 1 label) trains a linear discriminant analysis (LDA) classifier with a one-vs-rest strategy and tenfold cross-validation. Online prediction computes the 100-D feature vector for each detected active segment and infers the word with the trained LDA model. The system runs on Windows 10 in MATLAB 2019b, with a 2000 ms real-time window, a 200 ms sliding step, and 500 Hz sampling.

Mechanical/electrical characterization: Conformability and stretchability were assessed on skin and on silicone (E ≈ 0.35 MPa) substrates; FEA strain maps under 45% applied strain show maximum principal strains of 4.1% (horizontal) and 1.8% (vertical) in the serpentines. The electrical resistivity of the Au/Cr/PET composite increases ~5% at 2% strain and ~30% at 4% strain; the design prioritizes electrical integrity under facial deformations (an "electricity-preferred" method). Long-term wear tests (10 h) compare noise (SD) and impedance against commercial 3M gel electrodes, including the effects of running, dining, and room-temperature variation.
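The feature-extraction and few-shot training pipeline can be sketched as follows, under several labeled assumptions: PyWavelets' `WaveletPacket` (with an assumed `db4` mother wavelet) stands in for the paper's wavelet packet, the 10 time-domain features are common sEMG stand-ins (the paper's exact set is in its Supplementary Table 3), scikit-learn's multiclass LDA stands in for the one-vs-rest LDA, and the demo uses 3 synthetic "words" rather than 110:

```python
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def relative_wp_energies(x, wavelet="db4"):
    """15 relative wavelet-packet energies from level-4 nodes 2-16.
    Node 1 (the lowest ~15.6 Hz band at fs = 500 Hz) is dropped,
    suppressing EOG and baseline drift."""
    wp = pywt.WaveletPacket(x, wavelet=wavelet, maxlevel=4)
    nodes = wp.get_level(4, order="freq")               # 16 frequency-ordered bands
    energies = np.array([np.sum(n.data ** 2) for n in nodes[1:]])  # nodes 2-16
    return energies / (energies.sum() + 1e-12)          # relative energies, sum ~1

def word_features(emg):
    """emg: (4, n_samples) active segment -> 100-D vector
    (15 frequency + 10 time features per channel)."""
    feats = []
    for ch in emg:
        feats.extend(relative_wp_energies(ch))
        r = np.abs(ch)                                  # full-wave rectification
        feats.extend([                                  # illustrative time-domain set
            r.mean(), r.std(), r.max(), np.median(r),
            np.sqrt(np.mean(ch ** 2)),                  # RMS
            np.sum(np.abs(np.diff(ch))),                # waveform length
            np.mean(np.diff(ch) ** 2),
            np.percentile(r, 75), np.percentile(r, 25),
            float((np.diff(np.signbit(ch).astype(int)) != 0).sum()),  # zero crossings
        ])
    return np.asarray(feats)

# Few-shot training mirrors the paper's protocol (10 repetitions per word),
# here with 3 synthetic "words" instead of 110:
rng = np.random.default_rng(0)
X = np.stack([word_features(rng.normal(scale=s, size=(4, 1000)))
              for s in (1.0, 2.0, 3.0) for _ in range(10)])
y = np.repeat([0, 1, 2], 10)
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=10).mean()  # tenfold CV
```

LDA's low training and inference cost is what makes the 10-samples-per-word regime viable, in line with the paper's observation that classical machine learning beats deep learning for small samples.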

Key Findings
  • Vocabulary and accuracy: The SSRS recognizes 110 common daily words (13 categories) with an average offline LDA accuracy of 92.64% (confusion matrix shown). LDA outperforms SVM and Naive Bayes in accuracy, F1-score, training speed, and prediction speed for few-shot (10 samples/word) classification.
  • Channel selection and robustness: Optimal four-channel set LAO, DAO, BUC, ABD achieves ~92.1% mean accuracy in channel scaling tests. With channel loss, accuracy remains >85% with 3 channels, >70% with 2 channels, and 42.27% with 1 channel (vs 0.91% random chance).
  • Mechanical/electrical performance: Tattoo-like electrodes conform to facial skin under large deformations (opening mouth, inflating cheeks, lateral twitches) and maintain intact interfaces under 30% stretch on skin-mimic silicone. FEA under 45% applied strain yields max principal strains 4.1% (horizontal) and 1.8% (vertical). Electrical resistivity change in Au/Cr/PET composite is ~5% at 2% strain and ~30% at 4% strain; overall electrode resistance minimally impacted due to serpentine parallel path design.
  • Long-term wear: Over 10 h, gel electrodes maintain low impedance and noise thanks to their chloride ions, while the tattoo electrodes' impedance and noise actually decrease over time as sweat accumulates under the Tegaderm. After running, the tattoo electrodes show lower noise than the gel electrodes. Stability of the LOG feature indicates immunity to room-temperature swings (20 → 30.1 → 19.8 °C), running, and dining; repeated recognition of the phrase "Hello, nice to meet you" stays near 90% at different times of day, averaging ~95%.
  • Real-world scenarios:
      • Greeting: an 8-word set is recognized at 95% accuracy; real-time multi-channel sEMG patterns are distinctive.
      • Exercise: for five location-related words, recognition rates are ≥96% at rest (0 m/s), walking (1 m/s), and jogging (3 m/s), and 86% while running (5 m/s), averaging 96% across states; background noise remains low.
      • Repast (mouth deformation/fatigue): after 0–200 repeated mouth movements, with four deformation types per repetition, five food-related words maintain ≥96% recognition, 98% overall across repeat counts.
      • Noisy environment: for four color words at 80 dB ambient noise, ASR accuracy drops to ~20% while the SSRS remains at 100%; sEMG signals are unaffected by acoustic noise.
      • Darkness: the SSRS reliably recognizes emotion-related words (“Happy,” “Sad,” “Sorry,” “Angry,” “Love”) as illumination decreases, whereas vision-based sign-language (ASL) recognition fails in darkness.
  • Artifact handling: EOG contamination (0–12 Hz, max amplitude ~30 μV) is mitigated by ≥15 Hz high-pass effect of wavelet reconstruction; post-processing reduces EOG to <12 μV, below activation threshold (50 μV), yielding negligible impact.
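The ≥15 Hz high-pass effect cited here is simple band arithmetic: at a 500 Hz sampling rate, a 4-level wavelet packet splits the 0–250 Hz Nyquist range into $2^4 = 16$ equal sub-bands, so discarding the lowest node (keeping nodes 2–16) removes

```latex
\Delta f = \frac{f_s/2}{2^{4}} = \frac{250~\mathrm{Hz}}{16} \approx 15.6~\mathrm{Hz}
```

of low-frequency content, which covers the 0–12 Hz EOG band and explains why residual EOG falls below the 50 μV activation threshold.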
Discussion

The integration of ultrathin, conformal tattoo-like electrodes with a lightweight wireless DAQ and cloud-based LDA enables accurate, natural, and robust silent speech recognition suitable for daily-life use. High-fidelity sEMG acquisition from four targeted facial/neck muscles provides sufficient discriminative information to classify a 110-word vocabulary with high accuracy using small-sample machine learning. The system’s mechanical compliance and stable skin–electrode interface support long-term, comfortable wear and reduce motion artifacts, addressing a key barrier to real-world deployment. Demonstrations across dynamic movement, significant mouth deformations, noisy industrial settings, and darkness show that the SSRS maintains high recognition performance where voice- or vision-based interfaces fail, directly addressing the goal of all-weather human–machine and human–human communication. Robustness to channel loss and low computational footprint further enhance practical reliability and portability. Collectively, the findings indicate strong potential for assistive communication (e.g., aphasia), quiet environments, and resilient human–machine interfaces in diverse conditions.

Conclusion

This work presents an all-weather silent speech recognition system combining ultrathin tattoo-like electrodes, a portable wireless DAQ module, and a cloud-deployed LDA classifier. The system conforms to facial skin under large deformations, acquires high-quality sEMG, and achieves 92.64% average accuracy over 110 daily words with few-shot training. Extensive demonstrations validate reliable operation during exercise, eating (mouth deformations), high noise, and darkness, with long-term wear improving electrode–skin electrical characteristics. The approach reduces user training burden compared to sign language and remains effective where audio/vision interfaces are impaired, supporting applications in assistive communication and robust human–machine interaction.

Limitations

The paper does not explicitly enumerate limitations, but several are apparent. The demonstrated vocabulary is limited to 110 words/phrases, and training requires user-specific data collection (10 repetitions per word). Most experiments and scenario demonstrations involve a single subject, so cross-subject generalization is not reported. While robustness to channel loss is evaluated, broader multi-user and large-scale studies are not presented.
