Speaking without vocal folds using a machine-learning-assisted wearable sensing-actuation system

Engineering and Technology


Z. Che, X. Wan, et al.

Discover a groundbreaking self-powered wearable system that enables speaking assistance without vocal folds, boasting an impressive 94.68% accuracy through machine learning. This innovation by Ziyuan Che, Xiao Wan, Jing Xu, Chrystal Duan, Tianqi Zheng, and Jun Chen promises to enhance lives for those with vocal fold dysfunction.

Introduction
The study addresses the need for a noninvasive, wearable solution to assist communication in individuals with voice disorders who cannot rely on vocal folds during pre- or post-treatment recovery. Voice disorders, caused by conditions such as polyps, keratosis, paralysis, nodules, and spasmodic dysphonia, as well as postoperative effects of laryngeal cancer surgeries, are prevalent and impactful: 29.9% of the general population experience a voice disorder in their lifetime, ~7% have current problems, and 7.2% of employed participants report missed work days. Recovery from therapeutic interventions can require months of reduced or absolute voice use. Existing options (electrolarynx devices, talk boxes, tracheoesophageal puncture) can be inconvenient, uncomfortable, or invasive. The research proposes a wearable, self-powered sensing–actuation platform that captures extrinsic laryngeal muscle movements and, with machine learning, converts them into speech, enabling speaking without vocal fold vibration.
Literature Review
Prior wearable throat devices have used materials such as PVDF (piezoelectric), gold nanowires, and graphene. PVDF offers precise mechano-to-voltage conversion via piezoelectricity but material choices are limited for tailoring device design, and piezoelectric actuation typically requires higher driving voltages that raise safety concerns in wearables. Resistive sensors using gold nanowires or graphene provide good conductivity and flexibility but require external power, increasing system complexity and bulk. Moreover, many such materials are intrinsically non-stretchable, limiting comfort and adhesion and causing preferential detection of vertical (rather than parallel/omnidirectional) throat deformation during phonation. Additional issues include poor perspiration resistance and temperature rise during use. These limitations motivate a stretchable, waterproof, self-powered approach capable of omnidirectional throat motion capture involving extrinsic and platysma muscles, especially relevant when vocal folds are not used.
Methodology
System design: A thin (~1.5 mm), flexible, lightweight (~7.23 g) wearable sensing–actuation device (30 mm × 30 mm; volume ~1.35 cm³) adheres to the throat. It comprises two symmetric modules: a sensing component (bottom) that converts extrinsic laryngeal muscle motion into electrical signals, and an actuation component (top) that outputs sound. Each component includes a PDMS membrane (~200 μm) and a magnetic induction (MI) layer formed by a serpentine copper coil (20 turns, ~67 μm wire diameter) for flexibility. A shared middle magnetomechanical coupling (MC) layer (PDMS + NdFeB micromagnets), ~1 mm thick, features a kirigami pattern to enhance sensitivity, stretchability, and isotropy.

Operating principle: The soft magnetoelastic MC layer converts stress from throat muscle expansion/contraction into changes in magnetic flux density through magnetic particle–particle interactions and dipole–dipole domain rotation. The MI serpentine coils transduce the flux changes into current via electromagnetic induction, enabling self-powered sensing (subsequent processing/conditioning uses external electronics). The kirigami structure accommodates 3D throat deformations (x–y expansion during muscle relaxation; z-thickening during contraction), improving omnidirectional capture.

Signal chain and electronics: Analog current from the sensing coils is amplified and low-pass filtered (Stanford SR570 preamplifier; low-noise mode; sensitivity 2 × 100 μA/V; 6 dB low-pass at 100 Hz; negative input offset 1 × 10 μA). Processed signals are digitized for ML classification.

Mechanical and acoustic characterization: Stretchability and sensitivity were assessed at strains up to 164%. Response time and SNR were measured; SPL was characterized versus distance, frequency, angle, and strain. Resonances were analyzed with focus on the first resonance point (FRP) and its right-shift with increasing strain. Durability was tested over 24,000 cycles at 5 Hz.
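The self-powered transduction described in the operating principle follows Faraday's law of induction: stress on the magnetoelastic MC layer modulates the magnetic flux through the serpentine coil, inducing a current without any external power source. A generic statement (standard physics notation, not symbols taken from the paper):

```latex
\varepsilon(t) = -N\,\frac{\mathrm{d}\Phi}{\mathrm{d}t},
\qquad
\Phi(t) = \int_{A} \mathbf{B}\bigl(\sigma(t)\bigr)\cdot \mathrm{d}\mathbf{A},
```

where $N$ is the number of coil turns ($N = 20$ here), $\mathbf{B}$ is the flux density set by the throat-muscle stress $\sigma(t)$ through magnetomechanical coupling, and $A$ is the area enclosed by the coil.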
Temperature rise and long-duration SPL were monitored during 40 min of continuous use. Waterproof performance was assessed in air vs underwater, including accelerated aging (7 days of submersion), SPL vs submersion depth/distance, and underwater frequency response.

Design parameter studies: Effects of kirigami geometry, coil turns (impact on thickness, response time, SNR, SPL), PDMS ratios (membrane softness), magnetic powder concentration in the MC layer, and layer thicknesses on sensing and actuation were systematically evaluated to set final parameters balancing performance and flexibility.

Human subject study: Eight participants (4 female, 4 male; average age 21) were recruited under UCLA IRB protocol ID 20-001882 with informed consent and $25 compensation. Tasks included producing extrinsic laryngeal movements (e.g., coughing, humming, nodding, swallowing, yawning) and voicelessly pronouncing words/sentences while standing still and during motion (walking, running, jumping). Signals were recorded with and without vocal fold vibration to evaluate content differences and robustness to body motion. Sweat resilience was tested using an artificial sweat surrogate (standardized application before and after device placement; 5 min settling; performance recording).

Machine-learning pipeline: Electrical signals of target sentences are segmented and zero-padded to the maximum-duration window (4 s, 4000 samples), then reduced via principal component analysis (PCA) for redundancy removal. Multi-class support vector classification (SVC) with a one-vs-rest scheme trains a classifier per sentence class. For demonstrations, five sentences were used: S1 "Hi Rachel, how are you doing today?", S2 "Hope your experiments are going well!", S3 "Merry Christmas!", S4 "I love you!", S5 "I don't trust you."
Each participant repeated each sentence 100 times to form training/validation sets; for the overall evaluation across 8 participants, each produced 120 repetitions per sentence (100 for training, of which 20 were used for validation; 20 held out for testing). Real-time recognition selects the corresponding pre-recorded voice signal for playback through the actuator, enabling speaking without vocal folds.

Fabrication: MC layer: NdFeB powder (D50 ~5 μm; Br 898–908 mT; BHmax 120–128 kJ/m³; Hci 700–740 kA/m) was mixed with PDMS (Sylgard 184, 15:1 base:curing agent) at 4:1 (powder:PDMS) by weight, cast in a 30 × 30 × 1 mm PLA mold, cured at 70 °C for >4 h, and magnetized at 45° (IM-10-30, 350 V impulse). The kirigami pattern was laser-cut (ULTRA R5000) using iterative passes to fully penetrate the membrane. Coils: serpentine geometry wound with 67 μm copper wire (spacing 22.3 ± 2.14 μm); 20 turns; thickness ~147.3 μm. Sensing/actuation membranes: PDMS (10:1) was scraped onto glass and the coils placed before curing; after curing at 70 °C for >4 h, the membranes were released and bonded to the MC layer edges with PDMS, then oven-cured for 4 h.

Sweat simulation: Artificial Sweat BZ320 (0.5 ml) was applied to cleaned skin, the device affixed, an additional 0.5 ml sprayed onto the device, and performance recorded after 5 min of settling.
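The classification stage described above (fixed-window zero-padding, PCA for redundancy removal, one-vs-rest multi-class SVC) can be sketched as follows. This is a minimal illustration using synthetic stand-in signals, since the study's recordings are not available; the window length, class count, and repetition counts mirror the paper, but all function names and data are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

WINDOW = 4000               # fixed-length window (4 s of samples)
N_CLASSES, N_REPS = 5, 120  # 5 sentences, 120 repetitions each, as in the study

def pad_to_window(signal, n=WINDOW):
    """Zero-pad (or truncate) a 1-D recording to a fixed-length window."""
    out = np.zeros(n)
    m = min(len(signal), n)
    out[:m] = signal[:m]
    return out

# Synthetic stand-in recordings: each sentence class gets a distinct mean offset
# and a variable duration, so the zero-padding step actually does something.
rng = np.random.default_rng(0)
X = np.stack([pad_to_window(rng.normal(loc=c, scale=1.0,
                                       size=rng.integers(3000, WINDOW + 1)))
              for c in range(N_CLASSES) for _ in range(N_REPS)])
y = np.repeat(np.arange(N_CLASSES), N_REPS)

# Hold out 20 repetitions per class for testing, as in the study's split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=20 * N_CLASSES, stratify=y, random_state=0)

# PCA removes redundancy; one-vs-rest SVC trains one classifier per sentence.
clf = make_pipeline(PCA(n_components=20), OneVsRestClassifier(SVC(kernel="rbf")))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

In deployment, the predicted class index would select the matching pre-recorded voice signal for playback through the actuation layer; that lookup step is omitted here.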
Key Findings
- Self-powered, soft magnetoelastic wearable enables speech without vocal fold vibration by sensing extrinsic laryngeal muscle movements and actuating corresponding voice output via ML-assisted selection.
- Device specs: lightweight (~7.2 g); compact (30 mm × 30 mm × ~1.5 mm); maximum stretchability 164%; intrinsically waterproof; driving voltage as low as 1.95 V for actuation.
- Modulus: reported skin-like modulus of 7.83 × 10⁵ Pa. Separate stress–strain tests (Fig. S3) show Young's modulus reduced from 2.59 × 10⁷ Pa to 7.83 × 10⁶ Pa by the kirigami structure; another statement places the stretchable structure's modulus on the order of 100 kPa.
- Sensing performance: response time ~40 ms; SNR ~17.5; high-fidelity capture of distinct throat movements (coughing, humming, nodding, swallowing, yawning); robust to whole-body motion (standing, walking, running, jumping).
- Muscle-only vs normal speech: voiceless signals showed slightly lower peak amplitudes and reduced high-frequency components relative to normal phonation, yet retained distinct syllabic patterns suitable for classification.
- ML accuracy: overall prediction accuracy of 94.68% across 8 participants on a five-sentence set; individual test accuracies >93%; example confusion matrices show 98% validation and 96.5% testing accuracy for one participant.
- Acoustic output: SPL >40 dB at 1 m (typical conversation distance) and above normal speaking thresholds across the full human hearing range; waveform fidelity comparable to a commercial speaker with minor strain-induced spectral artifacts; the FRP shifts to higher frequency with increased strain, enabling user-adaptive tuning.
- Durability and safety: stable operation over 24,000 cycles at 5 Hz; negligible temperature rise and no SPL degradation over 40 min of continuous use.
- Waterproof/sweat resilience: similar output in air vs underwater with slight high-frequency loss; SPL >60 dB at 2 cm depth measured 20 cm away; performance unaffected by perspiration (p = 0.818).
Discussion
The work demonstrates that extrinsic laryngeal muscle movements can be captured by a soft magnetoelastic kirigami-based sensor and reliably mapped to intended sentences using PCA-SVM classification, thereby enabling voice output without engaging the vocal folds. This directly addresses the need for a noninvasive communication aid during periods of dysphonia or post-surgical recovery. The system overcomes limitations of prior flexible throat devices by providing stretchability and omnidirectional deformation capture, self-powered sensing, water/sweat resistance, and low driving voltages for safe actuation. High classification accuracy across multiple users, robust performance during body motion, sustained SPL without heating, and underwater/sweaty conditions suggest practical usability. The strain-tunable acoustic resonance further allows user-specific optimization. Collectively, the findings support the system’s potential to facilitate communication and improve quality of life for individuals with dysfunctional vocal folds.
Conclusion
A soft, self-powered, magnetoelastic wearable sensing–actuation system with a kirigami MC layer and serpentine induction coils was developed to enable speaking without vocal fold vibration. The device captures extrinsic laryngeal muscle movements with high fidelity and, using a PCA + multi-class SVM pipeline, selects corresponding pre-recorded voice signals for playback. It achieves high accuracy (94.68%), strong acoustic output across the human hearing range, stretchability (164%), and robust performance under motion, sweat, and underwater conditions, with minimal temperature rise and solid durability. The approach offers a feasible, noninvasive path to assist communication during recovery from voice disorders, with the potential to facilitate restoration of voice use and enhance patient quality of life.
Limitations
- Small sample size (n = 8 participants; average age 21; student cohort) limits generalizability.
- Limited vocabulary: demonstrations focused on a predefined five-sentence set; classification was trained per participant.
- Methodological constraints noted by the authors: no statistical method to predetermine sample size; no data excluded; experiments not randomized; investigators not blinded during allocation or outcome assessment.
- The approach plays back selected pre-recorded voice signals rather than performing open-vocabulary, speaker-independent continuous speech synthesis; broader linguistic generalization was not evaluated.