A deep learning framework for gender sensitive speech emotion recognition based on MFCC feature selection and SHAP analysis

Computer Science

Q. Hu, Y. Peng, et al.

A new deep-learning approach substantially improves speech emotion recognition, boosting accuracy by up to 15% over prior methods and enabling real-time analysis for applications such as live TV audience monitoring. The research, by Qingqing Hu, Yiran Peng, and Zhong Zheng, showcases CNN- and LSTM-driven models that decode emotions such as happiness, sadness, anger, fear, surprise, and neutrality.

Abstract
Speech is one of the most efficient methods of communication among humans, inspiring advancements in machine speech processing under Natural Language Processing (NLP). This field aims to enable computers to analyze, comprehend, and generate human language naturally. Speech processing, as a subset of artificial intelligence, is rapidly expanding due to its applications in emotion recognition, human-computer interaction, and sentiment analysis. This study introduces a novel algorithm for emotion recognition from speech using deep learning techniques. The proposed model achieves up to a 15% improvement compared to state-of-the-art deep learning methods in speech emotion recognition. It employs advanced supervised learning algorithms and deep neural network architectures, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. These models are trained on labeled datasets to accurately classify emotions such as happiness, sadness, anger, fear, surprise, and neutrality. The research highlights the system’s real-time application potential, such as analyzing audience emotional responses during live television broadcasts. By leveraging advancements in deep learning, the model achieves high accuracy in understanding and predicting emotional states, offering valuable insights into user behavior. This approach contributes to diverse domains, including media analysis, customer feedback systems, and human-machine interaction, showcasing the transformative potential of combining speech processing with neural networks.
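The abstract describes a pipeline in which labeled speech is classified into six emotions by a CNN combined with an LSTM-based recurrent network over MFCC features. The paper's exact architecture is not given here, so the following is only a minimal sketch, assuming PyTorch: a 1D convolution over the MFCC time axis feeds an LSTM whose final hidden state is mapped to the six emotion classes. The layer sizes, the 40-coefficient MFCC input, and the class names are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

# Hypothetical label set matching the six emotions named in the abstract.
EMOTIONS = ["happiness", "sadness", "anger", "fear", "surprise", "neutrality"]

class CnnLstmSER(nn.Module):
    """Sketch of a CNN + LSTM speech-emotion classifier over MFCC frames."""

    def __init__(self, n_mfcc=40, hidden=64, n_classes=len(EMOTIONS)):
        super().__init__()
        # CNN front-end: convolve over time, treating MFCC coefficients as channels.
        self.conv = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),  # halve the time resolution
        )
        # LSTM aggregates the convolved frame sequence into one utterance state.
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, n_mfcc, time)
        h = self.conv(x)             # (batch, 64, time // 2)
        h = h.transpose(1, 2)        # (batch, time // 2, 64) for batch_first LSTM
        _, (h_n, _) = self.lstm(h)   # final hidden state: (1, batch, hidden)
        return self.fc(h_n[-1])      # emotion logits: (batch, n_classes)

model = CnnLstmSER()
mfcc = torch.randn(8, 40, 100)       # 8 utterances, 40 MFCCs, 100 frames each
logits = model(mfcc)
print(logits.shape)                  # torch.Size([8, 6])
```

In practice the MFCC tensor would come from a feature extractor such as `torchaudio.transforms.MFCC` or `librosa.feature.mfcc`, and the logits would be trained with cross-entropy against the labeled emotion classes.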
Publisher
Scientific Reports
Published On
Aug 05, 2025
Authors
Qingqing Hu, Yiran Peng, Zhong Zheng
Tags
speech emotion recognition
deep learning
CNN
RNN
LSTM
real-time emotion analysis
human–computer interaction