Real-Time Prediction for Athletes' Psychological States Using BERT-XGBoost: Enhancing Human-Computer Interaction

Computer Science

C. Duan, Z. Shu, et al.

Combining BERT's contextual understanding with XGBoost classification, this hybrid model detects athletes' emotions, anxiety, and stress from structured and unstructured data and delivers adaptive, real-time feedback to boost performance and mental health. Research conducted by Chenming Duan, Zhitao Shu, Jingsi Zhang, and Feng Xue.

Introduction

The study addresses how to accurately understand and predict athletes’ psychological states (e.g., emotions, anxiety, stress) that critically influence performance and team dynamics. Traditional assessments (surveys, interviews) often miss nuanced, real-time variations in mental states. Advances in machine learning, NLP, and HCI enable richer analysis from self-reports, observational text, and physiological/contextual cues. The purpose is to develop a hybrid BERT-XGBoost model integrated with HCI mechanisms to detect psychological patterns and deliver adaptive, real-time feedback and interventions. This is important for real-time decision-making, personalized mental health support, and performance optimization for athletes, coaches, and sports psychologists.

Literature Review

Recent work establishes BERT as a strong foundation for sentiment and aspect-based analysis. Studies show BERT’s superiority in capturing contextual semantics and fine-grained relations, with domain post-training further improving performance (Hoang et al.; Xu et al.; Li et al.; Singh et al.). Hybrid approaches pair BERT-derived embeddings with ensemble methods such as XGBoost to improve robustness and accuracy, particularly on noisy, large-scale social media data (Samih et al.; Hama Aziz & Dimililer). Extensions to multimodal sentiment analysis combine textual and non-textual features with deep networks enhanced by XGBoost (Chandrasekaran et al.). Reviews position BERT as a dominant, bidirectionally contextual model in sentiment analysis (Alaparthi & Mishra). Collectively, the literature supports integrating BERT’s representation learning with XGBoost’s efficient classification to address noise, scalability, and nonlinearity.

Methodology

Data: Self-reports, observational records, and performance logs from athletes during training and competitions. Texts include narrative descriptions labeled by sports psychologists with emotion tags (e.g., Anxiety, Stress, Burnout, Focus). Descriptive statistics quantified linguistic features: statement length, word count, average word length, and vocabulary size. A correlation analysis showed strong positive relationships among text length, word count, and vocabulary size.

Model: A BERT-XGBoost hybrid for text classification. Preprocessing includes text standardization, tokenization, and special token insertion. BERT (12-layer Transformer) generates 768-dimensional contextual embeddings that capture word order, syntax, and semantic context. These embeddings feed into an XGBoost classifier that models nonlinear relationships in high-dimensional space.

Training configuration: 500 decision trees, learning rate 0.05, and early stopping with 10 rounds to mitigate overfitting and ensure stable convergence.

Evaluation: Precision, recall, and F1-score per class, plus overall accuracy; confusion matrix analysis to identify common misclassifications.

System/interaction design: The approach is intended for real-time monitoring and adaptive feedback within an HCI framework, enabling personalized interventions based on detected psychological states.
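
The model and training configuration described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not name the BERT checkpoint or the pooling strategy, so `bert-base-uncased` (a 12-layer model with 768-dimensional hidden states) and [CLS]-token pooling are assumptions, as are the helper names `embed_texts` and `train_classifier`, the batch size, and the maximum sequence length. Only the three XGBoost settings come from the text. Passing `early_stopping_rounds` to the `XGBClassifier` constructor assumes xgboost 1.6 or later.

```python
"""Sketch of a BERT -> XGBoost text classifier (assumed details are marked)."""

# Hyperparameters stated in the text: 500 trees, learning rate 0.05,
# early stopping after 10 rounds without improvement on validation data.
XGB_PARAMS = {
    "n_estimators": 500,
    "learning_rate": 0.05,
    "early_stopping_rounds": 10,
}

# ASSUMPTION: the checkpoint is not named in the text; "bert-base-uncased"
# is a 12-layer model with 768-dimensional hidden states.
BERT_CHECKPOINT = "bert-base-uncased"


def embed_texts(texts, batch_size=16, max_length=128):
    """Return one 768-dimensional embedding per text in the list `texts`."""
    # Heavy dependencies are imported here so the module loads without them.
    import numpy as np
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BERT_CHECKPOINT)
    model = AutoModel.from_pretrained(BERT_CHECKPOINT)
    model.eval()

    chunks = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            # The tokenizer inserts the special [CLS] and [SEP] tokens.
            batch = tokenizer(
                texts[start:start + batch_size],
                padding=True,
                truncation=True,
                max_length=max_length,
                return_tensors="pt",
            )
            hidden = model(**batch).last_hidden_state  # (batch, tokens, 768)
            # ASSUMPTION: pool by taking the [CLS] token at position 0.
            chunks.append(hidden[:, 0, :].numpy())
    return np.vstack(chunks)


def train_classifier(X_train, y_train, X_val, y_val):
    """Fit XGBoost on embeddings, stopping early on the validation set."""
    from xgboost import XGBClassifier

    clf = XGBClassifier(**XGB_PARAMS)
    # Early stopping needs an evaluation set to monitor.
    clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    return clf
```

With class labels encoded as integers starting at 0, a call might look like `clf = train_classifier(embed_texts(train_texts), y_train, embed_texts(val_texts), y_val)`, where `train_texts` and `val_texts` are hypothetical lists of statements.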

Key Findings
  • Descriptive distribution of mental health categories: Normal 31.0%; Depression symptoms 29.2%; Suicidal 20.2%; Anxiety 7.3%; Bipolar disorder 5.3%; Stress 4.9%; Personality disorder 2.0%. Nearly 70% exhibit mental health issues; depression and suicidal tendencies account for about half of severe cases.
  • Correlation analysis: Text length, word count, and vocabulary size show significant positive correlations; average word length has low correlation with other features.
  • Model performance (n=8,596): Overall accuracy 0.94; macro avg precision/recall/F1 = 0.94; weighted avg precision/recall/F1 = 0.94. Per-class metrics (precision/recall/F1/support):
      Class 0: 0.95/0.92/0.93/1260
      Class 1: 0.88/0.82/0.85/1220
      Class 2: 0.88/0.88/0.88/1187
      Class 3: 0.97/0.99/0.98/1252
      Class 4: 0.97/1.00/0.98/1215
      Class 5: 0.96/0.99/0.98/1210
      Class 6: 1.00/1.00/1.00/1252
  • Confusion matrix analysis: Frequent confusions between categories 1 and 2; category 6 achieved perfect classification.
  • The hybrid model effectively captures deep semantics (BERT) and classifies efficiently and robustly (XGBoost), supporting accurate identification of psychological states and trends relevant to athletic performance.
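
As a consistency check on the figures above, the macro and weighted averages can be recomputed from the per-class values. The sketch below uses only the reported numbers; the function names are illustrative. Because the inputs are already rounded to two decimals, the recomputed averages are approximate.

```python
# Per-class (precision, recall, F1, support) as reported in the findings.
REPORTED = {
    0: (0.95, 0.92, 0.93, 1260),
    1: (0.88, 0.82, 0.85, 1220),
    2: (0.88, 0.88, 0.88, 1187),
    3: (0.97, 0.99, 0.98, 1252),
    4: (0.97, 1.00, 0.98, 1215),
    5: (0.96, 0.99, 0.98, 1210),
    6: (1.00, 1.00, 1.00, 1252),
}


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(rows, column):
    """Unweighted mean of one metric column across classes."""
    return sum(row[column] for row in rows) / len(rows)


def weighted_average(rows, column):
    """Mean of one metric column, weighted by class support (column 3)."""
    total = sum(row[3] for row in rows)
    return sum(row[column] * row[3] for row in rows) / total


rows = list(REPORTED.values())
total_support = sum(row[3] for row in rows)
macro_f1 = macro_average(rows, 2)
weighted_f1 = weighted_average(rows, 2)

print(total_support, round(macro_f1, 2), round(weighted_f1, 2))
# prints: 8596 0.94 0.94
```

The supports sum to the stated n=8,596, and both averages round to the reported 0.94. The two averages agree closely here because the class supports are nearly balanced.
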
Discussion

The findings demonstrate that the BERT-XGBoost hybrid accurately predicts athletes’ psychological states from textual inputs, addressing the need for nuanced and real-time assessment beyond traditional surveys. High overall F1 and near-perfect performance in multiple classes show strong generalization to complex, high-dimensional semantic features. Misclassifications between closely related categories (e.g., categories 1 and 2) indicate overlapping linguistic signals, suggesting targeted feature refinement or additional context is needed. The descriptive statistics and correlations inform how text complexity relates to emotional expression, aiding feature interpretation. In an HCI context, reliable, real-time classification enables adaptive feedback that can foster athlete engagement, trust, and timely interventions, translating model outputs into practical mental health and performance support.
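
The descriptive statistics and correlations mentioned above could be computed along the following lines. This is a sketch under assumptions: the summary does not say how the features were defined, so whitespace tokenization, character-based statement length, and per-statement vocabulary size (unique lowercased words) are choices made here for illustration, and the sample statements are invented rather than taken from the dataset.

```python
import math


def text_features(statement):
    """Compute simple linguistic features for one statement."""
    # ASSUMPTION: whitespace tokenization on lowercased text.
    words = statement.lower().split()
    return {
        "length": len(statement),  # characters, including spaces
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "vocab_size": len(set(words)),  # unique words in this statement
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical statements, for illustration only.
samples = [
    "I feel calm",
    "Training went well today",
    "I could not sleep before the race and my legs felt heavy",
    "I keep worrying that I will let the team down again and again",
]

features = [text_features(s) for s in samples]
r_length_words = pearson(
    [f["length"] for f in features],
    [f["word_count"] for f in features],
)
print(round(r_length_words, 2))
```

On these toy statements, length and word count are almost perfectly correlated, which matches the direction of the relationship reported in the findings; the actual coefficients for the study's data are not given in this summary.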

Conclusion

A hybrid BERT-XGBoost model can effectively predict athletes’ psychological states and reveal their influence on performance. By uniting BERT’s contextual embeddings with XGBoost’s efficient classification, the system achieves 94% accuracy and robust F1 scores across classes, handling both structured and unstructured data. The work underscores the importance of early detection of anxiety and stress and supports deploying real-time monitoring and adaptive feedback loops in practice. Future research should enhance differentiation among closely related states (e.g., anxiety vs. stress), incorporate additional behavioral, physiological, and contextual signals (e.g., HRV, facial expressions), and integrate multimodal sources (text, video, voice). Expanding datasets across sports, competition levels, and cultures will improve generalizability and accessibility, fostering a human-computer partnership that empowers athletes to manage mental health and optimize performance.

Limitations
  • Difficulty distinguishing closely related psychological states (e.g., anxiety vs. stress), leading to misclassifications between certain categories.
  • Current approach relies primarily on textual inputs; lack of integrated multimodal/physiological features may limit precision for ambiguous cases.
  • Generalizability may be constrained by dataset scope; broader coverage across sports, levels, and cultural contexts is needed.