Medicine and Health
Identifying mental health status using deep neural network trained by visual metrics
S. B. Shafiei, Z. Lone, et al.
This study by Somayeh B. Shafiei, Zaeem Lone, Ahmed S. Elsayed, Ahmed A. Hussein, and Khurshid A. Guru presents an objective method for mental health evaluation: a CNN-LSTM model trained on visual-metrics time-series data. With multi-class classification accuracies of roughly 94–95%, the work opens the door to at-home mental health monitoring applications.
Introduction
Cancer significantly affects patients' quality of life. Psychological evaluation and support are key to alleviating emotional distress, enhancing coping, and improving prognosis. Disease extent and treatment-related physical impairment correlate with mood-disorder severity in lung cancer, and intrusive thoughts, especially among breast cancer survivors, relate to psychological distress. Health metrics are associated with behavioral and psychological changes, and reducing psychological distress is crucial for better health outcomes. Among patients with pancreatic cancer, 71% had symptoms of depression and 48% had anxiety-related disorders. Suicidal ideation is strongly associated with depression in advanced cancer, and the incidence of suicide among patients with cancer is approximately double that of the general population.
Despite agreement on the importance of psychological assessment and intervention for malignancy, there is a lack of an objective method for evaluating mental health in this cohort. Current evaluations rely on self-reported, subjective questionnaires that may be time-consuming and complex. Objective methods proposed in prior work often require feature engineering and do not incorporate clinically approved assessment methods.
Eye movements, governed by ocular motor systems linked to the central nervous system, reflect brain activity. Disorders affecting the cerebral cortex, brainstem, or cerebellum can disturb ocular motor function. Prior studies have shown associations between ocular motor function and cognitive and mental disorders (e.g., Alzheimer’s, Parkinson’s, Huntington’s, Wilson’s, autism, antisocial personality disorder, PTSD). Inhibitory saccades are impaired in Alzheimer’s disease, attributed to frontal and prefrontal neurodegeneration. Emotional responses after unpleasant stimuli have been indexed by eye-blink startle and left-sided frontal EEG activation. Visual metrics have also been used to detect physiological and psychological changes such as mental fatigue, cognition and cognitive development, workload, stress, threat-related emotional expressions in infancy, shared attention for virtual agents, and emotional arousal and autonomic activation.
Deep learning enables automatic feature learning via hierarchical layers and excels at complex pattern recognition, with successes in strategic games, speech recognition, medical imaging, and health informatics. However, applications to mental health evaluation are scarce. Combining CNN and LSTM methods can leverage CNN feature extraction and LSTM sequence modeling for time-series physiological data. In this study, the authors investigate the feasibility of deep learning using visual metrics to objectively evaluate mental health metrics (hope, anxiety, well-being).
Literature Review
The paper summarizes prior objective approaches to mental health evaluation, noting that most require hand-crafted feature engineering and lack use of clinically approved assessment methods. A sample of studies (Table 1) includes: support vector machines classifying mental health status while watching videos (91% accuracy); Naïve Bayes and decision trees using computer game data (85% accuracy); neural networks on robust self-stimulation tasks (88% accuracy); and SVMs on behavioral and non-verbal cues (no accuracy reported in the excerpt). The authors also review extensive evidence linking ocular motor function and visual metrics to cognitive and psychiatric conditions (e.g., Alzheimer’s, Parkinson’s, Huntington’s, autism, antisocial personality disorder, PTSD), as well as studies tying visual metrics to emotional arousal, mental fatigue, mental workload, stress, and attention. These findings motivate the use of eye-tracking–derived visual time series as objective correlates of mental and emotional states. The gap identified is the absence of deep learning methods that both avoid manual feature engineering and align outputs with clinically grounded mental health assessments.
Methodology
Study design: A supervised three-class classification framework was developed to evaluate mental health metrics (HHI for hope, STAI for anxiety, WEMWBS for mental well-being) using multivariate visual time-series as inputs and clinically derived class labels as outputs.
Participants: Twenty-five individuals participated: sixteen patients who underwent oncologic surgery at Roswell Park Comprehensive Cancer Center and nine volunteers without cancer. Surgical procedures among patients included gastrointestinal (46%: gastroesophageal, HIPEC, colectomy, Whipple), urologic (35%: radical cystectomy, radical nephrectomy, parastomal hernia repair), thoracic (13%: minimally invasive esophagectomy), and soft tissue (7%: amputation). Inclusion criteria required that participants not be under continuous monitoring and have no visual impairment. Informed consent was obtained.
Experimental setup and stimuli: Participants attended at least one 15-minute session in a dedicated in-house art gallery developed with the Albright-Knox Art Gallery. Eighteen artworks were selected to support prolonged, engaging viewing, favoring uplifting or transcendental compositions. Art types comprised abstract (non-representational forms using shapes, colors, forms, gestures), figurative (representations of human/animal forms, tied to the visible world), and landscape (outdoor scenes dominated by natural elements). The gallery contained three spaces corresponding to each type; participants were blinded to art type. Each participant viewed the 18 artworks in a fixed order for 50 seconds each; only time segments when the subject was directly looking at each artwork were included in analyses.
Data acquisition: Eye movements were recorded using Tobii Pro Glasses 2 (infrared video-based eye tracking) at 100 Hz. Recorded time series included 20 features: time; gaze point X, Y; gaze point 3D X, Y, Z; gaze direction left X, Y, Z; gaze direction right X, Y, Z; pupil position left X, Y, Z; pupil position right X, Y, Z; pupil diameter left; pupil diameter right. A total of 370 recordings were collected; each recording lasted 15 minutes (18 artworks × 50 s).
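As an idealized sketch of how one such recording could be organized per artwork (this assumes contiguous, full-length 50-second segments; the study actually retained only the samples where the subject looked directly at each artwork):

```python
import numpy as np

FS = 100      # sampling rate in Hz
N_ART = 18    # artworks per session
SECS = 50     # viewing time per artwork, in seconds
N_FEAT = 20   # recorded visual features per sample

# One 15-minute recording: 18 artworks x 50 s x 100 Hz samples, 20 features.
recording = np.zeros((N_ART * SECS * FS, N_FEAT))

# Split the stream into one segment per artwork (idealized: assumes
# contiguous, full-length segments with no off-artwork gaze samples).
segments = recording.reshape(N_ART, SECS * FS, N_FEAT)
print(segments.shape)  # (18, 5000, 20)
```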
Questionnaires and label definition: Participants completed pre- and post-study standardized questionnaires: Herth Hope Index (HHI; 12–48; higher indicates greater hope), State-Trait Anxiety Inventory for Adults (STAI; 20–80; higher indicates greater anxiety), and Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS; 14–70; higher indicates greater well-being). Based on psychotherapy literature and statistical suggestions, scores were categorized into three classes for each metric: HHI class 0 (low): 12 < HHI < 36; class 1 (intermediate): 36 < HHI < 42; class 2 (high): 42 ≤ HHI ≤ 48. STAI class 0 (normal): 20 < STAI < 44; class 1 (risk of anxiety/mood disorder): 44 < STAI < 54; class 2 (significant symptoms): 54 ≤ STAI ≤ 80. WEMWBS class 0 (low): 14 < WEMWBS < 42; class 1 (intermediate): 42 < WEMWBS < 59; class 2 (high): 59 < WEMWBS < 70.
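A minimal sketch of the score-to-class mapping is below. Note that the cutoffs as reported use strict inequalities that leave the exact boundary scores (36, 44, 42, etc.) unassigned; assigning them to the higher class here is an assumption for illustration.

```python
def hhi_class(score: int) -> int:
    """Herth Hope Index (12-48) -> class 0/1/2.
    Boundary handling is an assumption (see lead-in)."""
    if score >= 42:
        return 2   # high hope
    if score >= 36:
        return 1   # intermediate
    return 0       # low

def stai_class(score: int) -> int:
    """STAI (20-80) -> class 0/1/2 (same boundary assumption)."""
    if score >= 54:
        return 2   # significant symptoms
    if score >= 44:
        return 1   # risk of anxiety/mood disorder
    return 0       # normal

def wemwbs_class(score: int) -> int:
    """WEMWBS (14-70) -> class 0/1/2 (same boundary assumption)."""
    if score >= 59:
        return 2   # high well-being
    if score >= 42:
        return 1   # intermediate
    return 0       # low
```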
Preprocessing: A moving average filter (window size = 3 samples) was applied to gaze data for noise reduction.
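Such a filter can be sketched with NumPy as a plain trailing moving average (the paper specifies only the window size, not the implementation details):

```python
import numpy as np

def moving_average(x: np.ndarray, window: int = 3) -> np.ndarray:
    """Smooth a 1-D gaze signal with a simple moving average."""
    kernel = np.ones(window) / window
    # 'valid' keeps only fully overlapping windows, shortening the
    # series by (window - 1) samples.
    return np.convolve(x, kernel, mode="valid")

gaze_x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(moving_average(gaze_x))  # [2. 3. 4.]
```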
Model and training: The problem was formulated as a supervised three-class classification for each metric (HHI, STAI, WEMWBS). Inputs were multivariate visual time series; outputs were one of three classes (0, 1, 2) per sample. Ground-truth labels were derived from questionnaire scores using the defined cutoffs. The network was trained with categorical cross-entropy loss to produce class probabilities.
Architecture: A CNN-LSTM was used to capture local temporal patterns and longer-range dependencies. The architecture consisted of time-distributed 1D convolutions and pooling to extract features from subsequences, followed by an LSTM layer and dense layers for classification. Specifically (Table 2): TimeDistributed Conv1D (filters=64, kernel size=3, activation=ReLU); TimeDistributed Conv1D (64, 3, ReLU); TimeDistributed Dropout (0.5); TimeDistributed MaxPooling1D (pool_size=2); TimeDistributed Flatten; LSTM (units=100); Dropout (0.5); Dense (100, activation=ReLU); Dense (3, activation=softmax). The depth was selected by trial and error using training/validation performance. Inputs to the network were the recorded visual metrics from the 16 patients and 9 volunteers.
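The Table 2 stack can be sketched in Keras as follows. The subsequence count and length used to frame the input are assumptions for illustration; the summary does not report how each recording was split.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, Dense, Dropout, Flatten, Input,
                                     LSTM, MaxPooling1D, TimeDistributed)

# Assumed input framing: each sample split into 10 subsequences of
# 100 time steps, each step carrying the 20 recorded visual features.
N_SUBSEQ, SUBSEQ_LEN, N_FEATURES = 10, 100, 20

model = Sequential([
    Input(shape=(N_SUBSEQ, SUBSEQ_LEN, N_FEATURES)),
    TimeDistributed(Conv1D(64, 3, activation="relu")),
    TimeDistributed(Conv1D(64, 3, activation="relu")),
    TimeDistributed(Dropout(0.5)),
    TimeDistributed(MaxPooling1D(pool_size=2)),
    TimeDistributed(Flatten()),
    LSTM(100),
    Dropout(0.5),
    Dense(100, activation="relu"),
    Dense(3, activation="softmax"),  # class probabilities for classes 0/1/2
])

# Categorical cross-entropy matches the three-class probability output.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```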
Key Findings
The proposed CNN-LSTM model objectively evaluated and categorized mental health metrics from eye-tracking visual time-series. Reported multi-class classification accuracies were: HHI (hope) 93.81%, STAI (anxiety) 94.76%, and WEMWBS (mental well-being) 95.00%.
Discussion
The study addresses the lack of objective, scalable mental health evaluation methods by demonstrating that deep learning models trained on eye-tracking visual metrics can accurately classify clinically grounded mental health levels (hope, anxiety, and well-being). The strong performance suggests that ocular-motor–linked visual behavior encodes information relevant to mental health status. By aligning predictions with established clinical scales (HHI, STAI, WEMWBS) and cutoff-based categories, the approach bridges objective physiological measurement with clinically interpretable outcomes. The method’s accuracy indicates feasibility for augmenting or streamlining psychological assessment and for facilitating timely referrals to psychosocial services. Its reliance on passive visual metrics during art viewing also highlights potential for non-invasive, engaging monitoring contexts.
Conclusion
This work introduces an objective mental health evaluation method using a CNN-LSTM model trained on multivariate eye-tracking time series to categorize hope (HHI), anxiety (STAI), and mental well-being (WEMWBS) into clinically relevant classes with high accuracy (approximately 94–95%). The approach can be integrated into applications for home-based mental health monitoring for patients after oncologic surgery to identify their mental health status and potentially enhance care pathways by enabling earlier detection and intervention.
Limitations