Introduction
Social interaction deficits are a hallmark of many psychiatric disorders, particularly Autism Spectrum Disorder (ASD), yet their objective and efficient assessment remains a significant challenge. Current diagnostic instruments, such as the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview-Revised (ADI-R), rely heavily on clinician judgment, leading to inconsistencies and potential delays in diagnosis, especially for high-functioning individuals who may mask their symptoms. The lack of standardized, quantifiable measures hinders accurate diagnosis and the monitoring of treatment progress. This research addresses that gap by proposing a novel digital tool, the Simulated Interaction Task (SIT), to objectively measure social interaction deficits. The SIT aims to capture atypical social behaviors in ASD, including reduced emotional sharing, decreased facial mimicry, atypical gaze patterns, and aberrant voice intonation, enabling earlier and more reliable diagnosis and assessment of treatment efficacy. The study leverages computer vision and machine learning to analyze video and audio recordings of the SIT, with the goal of identifying social biomarkers predictive of ASD in adults with normal intelligence.
Literature Review
Existing literature highlights the prevalence of social cognitive dysfunction across psychiatric disorders, with impairments in facial affect recognition and emotion signaling being particularly prominent in ASD. Standardized tests exist for assessing general cognitive function, but objective quantification of social interaction deficits remains underdeveloped: clinical assessments depend on clinician expertise, which is subjective, time-consuming, and difficult to standardize. Although some research explores analyzing speech, facial expressions, and gaze behavior to detect ASD, standardized paradigms with reproducible interactions are lacking. This study builds on previous work using machine learning to analyze nonverbal behavior and predict ASD diagnosis, and addresses a significant gap by focusing on adults with ASD and normal intelligence.
Methodology
The study comprised two phases: a preparatory facial electromyography (EMG) study and a main clinical study. The EMG study, conducted with a sample of healthy male controls, provided precise measurements of facial muscle movements during the SIT to inform feature selection for the main study. The main study involved 37 adults with ASD (meeting ICD-10 criteria, without intellectual disability or comorbid neurological disorders) and 43 healthy controls (HCs). Participants engaged in a 7-minute standardized simulated dialogue (the SIT) via video with a pre-recorded actress discussing food preferences. The SIT was structured into three parts: a neutral introduction, a section about liked food (positive emotion), and a section about disliked food (negative emotion). Data analysis combined computer vision (OpenFace) for facial expressions and gaze behavior with audio analysis (librosa and Praat) for vocal characteristics. Key extracted features included facial action units (AUs), gaze angle, gaze speed and acceleration, fundamental frequency, and harmonics-to-noise ratio (HNR), among other prosodic measures. A machine learning model (random forest classifier) was trained to predict ASD diagnosis from various combinations of facial, gaze, and voice features; a pipeline sketch follows below. Eight clinical experts independently rated the videos to provide a benchmark for comparison.
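To make this pipeline concrete, the sketch below shows how per-participant features of the kind listed above could be aggregated and passed to a random forest. It is a minimal illustration under stated assumptions, not the authors' code: the CSV column names (`AU12_r`, `gaze_angle_x`) follow OpenFace's documented output format, `librosa.pyin` stands in for the study's exact pitch extraction (HNR would come from Praat), and all file paths, feature choices, and labels are hypothetical placeholders.

```python
# Minimal sketch of a SIT-style feature-extraction and classification pipeline.
# Assumes OpenFace has already been run on each video (yielding a per-frame CSV)
# and that each participant has one WAV recording.
import numpy as np
import pandas as pd
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def facial_gaze_features(openface_csv):
    """Aggregate frame-wise OpenFace output into per-participant features."""
    df = pd.read_csv(openface_csv)
    df.columns = df.columns.str.strip()  # OpenFace pads column names with spaces
    feats = {
        "au12_mean": df["AU12_r"].mean(),   # lip-corner puller: social smiling
        "au06_mean": df["AU06_r"].mean(),   # cheek raiser: mimicry-related
        "gaze_x_std": df["gaze_angle_x"].std(),
        "gaze_y_std": df["gaze_angle_y"].std(),
    }
    # Gaze speed: frame-to-frame change in the 2D gaze angle.
    gaze = df[["gaze_angle_x", "gaze_angle_y"]].to_numpy()
    feats["gaze_speed_mean"] = np.linalg.norm(np.diff(gaze, axis=0), axis=1).mean()
    return feats

def voice_features(wav_path):
    """Summarise vocal characteristics via fundamental frequency (F0)."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]  # keep voiced frames only
    return {"f0_mean": f0.mean(), "f0_std": f0.std()}

# Hypothetical feature table: one row per participant (file names are made up),
# labelled 0 (HC) or 1 (ASD); 43 HCs and 37 ASD mirror the study's group sizes.
rows = [
    {**facial_gaze_features(f"sit_{i}.csv"), **voice_features(f"sit_{i}.wav")}
    for i in range(80)
]
X = pd.DataFrame(rows).to_numpy()
y = np.array([0] * 43 + [1] * 37)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```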
Key Findings
The computer-based analysis demonstrated the potential of the SIT as a diagnostic tool. Using only facial expressions and vocal characteristics, the machine learning model achieved an accuracy of 73% (sensitivity 67%, specificity 79%) in identifying individuals with ASD. Reduced social smiling (AU12), reduced facial mimicry (AU5, AU6), and a higher voice fundamental frequency and HNR were characteristic of individuals with ASD. The automated analysis significantly outperformed a majority-vote baseline and performed on par with clinical expert ratings. Gaze behavior alone yielded lower accuracy (AUC = 0.63). Classification performance varied somewhat between genders, with better performance for females than males (70% vs. 67%). The predicted class probabilities correlated positively with participants' ADOS scores and age. The sketch below shows how these reported metrics relate to a classifier's confusion matrix.
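For reference, this is how accuracy, sensitivity, specificity, and AUC are derived from a binary classifier's outputs (a generic scikit-learn sketch, not the study's code; `y_true`, `y_pred`, and `y_prob` are placeholder arrays, not study data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Placeholders: true labels, hard predictions, and predicted class probabilities.
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1])
y_prob = np.array([0.9, 0.4, 0.2, 0.1, 0.8, 0.6])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # proportion of ASD participants correctly detected
specificity = tn / (tn + fp)  # proportion of HCs correctly ruled out
auc = roc_auc_score(y_true, y_prob)  # threshold-free ranking quality
```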
Discussion
The findings demonstrate the feasibility and potential of the SIT as a cost-effective and time-efficient digital tool for objectively quantifying social interaction deficits in ASD. The results support the use of the SIT and automated analysis as a valuable supplement to traditional clinical assessment, offering potential for earlier and more reliable diagnosis and monitoring of treatment. The superior performance of facial expression and voice features compared to gaze features may be attributed to limitations in the precision of the automated gaze tracking method. The study replicates findings of reduced social smiling and mimicry in ASD and identifies novel indicators from vocal characteristics. The classification accuracy achieved by machine learning highlights the potential of multivariate analysis for capturing complex social behavior. The comparability of the automated analysis to clinical expert ratings suggests a potential for broader implementation of the SIT.
Conclusion
The SIT offers a standardized, accessible, and cost-effective method for assessing social interaction deficits, particularly in ASD. While not a replacement for clinical expertise, it is a valuable supplementary tool for screening, diagnosis, and monitoring treatment progress. Future research should focus on validating the SIT in larger and more diverse clinical samples, exploring its use in other psychiatric conditions, and improving the accuracy of automated gaze analysis. Investigating its use in home settings and making the actress's responses adaptive would further enhance the SIT's potential.
Limitations
The relatively small sample size and the focus on adults with high-functioning ASD might limit the generalizability of the findings. The accuracy of the automated gaze tracking could be improved, and further studies comparing SIT performance to in-person assessments are needed. The pre-recorded nature of the interaction partner's responses might limit the naturalness of the interaction, although efforts were made to mitigate this. Future iterations might benefit from more dynamic responses from the interaction partner.