Detection of acute 3,4-methylenedioxymethamphetamine (MDMA) effects across protocols using automated natural language processing

C. Agurto, G. A. Cecchi, et al.

Carla Agurto and colleagues use automated speech analysis to identify objective markers of mental states influenced by MDMA and oxytocin. With classification accuracies of up to 92%, the study highlights the potential of speech analysis as a tool for understanding intoxication-related mental states.
Introduction

The study addresses whether automated natural language processing of free speech can objectively detect acute mental state changes induced by psychoactive drugs, specifically MDMA and intranasal oxytocin. Traditional assessments of intoxication rely on subjective self-report scales with limited sensitivity and potential biases. Speech is a rich, low-cost, and reliable source of semantic, syntactic, and acoustic information. Prior work by the authors using semantic analyses (LSA, bag-of-words) suggested speech can differentiate MDMA from placebo and other stimulants, but was limited by small samples, focus on content only, and lack of independent validation. The present work aims to provide a comprehensive assessment using acoustic, semantic, and psycholinguistic features across two speech tasks, to test hypotheses that: (i) each drug condition exhibits a unique speech signature across domains; (ii) higher MDMA dose produces greater changes; (iii) monologue (no listener) allows freer emotional expression than an interviewer-elicited description; and (iv) models generalize to independent datasets.

Literature Review

Background work shows computerized speech analysis can quantify clinically relevant phenomena (e.g., incoherence in schizophrenia) and is increasingly sophisticated with NLP methods. Two earlier studies by the authors found: (1) in N=13, LSA-based semantic proximity to prosocial concepts was higher on MDMA than methamphetamine or placebo, with cross-validated classification 84–88%; (2) in N=35, bag-of-words with random forests differentiated MDMA vs placebo, highlighting social and emotional valence words. Other approaches include computerized word counts and manual analyses in drug studies. However, prior work largely focused on semantic content, used small samples, and lacked independent validation, motivating a broader, multimodal speech feature approach validated across datasets.

Methodology

Design and participants: Thirty-one healthy adults (12 females; mean age ≈24.3±4.4 years) who had used ecstasy/molly at least twice completed a randomized, double-blind, within-participants study with four sessions: placebo, MDMA 0.75 mg/kg, MDMA 1.5 mg/kg, and intranasal oxytocin 20 IU. Sessions were ≥5 days apart, ran ~9:00–13:30, and used a double-dummy (oral capsule then intranasal spray 30 min later) so no session combined active MDMA and oxytocin. Drug use and abstinence were verified via urine, saliva, and breath tests; women were pregnancy-tested. Speech tasks occurred ~75–105 min post-MDMA/placebo (peak effects).

Speech tasks: Each session included two audio-recorded tasks (44.1 kHz, WMA): (1) Description: 5 min of free speech about an important person, with a research assistant present to encourage continuous speaking; (2) Monologue: up to 5 min speaking alone on any topic (suggested topics provided). Transcripts were produced by a rater blind to condition. The training/validation dataset thus comprised 31 participants × 4 sessions × 2 tasks, after 4 participants from the initial sample were excluded for unusable recordings.

Pre-processing: For acoustics, the first and last 30 s were excluded to improve reliability; interviewer speech was manually removed in Description. For transcripts, interviewer speech, punctuation, and special characters were removed.
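
The two pre-processing steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `(speaker, text)` turn format, the `"participant"` label, and the choice to keep apostrophes inside contractions are all assumptions made for illustration.

```python
import re


def trim_edges(samples, sample_rate, edge_seconds=30):
    """Drop the first and last `edge_seconds` of a mono recording.

    `samples` is any sliceable sequence of audio samples and
    `sample_rate` is in Hz (44100 for the recordings described above).
    """
    edge = int(edge_seconds * sample_rate)
    if len(samples) <= 2 * edge:
        # Recording too short: nothing is left after trimming both ends.
        return samples[:0]
    return samples[edge:len(samples) - edge]


def clean_transcript(turns, keep_speaker="participant"):
    """Keep one speaker's text and strip punctuation and special characters.

    `turns` is a list of (speaker, text) pairs; this layout is hypothetical.
    Apostrophes are kept so that contractions survive (an assumption).
    """
    text = " ".join(t for speaker, t in turns if speaker == keep_speaker)
    text = re.sub(r"[^A-Za-z0-9\s']", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(text.split())
```

For a 5-minute recording at 44.1 kHz, `trim_edges` would leave roughly the middle 4 minutes for acoustic analysis.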

Feature extraction: Three domains were extracted per recording.

  • Acoustic (88 features) via Praat and Python: voice stability (jitter, shimmer, voice breaks); noise measures (HNR, NHR, autocorrelation); temporal features (pause and utterance distributions, articulation/speech rates); pitch distribution/variations; spectral characterization (max dB, max frequency, energy, slope); MFCCs (16 coefficients); vowel space/formants (F1–F3 distributions; total area; a-i-u area).
  • Semantic: Latent Semantic Analysis using TASA corpus vectors. After tokenization, POS tagging, and lemmatization (NLTK, WordNetLemmatizer), median cosine similarity was computed between transcript words and 21 a priori concepts relevant to MDMA effects: affect, anxiety, compassion, confidence, disdain, emotion, empathy, fear, feeling, forgive, friend, happy, intimacy, love, pain, peace, rapport, sad, support, think, talk.
  • Psycholinguistic/syntactic: CPIDR-derived total words, number of ideas, and idea density; part-of-speech proportions (pronouns, nouns, verbs, determiners, indefinites, definites, first-person singular “I”); lexical richness/diversity (Honoré’s statistic, Brunet’s index), content vs empty words, type-token ratio, word frequency, fillers.
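
Two of the transcript-based features above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the word vectors passed in as `vectors` are a hypothetical stand-in for the TASA-derived LSA vectors, and the Honoré and Brunet formulas are the commonly cited definitions, which may differ in detail from the implementation used in the study.

```python
import math
from collections import Counter
from statistics import median


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def median_concept_similarity(tokens, concept, vectors):
    """Median cosine similarity between transcript tokens and one concept.

    Tokens missing from `vectors` are skipped (an assumption).
    """
    target = vectors[concept]
    sims = [cosine(vectors[t], target) for t in tokens if t in vectors]
    return median(sims) if sims else float("nan")


def lexical_richness(tokens):
    """Type-token ratio, Brunet's index, and Honoré's statistic."""
    n = len(tokens)                     # total words (tokens)
    counts = Counter(tokens)
    v = len(counts)                     # distinct words (types)
    v1 = sum(1 for c in counts.values() if c == 1)  # words used once
    return {
        "type_token_ratio": v / n,
        # Brunet: W = N ** (V ** -0.165); lower means richer vocabulary.
        "brunet": n ** (v ** -0.165),
        # Honoré: R = 100 * ln(N) / (1 - V1 / V); undefined when V1 == V.
        "honore": 100 * math.log(n) / (1 - v1 / v) if v1 < v else float("inf"),
    }
```

Calling `median_concept_similarity` once per concept would yield one semantic feature for each of the 21 concepts listed above.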

Statistical analyses:

  • Condition-level univariate comparisons: For each task separately, paired Wilcoxon signed-rank tests compared placebo to each active condition and low vs high MDMA dose, with Benjamini–Hochberg False Discovery Rate correction (q<0.05). Partial correlations among features passing FDR were computed using the inverse of a regularized covariance matrix to assess feature interactions.

  • Classification: To account for inter-individual baseline differences, within-subject differential features were created by subtracting feature vectors across paired conditions, so that the classifier separates A−B from B−A. Binary contrasts were placebo vs MDMA 0.75, placebo vs MDMA 1.5, placebo vs oxytocin, and MDMA 0.75 vs MDMA 1.5, each run separately for the two tasks (Description, Monologue). Features were z-standardized. Classifiers were linear SVM, k-nearest neighbors, and random forest. Model selection and performance were assessed via nested leave-one-participant-out cross-validation. Features were ranked by two-sample t-tests on the training set only, and the best-performing subsets are reported. Significance was assessed with binomial tests.

  • Multivariate (post hoc): For linear SVM models, absolute weights were normalized to sum to 1 to assess relative feature contributions; features contributing >10% were highlighted.
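
The within-subject differencing, the participant-wise data split, and the weight normalization described above might look roughly as follows. This is a simplified sketch, not the study's code: the function names and the nested-dictionary input format are assumptions, and only the outer cross-validation loop is shown (the inner loop for model and feature selection is omitted).

```python
def differential_samples(features, cond_a, cond_b):
    """Build A-B (label +1) and B-A (label -1) samples per participant.

    `features` maps participant -> condition -> feature vector
    (a hypothetical layout chosen for this sketch).
    """
    samples = []
    for pid, by_cond in features.items():
        a, b = by_cond[cond_a], by_cond[cond_b]
        samples.append((pid, [x - y for x, y in zip(a, b)], 1))
        samples.append((pid, [y - x for x, y in zip(a, b)], -1))
    return samples


def leave_one_participant_out(samples):
    """Yield (held_out_id, train, test); both rows of a participant stay together."""
    for held_out in sorted({pid for pid, _, _ in samples}):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def normalized_abs_weights(weights):
    """Scale absolute linear-model weights so that they sum to 1."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return [abs(w) / total for w in weights]


def influential_features(weights, threshold=0.10):
    """Indices of features whose normalized share exceeds `threshold`."""
    shares = normalized_abs_weights(weights)
    return [i for i, share in enumerate(shares) if share > threshold]
```

Holding out both differential rows of a participant at once keeps the mirrored A−B and B−A samples from leaking across the train/test boundary.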

Validation datasets: Models trained on the N=31 training/validation dataset were tested on two independent Description-task datasets with MDMA (0.75, 1.5 mg/kg) and placebo. Acoustic features were unavailable due to lower audio quality. In ID2, total word count and number of ideas were excluded because the task lasted 10 min rather than 5 min.

  • ID1: N=36 (18 females), three sessions (placebo, MDMA 0.75, MDMA 1.5); speech recorded at 140 min post-dose; racial composition differed from the training set.
  • ID2: N=13 (4 females), three sessions (placebo, MDMA 0.75, MDMA 1.5); speech recorded at 130 min post-dose; 10-min task; racial composition also differed from the training set.
Key Findings
  • Univariate condition-level differences: Ten features differed significantly after FDR correction. Acoustic features were more prominent for detecting oxytocin effects, especially in the Monologue task (e.g., F2-related measures), while the Description task more often highlighted pause-related acoustics and psycholinguistic features. Different feature sets were implicated across tasks and drug contrasts.
  • Feature interactions: Partial correlations among top features were stronger under active drugs, especially MDMA, than placebo. Multidimensional scaling of partial correlation patterns revealed two primary axes: (a) task (monologue vs description), and (b) MDMA dose (high vs low), with task separable along one dimension and dose along the other.
  • Classification performance (cross-validated, training/validation set):
      • Best accuracies reached up to 87% for low-dose MDMA vs placebo (Monologue) and 84% (Description) using selected features. Overall, linear SVM performed best, followed by random forest, then nearest neighbors. Combined features did not always outperform domain-specific subsets.
      • Acoustic features were more informative for Monologue; semantic/psycholinguistic features were more informative for Description. Oxytocin vs placebo was particularly well captured by acoustic features (prosody/emotion-relevant acoustics).
      • Higher MDMA dose did not consistently increase separability from placebo; in several analyses MDMA 0.75 vs placebo outperformed MDMA 1.5 vs placebo.
  • Multivariate weight analysis: For Monologue, psycholinguistic features contributed little; for Description, they contributed substantially in three of four contrasts. Top univariate features generally aligned with top model weights.
  • External validation: Models trained on the primary dataset achieved up to 92% accuracy in ID1 and up to 66% in ID2 (chance=50%), using semantic/psycholinguistic features with feature selection. Despite methodological differences (e.g., timing, task duration, demographics), accuracies were significantly above chance by conservative binomial tests.
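
To make the "above chance" claim concrete, the sketch below computes an exact one-sided binomial p-value for a given number of correct classifications. The source does not report the number of test samples or whether the test was one-sided, so the function name, the one-sided form, and any counts passed to it are assumptions for illustration only.

```python
from math import comb


def binomial_p_value(n_correct, n_total, chance=0.5):
    """Exact one-sided p-value: P(X >= n_correct) for X ~ Binomial(n_total, chance)."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )
```

With a chance level of 50%, the same accuracy becomes more convincing as the number of test samples grows, which is one reason accuracies from small validation sets need a formal test.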
Discussion

The findings support that automated speech analysis across acoustic, semantic, and psycholinguistic domains can objectively detect acute drug-induced mental state changes. Distinct speech signatures were observed by drug and task, addressing the hypothesis that different psychoactive states manifest across multiple speech domains. Task context mattered: Monologue (no listener) enhanced affective acoustic markers (e.g., MFCCs, formants), while interviewer-elicited Description favored transcript-based features, consistent with the idea that social interaction constraints modulate expressivity. Feature interaction patterns (partial correlations) and MDS revealed separable dimensions for task and MDMA dose, indicating multidimensional structure in speech changes. Contrary to the dose hypothesis, higher MDMA dose did not uniformly increase classification accuracy or distance from placebo, suggesting non-linear dose–response effects on speech or ceiling effects in elicited tasks. Importantly, models generalized to independent datasets, demonstrating external validity, though performance varied with methodological factors (e.g., time post-dose, demographic differences), underscoring sensitivity to experimental context. Overall, results strengthen the case for speech-based digital phenotyping as an objective complement to subjective assessments in psychopharmacology and psychiatry.

Conclusion

This work provides a proof-of-concept that a comprehensive set of automated speech features can detect acute effects of MDMA and oxytocin across tasks and datasets. It extends prior content-only analyses by integrating acoustic and psycholinguistic markers, achieving up to 87% cross-validated accuracy in the training dataset and up to 92% in external validation. The study highlights the importance of task design and timing relative to drug effects, and suggests drug-specific and dose-sensitive speech signatures. Future research should: (1) collect larger, more diverse datasets spanning broader demographics and languages; (2) systematically vary task instructions and social context; (3) examine additional commonly used substances (e.g., cannabis) and real-world impairment; (4) incorporate high-quality acoustics in validation cohorts; and (5) develop interpretable, robust models suitable for clinical and field deployment.

Limitations
  • Secondary analysis of datasets originally designed for other purposes; subsets of the data were used in prior publications.
  • Methodological heterogeneity across datasets (timing relative to drug peak, task duration/instructions) complicates comparisons.
  • Limited drug conditions (two MDMA doses, one oxytocin dose); other substances not assessed here.
  • Acoustic features unavailable in validation datasets due to audio quality; in ID2, longer duration necessitated exclusion of duration-sensitive features (total words, ideas).
  • Demographic differences (notably race/ethnicity proportions) between training and validation sets may affect speech behavior and model performance.
  • Potential sensitivity of results to feature selection and classifier choice; non-linear dose–response patterns not fully characterized.