Medicine and Health
Natural language processing of multi-hospital electronic health records for public health surveillance of suicidality
R. Bey, A. Cohen, et al.
This study by Romain Bey, Ariel Cohen, and colleagues unveils concerning trends in suicidality among youth by analyzing over 2.9 million electronic health records from Parisian hospitals. It highlights a significant post-pandemic rise in suicide attempts, particularly among adolescent girls, calling for heightened awareness and action regarding mental health.
~3 min • Beginner • English
Introduction
The study addresses the need for timely population-level mental health surveillance, heightened by concerns during the COVID-19 pandemic about increases in suicide attempts (SA). Early evidence was mixed, and only later did consensus emerge that youth, particularly girls, were most affected. Traditional data sources and timelines impeded rapid, targeted interventions. Advances in NLP applied to large-scale textual data, especially EHRs, offer the potential for real-time indicators but face challenges of generalisability, multi-hospital integration, privacy, and local heterogeneity. The research question was whether NLP applied to multi-hospital EHRs can compute timely, robust indicators of suicidality at a population level, detect known pandemic-related changes in SA, and shed light on associated risk factors.
Literature Review
Prior work has primarily leveraged social media for mental health surveillance during COVID-19, detecting changes in posted content but with limited clinical applicability due to selection biases, inadequate stratification, and limited outcome coverage. Clinical NLP has shown promise in extracting suicidality-related variables from EHRs, but many models were trained on narrow cohorts (disease-, age-, or hospital-specific), raising generalisability concerns. Multi-hospital EHR analyses are hindered by privacy, technical barriers, and local documentation differences. The literature supports the feasibility of NLP for suicidality detection and risk factor extraction, yet calls for robust, generalisable, multi-institution approaches and validation across care contexts.
Methodology
Design: Multicentre observational retrospective cohort study using the AP-HP clinical data warehouse (15 of 38 adult and paediatric hospitals included, selected for stable EHR deployment). Population: all hospitalisations from Aug 1, 2017 to Jun 30, 2022. Inclusion for SA analysis: hospitalisations caused by non-fatal, self-directed behaviour with intent to die. Exclusions: self-harm without suicidal intent, suicidal ideation only, and age <8 years at admission. Deduplication: when two SA-related hospitalisations occurred within 15 days for the same patient, only the first was counted.
Data sources: EHR clinical notes (especially last-edited discharge summaries), administrative data (age, sex, dates, in-hospital death), and ICD-10 claim diagnoses. Data extracted July 4, 2022.
NLP development: Two-stage pipeline. (1) Screening of clinical reports with a dictionary of SA-related keywords and regular expressions grouped by modality (e.g., jumping, drug overdose). (2) Stay-level classification with a neural network (RoBERTa architecture, initialised from CamemBERT and further pre-trained on 21 million French clinical reports) that validates each mention and filters out negations, mentions concerning someone other than the patient, medical history, reported speech, and hypothetical mentions. A stay was classified as SA-caused if at least one validated mention was found in its discharge summary. Implementation used EDS-NLP v0.6.1. Training and validation sets were split by hospital (10 training, 5 validation). A total of 1,571 mentions from 465 stays were annotated; inter-annotator agreement was assessed on a subset.
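The two-stage logic can be sketched as follows. This is a minimal illustration, not the authors' EDS-NLP implementation: the keyword dictionary is a tiny English-language placeholder (the study's is far larger and targets French text), and the cue-based validator merely stands in for the fine-tuned RoBERTa mention classifier.

```python
import re

# Stage 1 (screening): illustrative keyword/regex dictionary grouped by
# SA modality; the study's actual dictionary is an assumption stand-in here.
MODALITY_PATTERNS = {
    "overdose": re.compile(r"\b(overdose|intoxication)\b", re.IGNORECASE),
    "jumping": re.compile(r"\b(defenestration|jumped from)\b", re.IGNORECASE),
}

def screen(note):
    """Return (modality, matched_text) pairs for candidate SA mentions."""
    return [(mod, m.group(0))
            for mod, pat in MODALITY_PATTERNS.items()
            for m in pat.finditer(note)]

# Stage 2 (classification): the study fine-tuned a RoBERTa model to validate
# mentions; this trivial cue check only mimics its role of rejecting negated,
# historical, hypothetical, or non-patient mentions.
REJECT_CUES = ("denies", "no evidence of", "family history of")

def context_is_valid(note):
    return not any(cue in note.lower() for cue in REJECT_CUES)

def stay_is_sa_caused(discharge_summary):
    """A stay is SA-caused if at least one mention survives validation."""
    return bool(screen(discharge_summary)) and context_is_valid(discharge_summary)
```

In the real pipeline, validation happens per mention with a transformer that reads the surrounding context; the note-level cue check above is only a structural placeholder.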
Risk factor extraction: Rule-based NLP algorithms applied to discharge summaries of SA-caused stays to detect: social isolation, domestic violence, sexual violence, physical violence, and prior suicide attempt history.
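A rule-based detector of this kind can be sketched as below. The patterns are illustrative English stand-ins; the study's actual rules target French clinical text and are not reproduced here.

```python
import re

# Assumed illustrative patterns for the five risk factors named in the study.
RISK_FACTOR_PATTERNS = {
    "social_isolation": re.compile(r"\b(social isolation|lives? alone)\b", re.I),
    "domestic_violence": re.compile(r"\bdomestic (violence|abuse)\b", re.I),
    "sexual_violence": re.compile(r"\bsexual (violence|assault|abuse)\b", re.I),
    "physical_violence": re.compile(r"\bphysical (violence|assault|abuse)\b", re.I),
    "prior_sa_history": re.compile(r"\b(previous|prior|past) suicide attempts?\b", re.I),
}

def detect_risk_factors(discharge_summary):
    """Return the set of risk factors mentioned in an SA-caused stay's summary."""
    return {name for name, pat in RISK_FACTOR_PATTERNS.items()
            if pat.search(discharge_summary)}
```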
Algorithm validation: PPV for SA detection estimated by chart review of 162 discharge summaries from validation hospitals, split pre/post-pandemic; inter-annotator positive/negative agreements reported. PPV for risk factor detectors estimated on stays positive for SA and the given factor. Sensitivity not estimated due to rarity of SA in all hospitalisations.
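PPV estimation from chart review reduces to a binomial proportion with a confidence interval. The sketch below uses the Wilson score interval, a common choice for proportions at this sample size; the count of 72 confirmed stays out of 85 reviewed is an assumption for illustration, not a figure taken from the paper.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (here, PPV)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return p, center - half, center + half

# Illustrative only: 72 of 85 reviewed pre-pandemic summaries confirmed as SA.
ppv, low, high = wilson_ci(72, 85)
```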
Primary outcome/indicator: Monthly number of SA-caused hospitalisations overall and stratified by sex and age groups (8–17, 18–25, 26–65, 66+).
Statistical analysis: Interrupted time-series regression with seasonality adjustment to test for a post-COVID trend change: N_t = a_0 + a_1·t + a_2·(t − t_0)·1{t ≥ t_0} + Σ_m β_m·1{month(t) = m} + ε_t, where t indexes months and t_0 corresponds to March 2020 (pre-period Aug 2017–Feb 2020; post-period Mar 2020–Jun 2022). Ordinary least squares was used; 95% CIs are reported. Severity comparisons used survival analysis on length of stay (censored) and in-hospital death, with log-rank tests. An exploratory analysis compared pre/post prevalence ratios of risk factors with two-sided Fisher exact tests (significance p ≤ 0.05). Software: statsmodels v0.13.2, lifelines v0.26.4.
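One row of the segmented-regression design matrix can be built as follows. The month indexing (Aug 2017 = month 0, so the break at Mar 2020 falls at month 31) and the choice of January as the reference month for the seasonal dummies are assumptions consistent with the stated pre/post periods, not details taken from the paper.

```python
def its_design_row(t, t0, month):
    """One row of the interrupted time-series design matrix:
    intercept, linear trend t, post-break slope change (t - t0) * 1{t >= t0},
    and 11 month dummies (January as reference) for seasonality."""
    slope_change = (t - t0) if t >= t0 else 0
    month_dummies = [1 if month == m else 0 for m in range(2, 13)]
    return [1, t, slope_change] + month_dummies

# Assumed indexing: Aug 2017 = 0, so Feb 2020 = 30 and Mar 2020 = 31.
T0 = 31
row_pre = its_design_row(30, T0, month=2)   # Feb 2020: before the break
row_post = its_design_row(43, T0, month=3)  # Mar 2021: 12 months after it
```

Stacking such rows for all months and regressing the monthly SA counts on them with OLS yields the trend-change coefficient a_2 reported in the findings.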
Sensitivity analyses: (i) Alternative classification using only diagnosis claim codes; (ii) Alternative rule-based NLP SA classifier (no ML), with additional annotation to estimate PPV; (iii) Adjustment for missing data by dividing monthly SA counts by average discharge summary completeness; (iv) Per-hospital analyses.
Key Findings
- Cohort: 2,911,920 hospitalisations; 14,023 SA-related hospitalisations (0.5%), involving 11,786 individuals; pre-COVID 5,954 (42.5%), post-COVID 8,069 (57.5%). Mean age 38.0 years (SD 20.7). Female 64.3% (9,015/14,023), male 35.7% (5,008/14,023).
- NLP performance: Main hybrid SA algorithm PPV 0.85 pre (95% CI 0.76–0.91; n=85) and 0.86 post (95% CI 0.76–0.92; n=77), outperforming alternative rule-based algorithm (PPV ~0.51–0.52). Risk factor detectors PPV ranged 0.83–1.00 across periods. Inter-annotator agreements: SA detection positive/negative agreements 0.92/0.50; risk factors mostly 1.0 for both.
- Interrupted time-series (seasonally adjusted): Overall post-COVID slope increase in monthly SA hospitalisations 3.7 (95% CI 2.1–5.3). By subgroup: girls 8–17 years 1.8 (95% CI 1.2–2.5); women 18–25 years 1.1 (95% CI 0.7–1.5); males overall 0.9 (95% CI 0.2–1.6). Residuals showed no notable time trend, indicating adequate model fit.
- Sensitivity analyses: Findings robust when using the alternative rule-based NLP, completeness adjustment, and per-hospital aggregation (though individual hospitals often lacked significance). Claim-based classification did not yield significant effects.
- Characteristics of SA hospitalisations: The mix of methods remained stable over time, with intentional drug overdose more frequent among women. Time to hospital exit was shorter post-COVID and in-hospital death was less frequent (length of stay p<0.001; death p=0.007); short stays became shorter post-COVID while longer stays were unchanged.
- Risk factors (pre vs post prevalence ratios, overall): Domestic violence PR 1.3 (95% CI 1.16–1.48, p<0.0001); Physical violence PR 1.3 (95% CI 1.10–1.64, p=0.0047); Sexual violence PR 1.7 (95% CI 1.48–1.98, p<0.0001). Social isolation PR 1.2 (95% CI 1.09–1.39, p=0.00071). Prior SA history PR 1.1 (95% CI 1.05–1.17, p<0.0001). Violence factors were more frequently reported in females both before and after the outbreak.
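Prevalence ratios of the kind reported above can be computed with a log-normal confidence interval; the sketch below uses illustrative synthetic counts (the study's underlying counts are not given here), and the CI method is a standard choice rather than necessarily the authors'.

```python
import math

def prevalence_ratio(a, n1, b, n2, z=1.96):
    """Post/pre prevalence ratio with a log-normal 95% CI.
    a/n1 = post-period prevalence, b/n2 = pre-period prevalence."""
    pr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)  # SE of log(PR)
    return pr, pr * math.exp(-z * se), pr * math.exp(z * se)

# Synthetic example: a factor mentioned in 130/1000 post vs 100/1000 pre stays.
pr, low, high = prevalence_ratio(130, 1000, 100, 1000)
```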
Discussion
Applying NLP to multi-hospital EHRs produced timely, population-level mental health indicators that detected the pandemic-associated rise in SA, particularly among girls and young women, aligning with external evidence. The robust performance of a hybrid ML-rule-based NLP approach enabled identification of rare events across millions of notes and across heterogeneous hospital settings. The increased reporting of violence (domestic, physical, sexual) among SA-related stays post-COVID underscores the likely contribution of violence to the disproportionate impact on females and youths. These indicators, if deployed prospectively, could function as early warning tools to inform public health and clinical responses, guide resource allocation, and design targeted prevention strategies. The multi-institution aggregation was crucial, as single-hospital analyses often lacked power, supporting the relevance of regional-scale EHR-based surveillance.
Conclusion
NLP applied to structured and unstructured EHR data across multiple hospitals can yield actionable surveillance indicators of suicidality. Retrospective analysis during the COVID-19 period revealed a significant increase in SA hospitalisations, mainly among girls and young women, and highlighted increased reporting of violence as a key associated factor. This approach can complement existing surveillance systems and inform prevention efforts, particularly those addressing violence against women at early ages. Future work should implement prospective, real-time monitoring, expand extracted variables (e.g., care utilization, socioeconomic determinants), and integrate these tools within clinical workflows to enhance SA prevention.
Limitations
- Observational retrospective design precludes causal inference.
- Utility for crisis response needs confirmation via prospective deployment.
- NLP detects clinician-reported SA and risk factors; reporting may vary by clinician practices, experience, and EHR usability.
- Emergency department visits not resulting in hospitalisation were excluded, biasing toward more severe SA and limiting assessment across the full severity spectrum.
- Sensitivity of the SA detection model was not estimated due to rarity of events and annotation burden.