Medicine and Health
Artificial intelligence in electroencephalography analysis for epilepsy diagnosis and management
C. Wang, X. Yuan, et al.
The review addresses how artificial intelligence can overcome limitations of conventional electroencephalography in epilepsy care. EEG is central to diagnosis, classification, and localization of epileptogenic foci but suffers from low spatial resolution, variable data quality, modest detection rates (40–50% in routine EEG; ~70–80% in 24-h monitoring), and subjective, inefficient manual interpretation. The research question is how AI—spanning machine learning, deep learning, and multimodal fusion—can enhance detection, localization, monitoring, prediction, and treatment evaluation, and what challenges impede clinical translation. The study emphasizes two AI paradigms: supportive AI (assisting detection/recognition/localization and workflow) and predictive AI (forecasting seizures and treatment outcomes by learning subtle patterns).
A systematic search was performed in PubMed and Web of Science targeting 2012–2025, with focus on 2020–2025. Keywords included epilepsy, electroencephalography/EEG, artificial intelligence, deep learning, machine learning, multimodal data fusion, seizure prediction, treatment outcome, seizure detection, and specific algorithms (CNN, RNN, SVM). Boolean expressions (e.g., ('Artificial Intelligence' OR 'Machine Learning' OR 'Deep Learning') AND ('Electroencephalography' OR 'Epilepsy' OR 'Seizure Detection') AND ('EEG-based Epilepsy Care')) were used. Inclusion criteria: EEG-based studies in epilepsy patients applying AI models for detection, localization, or prediction; original research (clinical trials, simulations, retrospective analyses) with preference for larger, well-designed studies. Exclusion criteria: non-AI methods, non-epilepsy EEG relevance, non-original publications (conference abstracts, reviews), incomplete data or insufficient methods. PRISMA: 317 records identified (PubMed n=210; Web of Science n=107), 28 duplicates removed; 289 records screened; 147 excluded; 142 reports sought and assessed; 66 excluded (25 incomplete data; 18 conference abstracts; 23 non-AI); 76 studies included.
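The PRISMA counts above are internally consistent, which a short arithmetic check makes explicit (variable names are illustrative):

```python
# Sanity check of the PRISMA flow reported above.
identified = {"PubMed": 210, "Web of Science": 107}            # 317 records
duplicates_removed = 28
screened = sum(identified.values()) - duplicates_removed       # 289 screened
excluded_at_screening = 147
reports_assessed = screened - excluded_at_screening            # 142 full texts
excluded_full_text = {"incomplete data": 25,
                      "conference abstracts": 18,
                      "non-AI": 23}                            # 66 excluded
included = reports_assessed - sum(excluded_full_text.values())

print(screened, reports_assessed, included)  # 289 142 76
```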
Systematic review methodology encompassed: database selection (PubMed, Web of Science), time span (2012–2025; emphasis 2020–2025), comprehensive keyword strategy including algorithm names, and structured Boolean search expressions per database. The screening process followed PRISMA: deduplication, title/abstract screening per predefined inclusion/exclusion criteria, and full-text eligibility assessment. Final inclusion comprised 76 studies spanning supportive and predictive AI applications in epilepsy EEG analysis. Data extracted included model architecture, input features, sampling frequency, electrode configurations, datasets, validation methods, and reported performance metrics. The review also cataloged multimodal fusion approaches, artifact handling techniques (e.g., ICA, GAN-based denoising), and clinical translation aspects (e.g., SCORE-AI deployment).
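The ICA-based artifact handling catalogued above follows a decompose–identify–reconstruct pattern. A minimal sketch with scikit-learn's FastICA on synthetic two-channel data; the signal shapes, mixing matrix, and kurtosis heuristic are illustrative assumptions, not the pipelines of any reviewed study:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
fs = 256
t = np.arange(0, 4, 1 / fs)
# Two synthetic sources: an alpha-band "brain" rhythm and a spiky
# cardiac-like artifact (both purely illustrative).
brain = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.normal(size=t.size)
ecg = (np.mod(t, 1.0) < 0.05).astype(float) * 5.0
mixing = np.array([[1.0, 0.8], [0.6, 1.0]])
X = mixing @ np.vstack([brain, ecg])        # two "electrodes", both contaminated

ica = FastICA(n_components=2, random_state=0)
S = ica.fit_transform(X.T)                  # estimated independent components
# Flag the artifact component by its spiky (high-kurtosis) waveform,
# zero it, and project back -- the classic ICA clean-up step.
kurtosis = ((S - S.mean(0)) ** 4).mean(0) / S.var(0) ** 2
S[:, np.argmax(kurtosis)] = 0.0
cleaned = ica.inverse_transform(S)          # artifact-suppressed channels
```

Real EEG pipelines add channel-count-dependent component selection and often automated classifiers for component labeling; this sketch only shows the core mechanism.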
Supportive AI (detection, localization, and workflow support):
• SCORE-AI (multicenter routine EEG): AUC 0.89–0.96; specificity ~87–90%; sensitivity ~86.7%; accuracy 88.3%; outperformed expert consensus in specificity with comparable sensitivity; integrated into the Natus platform, but neonates and critically ill patients were excluded.
• CNN + feature fusion (Bonn/New Delhi): Bonn accuracy up to ~99–100%, sensitivity ~99–100%, specificity ~98–100%; the New Delhi dataset reached 100% across metrics in some settings; small datasets limit clinical generalizability.
• S-transform + 15-layer CNN (iEEG): segment-level sensitivity 97.01%, specificity 98.12%; event-level sensitivity 95.45%, FDR 0.36/h; currently iEEG-only and requires patient-specific tuning.
• FCN + NLSTM (multi-database): sensitivity ~95–97%; false detection rate 0.487/h (Freiburg) and 0.66/h (CHB-MIT); end-to-end capture of temporal dependencies.
• STFT + GoogLeNet CNN (scalp, CHB-MIT): accuracy 97.74%, sensitivity 98.90%, FPR 1.94%, detection delay ~9.85 s; frequency resolution and model choice were constrained by GPU resources.
• Neonatal CNN-RF (NICU): sensitivity 77%, FAR 0.9/h, AUC 83% (improved to AUC 88% and FAR 0.73/h in a subset); heavy training compute; generalizability and interpretability concerns.
• DeepSOZ (TUH scalp EEG): seizure AU-ROC ~0.92; sensitivity 81%; FPR 0.44 min/h; patient-level SOZ localization accuracy ~74%; robust multi-task design with uncertainty quantification; validated for focal epilepsy.
• Artifact management: ICA + PPAF preprocessing improved detection accuracy (up to 99% in tested datasets); automated neonatal ECG/pulse artifact removal achieved accuracy 0.99, sensitivity 0.93, and ~98% artifact suppression; a GAN-guided CNN + Transformer (GCTNet) enhanced denoising (e.g., EMG RRMSE −11.15%, SNR +9.81%).
• Multimodal wearable frameworks (bte-EEG + ECG): sensitivity increased from 59.1% to 62.2% and FAR fell from 6.5/day to 2.4/day; spatial resolution limitations persist.
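The STFT-based detectors above convert raw EEG into time–frequency images that a CNN then classifies. A minimal sketch of that front end on a synthetic single-channel signal; the sampling rate, window length, and ictal-like burst are illustrative assumptions, not parameters from the cited study:

```python
import numpy as np
from scipy.signal import stft

fs = 256                                    # Hz, typical scalp-EEG rate (e.g., CHB-MIT)
t = np.arange(0, 4, 1 / fs)                 # 4-second window
rng = np.random.default_rng(0)
# Synthetic EEG: background noise plus a burst of ictal-like
# 3 Hz spike-wave activity in the second half of the window.
eeg = rng.normal(0, 1, t.size)
eeg[t >= 2] += 5 * np.sin(2 * np.pi * 3 * t[t >= 2])

# Short-time Fourier transform -> time-frequency image that a CNN
# (GoogLeNet in the cited pipeline) would consume as input.
f, seg_times, Z = stft(eeg, fs=fs, nperseg=fs)  # 1 s segments, 50% overlap
spectrogram = np.abs(Z)

# Energy in the 1-4 Hz band rises sharply once the burst starts --
# the kind of pattern the downstream classifier learns.
band = (f >= 1) & (f <= 4)
pre = spectrogram[band][:, seg_times < 2].mean()
post = spectrogram[band][:, seg_times >= 2].mean()
print(post > pre)  # True
```

The note in the source about frequency resolution being GPU-constrained maps directly onto `nperseg`: longer windows sharpen frequency bins but enlarge the input image the CNN must process.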
Predictive AI (seizure forecasting and outcomes):
• LSTM-based seizure prediction using inter-channel correlation and graph features: low FPR (~0.02–0.11/h) with high sensitivity, outperforming traditional ML and CNN baselines.
• CNN with patient calibration (CHB-MIT/Conegliano): leave-one-patient-out accuracy improved from ~54% at baseline to ~66–69% after calibration on 1–2 seizures; sensitivity rose to ~63–70%; requires ≥1 seizure per patient and substantial preictal data, with variable per-patient gains.
• LRCN early prediction: accuracy 93.40%, sensitivity 91.88%, specificity 86.13%.
• Tiny 1D-SCNN (wearable-oriented): sensitivity 94.44%, FPR 0.011/h, AUC 0.979.
Therapeutic efficacy prediction:
• Drug response: GBDT-LZC (OXC) accuracy 81%, sensitivity 91%; GBDT-KC accuracy 82%, sensitivity 83%.
• LEV response in TLE was predicted from 19-channel EEG with ML; VPA response in pediatric absence epilepsy was predicted with KNN on theta power: sensitivity 92.31%, specificity 76.92%, accuracy 84.62%, AUC 88.46%.
• An XGBoost model combining demographics, EEG, and MRI distinguished remission from non-remission: F1 0.947, AUC 0.979.
• Surgical outcome: DL-assisted purification of pathological HFOs improved postoperative outcome prediction; multimodal neuroimaging + v-EEG yielded significant prognostic value (hazard ratio 11.4; wide CI, 2.249–57.787).
Overall, AI-EEG improves precision and efficiency and enables multimodal fusion and personalization; clinician verification remains essential.
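The inter-channel correlation features feeding the LSTM predictors above can be computed per sliding window: the upper triangle of the channel-by-channel Pearson correlation matrix becomes one feature vector per time step. A minimal sketch on synthetic multichannel data; the channel count, window length, and injected common source are illustrative assumptions:

```python
import numpy as np

def correlation_features(window):
    """Flatten the upper triangle of the inter-channel Pearson
    correlation matrix into a feature vector -- one window's input
    to a sequence model such as an LSTM."""
    corr = np.corrcoef(window)               # (channels, channels)
    iu = np.triu_indices_from(corr, k=1)     # unique channel pairs
    return corr[iu]

rng = np.random.default_rng(1)
fs, n_channels = 256, 4
# Synthetic 2 s multichannel window; channels 0 and 1 share a common
# source, mimicking the rising inter-channel synchrony that preictal
# states can exhibit.
common = rng.normal(size=2 * fs)
window = rng.normal(size=(n_channels, 2 * fs))
window[0] += 3 * common
window[1] += 3 * common

feats = correlation_features(window)
print(feats.shape)     # (6,) -- 4 choose 2 channel pairs
print(feats[0] > 0.5)  # True: pair (0, 1) is strongly correlated
```

A sequence of such vectors over consecutive windows is what the temporal model consumes; the graph features mentioned in the source build on the same pairwise matrix.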
Findings support the central hypothesis that AI augments EEG-based epilepsy care across detection, localization, monitoring, and prediction. Supportive AI demonstrably reduces manual workload and increases consistency and specificity in interpretation (e.g., SCORE-AI), while predictive AI identifies subtle preictal patterns enabling early warnings and forecasts of treatment outcomes. Nevertheless, clinical utility hinges on integrating AI outputs with comprehensive clinical data and physician oversight due to challenges in generalizability, artifacts, sampling constraints, hardware limits, and explainability. The significance lies in shifting epilepsy management from reactive to proactive paradigms via real-time analysis, multimodal fusion, and personalized predictions. Bridging the lab-to-clinic gap requires algorithm optimization (including efficiency improvements), robust data pipelines, explainable AI to enhance trust, and interdisciplinary collaboration to standardize and validate systems in diverse patient populations.
This systematic review synthesizes recent advances in AI applications for EEG in epilepsy, organizing them into supportive and predictive AI paradigms. AI-EEG enhances automated detection and localization, enables long-term remote monitoring, and improves seizure prediction and therapeutic efficacy assessment. The paper underscores the necessity of clinician validation and highlights persistent barriers: data quality and generalization, artifact/noise interference, hardware constraints, and limited model explainability. Future research should focus on algorithmic efficiency (pruning, quantization, distillation), self- and semi-supervised learning to address data scarcity, multimodal fusion (EEG with fMRI/ECG and wearables), explainable AI integration, standardized acquisition/validation protocols, and large multicenter prospective trials. Advancing low-power, high-sampling-rate, user-friendly sensing and edge computing will be pivotal for real-world deployment. Ethical, privacy, and regulatory frameworks must evolve in parallel to ensure fair, secure, and accountable AI-EEG clinical applications.
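Of the efficiency techniques listed above, magnitude pruning is the simplest to state concretely: zero out the smallest-magnitude fraction of a weight tensor so sparse kernels can skip them. A minimal numpy sketch under that assumption; real pipelines (structured pruning, quantization-aware training, distillation) are framework-specific and go well beyond this:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude `sparsity` fraction of `weights`.
    Unstructured pruning in its most basic form; illustrative only."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 64))                      # stand-in weight matrix
W_sparse = magnitude_prune(W, sparsity=0.9)
print(round((W_sparse == 0).mean(), 2))            # ~0.9 of weights removed
```

Whether a wearable-grade seizure detector tolerates 90% sparsity is an empirical question per model; the point is only that the operation itself is cheap and post hoc.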
AI-EEG performance is constrained by data scarcity, sample imbalance, and dataset-specific training that limits generalizability across institutions and patient subgroups (e.g., neonates, critically ill). EEG signals are susceptible to physiological artifacts (EMG, ocular, cardiac, motion) and environmental noise; portable devices often have insufficient sampling rates (typically 128–256 Hz) that preclude reliable HFO capture (≥2 kHz required). Hardware and computational limits restrict model complexity, frequency resolution, and deployment feasibility, especially in wearables and resource-limited settings. Dominant deep models are black boxes, reducing clinical trust due to poor interpretability. Patient variability and dynamic EEG changes further challenge model robustness. Some reported high AUCs may not translate to clinical precision; specificity can be limited and performance variable across patients. The review’s evidence base, while broad, relies on heterogeneous study designs and datasets; many promising methods remain at experimental or single-center stages without large-scale prospective validation.
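The sampling-rate constraint can be made concrete: HFOs extend to roughly 500 Hz (fast ripples), and practical recording guidelines call for sampling several-fold above the highest component rather than the bare Nyquist minimum of 2 × f_max, which is why the review cites ≥2 kHz. A small check under that assumption; the `oversample` safety factor is illustrative, not a fixed standard:

```python
def can_capture(fs_hz, f_max_hz, oversample=4):
    """True if sampling rate fs_hz comfortably covers content up to
    f_max_hz, using an illustrative oversampling factor well above
    the theoretical Nyquist minimum of 2 * f_max_hz."""
    return fs_hz >= oversample * f_max_hz

print(can_capture(256, 500))    # False: wearable-grade rates miss fast ripples
print(can_capture(2000, 500))   # True: the >=2 kHz rate cited in the review
```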