Towards better heartbeat segmentation with deep learning classification

P. Silva, E. Luz, et al.

This paper presents a real-time method for validating heartbeat segmentation with convolutional neural networks (CNNs), designed to reduce false-positive alarms. Pedro Silva, Eduardo Luz, Guilherme Silva, Gladston Moreira, Elizabeth Wanner, Flavio Vidal, and David Menotti evaluate the approach on the MIT-BIH and CYBHi databases. The results suggest it is suitable for real-time applications and could be integrated into dedicated hardware.

Introduction
Excessive false alarms in intensive care units reduce clinicians’ trust in monitoring equipment and can cause true critical events to be missed. Many alarm systems rely on electrocardiogram (ECG) signals, whose quality and accurate QRS (R-peak) detection are essential for downstream tasks such as heart rate estimation and arrhythmia classification. The ECG comprises several fiducial points (P, QRS, T), and errors in segmentation propagate to later processing and contribute to false alarms. While prior work often targets false alarm reduction at the classification stage, segmentation errors remain under-addressed. This work proposes reducing false positives at the segmentation stage by validating detected heartbeats using a CNN that recognizes heartbeat morphology. The goal is to improve the positive predictive value of a standard R-peak detector (Pan–Tompkins) with minimal computational overhead and in a way suitable for real-time, embedded deployment. The study evaluates the approach on an on-the-person database (MIT-BIH) and a noisier off-the-person database (CYBHi).
Literature Review
Previous research on false alarm reduction includes the PhysioNet/CinC 2015 Challenge, which focused on life-threatening arrhythmias and in which multimodal, heuristic-rich methods achieved strong performance (e.g., Plesinger et al., using multi-channel filtering, spectral features, and rules). Signal quality assessment has been used to filter out poor-quality ECG: Behar et al. extracted quality indices and trained support vector machines (SVMs), reducing false alarms across several databases (CinC 2011, MIT-BIH, MIMIC II). Multimodal approaches combined multiple ECG leads with invasive blood pressure, photoplethysmogram (PPG), or arterial blood pressure (ABP) signals, employing quality indices and Kalman filtering, and showed robust peak detection (e.g., evaluated on PhysioNet Challenge 2015). Deep learning, especially CNNs, has been widely used for arrhythmia classification from ECG. Here CNNs are used differently, as validators of QRS detections, to address false positives that originate in segmentation rather than classification. Pan–Tompkins remains a popular, low-cost QRS detector in both academia and industry and serves as the third-party detector to be validated.
Methodology
The proposed pipeline comprises: (1) database split into subject-disjoint training and testing sets; (2) pre-processing to segment signals into fixed-length windows and standardize the input; (3) data augmentation; (4) third-party R-peak detection; (5) CNN-based validation of detected R-peaks; and (6) evaluation.
- Data and pre-processing: Inputs are 833 ms segments. For 360 Hz signals (MIT-BIH), this is 300 samples; higher-rate signals (e.g., CYBHi at 1 kHz) are resampled to 300 samples via polynomial interpolation. No specific filtering is applied before the CNN.
- Data augmentation for positives: eight schemes are applied around annotated R-peaks: (1) centered R-peak; (2) shift ±5 samples; (3) shift ±10; (4) shift ±15; (5) attenuate the P-wave (375 ms before R) by 30%; (6) attenuate the T-wave (375 ms after R) by 30%; (7) reduce the entire segment amplitude by 20%; (8) reduce the entire segment amplitude by 40%. Counting each ± shift as two variants, this gives 11 positive samples per annotated beat.
- Negative sample construction: For each pair of consecutive R-peaks, exclude the 50 samples after the first and the 50 samples before the second; within the remaining interval, slide a window with a stride of 5 samples to extract negatives (overlap allowed). This biases negatives toward non-QRS content and noise, especially the sensitive P/T regions.
- Third-party R-peak detector: The Pan–Tompkins algorithm detects R-peaks using band-pass filtering (a composition of low-pass and high-pass filters), differentiation, squaring, moving-window integration, and adaptive dual thresholds with periodicity constraints. It outputs R-peak locations and a delay window.
- CNN architecture: Seven-layer network with four 1D convolutional layers (filter sizes 1×49, 1×25, 1×9, 1×9; stride 1; no padding), each followed by max-pooling (size/stride 2), then two fully connected layers, a dropout layer, and a final 2-unit softmax for binary classification: heartbeat centered vs. not centered.
- Training: Inputs are 833 ms windows (300 samples for MIT-BIH; downsampled for CYBHi). Optimizer: SGD with momentum 0.9; loss: binary cross-entropy with softmax. Learning rate schedule: 0.01 (epochs 1–3), 0.005 (next 7), 0.001 (next 10), 0.0001 (final 10), for a total of 30 epochs. Train/validation split per subject record: 70%/30% (within training subjects). Data augmentation increases the positive samples 11-fold (MIT-BIH positives from 16,647 to 183,117; CYBHi from 9,414 to 103,554), while negatives remain unchanged.
- Validation step: For each R-peak detected by Pan–Tompkins, a centered 833 ms segment is fed to the CNN; the R-peak is accepted only if the CNN predicts a heartbeat centered within the tolerance defined by the augmentation shifts.
- Evaluation metrics: Sensitivity (Se = TP/(TP+FN)), positive predictivity (+P = TP/(TP+FP)), and F-Score (harmonic mean of Se and +P). True/false positives/negatives are defined with respect to centered R-peak detection within the allowed shift window.
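The augmentation and negative-sampling rules above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, "attenuate by 30%" is assumed to mean multiplying by 0.7, a "shift" is assumed to move the window so the R-peak lands off-center, and negatives are assumed to be windows whose centers slide across the interval between the two 50-sample guard zones (at 360 Hz a full 300-sample window would rarely fit inside it).

```python
from typing import List, Optional, Sequence

FS = 360                    # MIT-BIH sampling rate in Hz (from the text)
WIN = 300                   # 833 ms at 360 Hz (from the text)
HALF = WIN // 2             # index of the R-peak in a centered window
WAVE = round(0.375 * FS)    # 375 ms -> 135 samples
SHIFTS = (0, -5, 5, -10, 10, -15, 15)   # schemes 1-4


def window(signal: Sequence[float], center: int) -> Optional[List[float]]:
    """Return the WIN-sample window centered on `center`, or None at the edges."""
    start = center - HALF
    if start < 0 or start + WIN > len(signal):
        return None
    return list(signal[start:start + WIN])


def scaled(seg: List[float], lo: int, hi: int, gain: float) -> List[float]:
    """Copy `seg` with the samples in [lo, hi) multiplied by `gain`."""
    return [x * gain if lo <= i < hi else x for i, x in enumerate(seg)]


def positives(signal: Sequence[float], r_peak: int) -> List[List[float]]:
    """Build up to 11 positive windows for one annotated R-peak."""
    out = []
    for shift in SHIFTS:
        w = window(signal, r_peak + shift)
        if w is not None:
            out.append(w)
    base = window(signal, r_peak)
    if base is None:
        return out
    out.append(scaled(base, HALF - WAVE, HALF, 0.7))          # scheme 5: P-wave side
    out.append(scaled(base, HALF + 1, HALF + 1 + WAVE, 0.7))  # scheme 6: T-wave side
    out.append(scaled(base, 0, WIN, 0.8))                     # scheme 7: -20% amplitude
    out.append(scaled(base, 0, WIN, 0.6))                     # scheme 8: -40% amplitude
    return out


def negatives(signal: Sequence[float], r0: int, r1: int,
              guard: int = 50, stride: int = 5) -> List[List[float]]:
    """Slide window centers between two consecutive R-peaks, skipping guard zones."""
    out = []
    for center in range(r0 + guard, r1 - guard + 1, stride):
        w = window(signal, center)
        if w is not None:
            out.append(w)
    return out
```

Under these assumptions each annotated beat away from the record edges yields 11 positives, which agrees with the reported counts (16,647 × 11 = 183,117 and 9,414 × 11 = 103,554).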
Key Findings
- Performance versus the Pan–Tompkins baseline on the test sets:
  • MIT-BIH: baseline Se 95.79%, +P 97.84%, F-Score 0.97; proposed Se 92.98%, +P 100.00%, F-Score 0.96.
  • CYBHi: baseline Se 96.95%, +P 90.28%, F-Score 0.93; proposed Se 95.71%, +P 96.77%, F-Score 0.96.
- The CNN validator markedly increases positive predictivity (fewer false positives) at the cost of a modest decrease in sensitivity (more false negatives). The F-Score drops slightly on MIT-BIH (0.97 to 0.96) and improves on the noisier CYBHi set (0.93 to 0.96).
- Computational feasibility: Average per-inference time over 100 runs: CPU (Intel i7 8th gen) 33 ms; GPU 10 ms; NVIDIA Jetson Nano 33 ms. Given a physiological minimum RR interval of about 200 ms, inference latency is compatible with real-time operation. The Jetson Nano matches the CPU's latency while being roughly 10× more energy-efficient.
- Data augmentation provided enough positive samples for effective CNN training (e.g., MIT-BIH positives increased from 16,647 to 183,117), and first-layer filters adapt to database characteristics, with broader responses for the noisier CYBHi signals.
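As a quick consistency check, the reported F-Scores can be recomputed from the reported Se and +P values, and the inference latencies can be compared with the roughly 200 ms minimum RR interval. The sketch below uses only figures quoted in this summary; the helper names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Se = TP / (TP + FN)."""
    return tp / (tp + fn)


def positive_predictivity(tp: int, fp: int) -> float:
    """+P = TP / (TP + FP)."""
    return tp / (tp + fp)


def f_score(se: float, pp: float) -> float:
    """Harmonic mean of sensitivity and positive predictivity."""
    return 2 * se * pp / (se + pp)


# (database, method) -> (Se, +P) as reported in the key findings
REPORTED = {
    ("MIT-BIH", "baseline"): (0.9579, 0.9784),
    ("MIT-BIH", "proposed"): (0.9298, 1.0000),
    ("CYBHi", "baseline"): (0.9695, 0.9028),
    ("CYBHi", "proposed"): (0.9571, 0.9677),
}

MIN_RR_MS = 200                       # approximate physiological minimum RR interval
LATENCY_MS = {"CPU": 33, "GPU": 10, "Jetson Nano": 33}

if __name__ == "__main__":
    for (db, method), (se, pp) in REPORTED.items():
        print(f"{db:8s} {method:9s} F-Score = {f_score(se, pp):.2f}")
    for device, ms in LATENCY_MS.items():
        print(f"{device:12s} uses {ms / MIN_RR_MS:.1%} of the minimum RR interval")
```

Rounded to two decimals, the recomputed F-Scores are 0.97, 0.96, 0.93, and 0.96, matching the reported values, and even the slowest device spends well under a fifth of the minimum RR interval on one validation.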
Discussion
Validating R-peak detections with a CNN substantially reduces false positives from a conventional detector, improving positive predictive value and thus the reliability of downstream analyses and alarms. The trade-off is a reduction in sensitivity, as the CNN relies on waveform morphology; high-frequency noise and morphology distortion (especially in P and T waves) can cause true beats to be rejected, particularly in off-the-person CYBHi data. Despite this, the F-Score remains comparable or improved (notably on CYBHi), indicating overall robust performance. The approach generalizes across on-/off-the-person databases and, by acting only on candidate detections (rather than scanning the full signal), keeps computational demands low and is suitable for embedded deployment. Application-dependent tuning may balance the +P vs. Se trade-off. Hardware acceleration (GPU/Jetson/FPGA) further supports real-time integration into medical cyber-physical systems.
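The gating idea discussed here, scoring only the candidates proposed by the detector, can be sketched as follows. This is a hypothetical illustration: `validate_peaks` and `toy_score` are made-up names, `toy_score` is a crude stand-in for the CNN's softmax output, and the adjustable `threshold` is an assumption about how the +P vs. Se trade-off could be tuned (the summary does not say which mechanism the authors have in mind; with a 2-unit softmax, 0.5 corresponds to a plain argmax decision).

```python
from typing import Callable, List, Sequence

WIN = 300          # 833 ms at 360 Hz
HALF = WIN // 2


def validate_peaks(signal: Sequence[float],
                   candidates: Sequence[int],
                   prob_heartbeat: Callable[[Sequence[float]], float],
                   threshold: float = 0.5) -> List[int]:
    """Keep only the candidate R-peaks whose centered window scores >= threshold."""
    accepted = []
    for r in candidates:
        start = r - HALF
        if start < 0 or start + WIN > len(signal):
            continue  # assumption: candidates too close to the record edges are dropped
        if prob_heartbeat(signal[start:start + WIN]) >= threshold:
            accepted.append(r)
    return accepted


def toy_score(segment: Sequence[float]) -> float:
    """Stand-in for the CNN: high score if the largest sample is near the center."""
    peak = max(range(len(segment)), key=lambda i: segment[i])
    centered = abs(peak - HALF) <= 15 and segment[peak] > 0
    return 0.9 if centered else 0.1


if __name__ == "__main__":
    ecg = [0.0] * 1000
    ecg[400] = 1.0                        # one synthetic "beat"
    print(validate_peaks(ecg, [400, 410, 600, 50], toy_score))
```

Because the classifier runs once per candidate rather than once per sample, the cost scales with the heart rate and not with the sampling rate. Raising `threshold` would reject more candidates, which trades sensitivity for positive predictivity.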
Conclusion
The study introduces a real-time, CNN-based validator for heartbeat (R-peak) detections that improves positive predictive value of a standard Pan–Tompkins detector on both controlled (MIT-BIH) and noisy off-the-person (CYBHi) data, with only a slight decrease in sensitivity. Contributions include: (i) an efficient heartbeat pattern classifier to improve segmentation; (ii) a tailored CNN architecture and augmentation strategy; and (iii) a practical embedded-system integration pathway. Results demonstrate feasibility for real-world use, delivering more trustworthy detections for subsequent analysis. Future work will explore: learning or designing filters to mitigate high-frequency noise (especially for off-the-person signals); transfer learning/fine-tuning to enhance generalization without sacrificing +P; and extending the model to recognize multiple heartbeat classes, including arrhythmic morphologies.
Limitations
- Trade-off between +P and Se: Reducing false positives increases false negatives, which may be unacceptable in some applications.
- Morphology sensitivity: High-frequency noise and waveform distortion (notably in the P/T waves) degrade CNN decisions, particularly in off-the-person data.
- Training scope: The model is trained to detect normal heartbeat patterns; arrhythmic or irregular beats with atypical morphology may be misclassified.
- Dependence on third-party detector: The validator only operates on detected candidates; detections missed by the base algorithm cannot be recovered.
- Data constraints: CYBHi does not provide R-peak annotations; the authors created labels and discarded 12 excessively noisy records, which may affect generalizability.
- Evaluation scope: The evaluation uses a healthy subset of MIT-BIH to avoid the impact of arrhythmias; performance on pathological rhythms remains to be fully characterized.