Introduction
Remote photoplethysmography (rPPG), a non-invasive method for detecting blood volume changes with video cameras, offers a promising alternative to contact-based measurement. However, rPPG accuracy is degraded by motion artifacts, illumination changes, and skin tone variations, which introduce noise and distortion. Existing studies often focus on specific physiological parameters (such as heart rate) rather than on improving the raw rPPG signal itself, potentially discarding valuable information. This study addresses that limitation by developing a machine learning model that reconstructs a high-quality rPPG signal closely resembling the reference contact PPG (cPPG) signal obtained from a contact sensor. The approach aims to capture the full morphology of the PPG waveform, preserving information beyond any single physiological parameter and enabling a more comprehensive assessment of cardiovascular health.
Literature Review
Prior research has explored improving physiological parameters derived from rPPG, such as heart rate (HR) and HR variability (HRV), by comparing them with reference cPPG data. Some studies have focused on improving blood pressure (BP) estimation, oxygen saturation measurement, or the detection of fibrillation arrhythmias. However, most previous work has concentrated on extracting specific parameters rather than on improving overall rPPG signal quality. One study attempted to improve PPG signals by comparing them with reference PPG signals, but it used a private dataset and a limited set of evaluation metrics. The present research improves on these efforts by using three public datasets, applying more comprehensive evaluation metrics in both the time and frequency domains, and focusing on reconstruction of the entire rPPG waveform rather than of specific parameters alone.
Methodology
This study used three publicly available datasets: LGI-PPGI, PURE, and MR-NIRP (indoor). These datasets contain videos of participants performing various activities (rest, talking, exercise, rotation, translation), together with corresponding cPPG signals from pulse oximeters. rPPG extraction involved identifying regions of interest (ROIs) on the face (forehead and cheeks) using the pyVHR framework and MediaPipe, after which four algorithms (CHROM, LGI, POS, and ICA) produced initial rPPG signals from the RGB data. Preprocessing included detrending, band-pass filtering (0.65–4 Hz), removal of low-variance signals, segmentation into 10-second windows, and min-max normalization; histogram equalization was also applied and improved performance. Heart rate was estimated in the frequency domain using Welch's method. The proposed model, trained on the PURE dataset with 5-fold cross-validation, consists of four LSTM blocks with dropout layers followed by a dense layer; it takes the outputs of the four rPPG extraction algorithms as input and produces a refined rPPG signal (see the sketches below). Training minimized the root mean squared error (RMSE) and maximized the Pearson correlation coefficient (r) between the constructed rPPG and the reference cPPG signal. Evaluation metrics included dynamic time warping (DTW), Pearson's correlation coefficient (r), RMSE, and the absolute difference in estimated heart rate (|ΔHR|) between the constructed rPPG and cPPG signals. Non-parametric statistical tests (Friedman and Nemenyi tests with Bonferroni correction) were used to compare the proposed model with the other methods.
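For concreteness, the following is a minimal sketch of the windowing, band-pass filtering, and Welch-based heart-rate steps described above. It is an illustration under assumed parameters (a 30 fps frame rate, a fourth-order Butterworth filter), not the authors' code; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend, welch

FS = 30.0  # assumed video frame rate; the datasets' actual rates may differ

def preprocess_rppg(signal, fs=FS, low=0.65, high=4.0, win_s=10):
    """Detrend, band-pass filter (0.65-4 Hz), split into 10 s windows,
    and min-max normalize each window, as described in the methodology."""
    x = detrend(np.asarray(signal, dtype=float))
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    x = filtfilt(b, a, x)
    win = int(win_s * fs)
    windows = [x[i:i + win] for i in range(0, len(x) - win + 1, win)]
    return [(w - w.min()) / (w.max() - w.min() + 1e-8) for w in windows]

def estimate_hr_welch(window, fs=FS):
    """Estimate heart rate (bpm) as the dominant spectral peak in 0.65-4 Hz."""
    f, pxx = welch(window, fs=fs, nperseg=min(len(window), 256))
    band = (f >= 0.65) & (f <= 4.0)
    return 60.0 * f[band][np.argmax(pxx[band])]
```

The described network can likewise be sketched in a Keras-style implementation: four LSTM blocks with dropout followed by a dense layer, taking the four algorithm outputs (CHROM, LGI, POS, ICA) as input channels and trained with an RMSE objective. The unit count, dropout rate, window length, and optimizer below are assumptions, not values reported in the summary.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

WIN_LEN = 300  # assumed samples per 10 s window at 30 fps

def rmse_loss(y_true, y_pred):
    """Root mean squared error, matching the stated training objective."""
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

def build_rppg_model(win_len=WIN_LEN, n_algorithms=4, units=64, dropout_rate=0.2):
    """Four LSTM blocks with dropout, then a dense layer producing one
    refined rPPG sample per time step; inputs stack the four rPPG signals."""
    inp = layers.Input(shape=(win_len, n_algorithms))
    x = inp
    for _ in range(4):
        x = layers.LSTM(units, return_sequences=True)(x)
        x = layers.Dropout(dropout_rate)(x)
    out = layers.Dense(1)(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss=rmse_loss)
    return model

# model = build_rppg_model()
# model.fit(X_train, y_train, epochs=50, batch_size=32)  # within 5-fold CV on PURE
```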
Key Findings
The proposed model significantly improved rPPG signal accuracy compared with existing methods (LGI, CHROM, POS, and a green-channel baseline) across all datasets and activities. It consistently outperformed the other methods in terms of DTW, indicating better signal morphology and alignment with the reference cPPG even under challenging conditions (e.g., movement and varying lighting). Although the Pearson correlation coefficient (r) also improved, the differences were not always statistically significant compared with CHROM or POS, particularly on MR-NIRP; this discrepancy is attributed to differences in signal alignment, which the DTW metric accounts for. RMSE was also reported, but DTW provided a more insightful comparison of signal morphology. The model achieved correlation coefficients of up to 0.84 and 0.77 for the translation and rotation activities, respectively, across all datasets. The absolute heart rate error (|ΔHR|) was lowest for the PURE and MR-NIRP datasets, while on LGI-PPGI the results were comparable to POS. Visual inspection confirmed that the proposed model produced more robust, less noisy signals than the other methods, even when its RMSE was slightly higher, and that it remained robust across activities involving significant subject movement (e.g., talking and rotation).
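The DTW values reported above measure waveform similarity while tolerating small temporal misalignments between the constructed rPPG and the reference cPPG. A minimal dynamic-programming sketch of the metric (not the authors' specific implementation or distance settings) is:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-programming DTW with absolute-difference cost.
    Smaller values mean the two waveforms share more similar morphology,
    even if their peaks are slightly shifted in time."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Example: compare one constructed rPPG window with the reference cPPG window
# score = dtw_distance(rppg_window, cppg_window)
```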
Discussion
The results demonstrate the effectiveness of the proposed machine learning approach in improving rPPG signal quality. Unlike many previous studies that focused on a single physiological parameter, this study aimed to reconstruct the complete rPPG waveform, preserving potentially crucial information for cardiovascular health assessment. The use of DTW, in addition to correlation and RMSE, provides a robust evaluation of signal similarity even when temporal alignment varies. The superior performance of the proposed model across multiple datasets and activities underscores its generalizability and robustness. The consistent improvement in DTW values, despite some variability in the correlation coefficient and RMSE, highlights the importance of considering signal morphology beyond simple correlation measures. The model’s performance on out-of-distribution datasets (LGI-PPGI and MR-NIRP) indicates its ability to generalize to diverse recording conditions and subject characteristics.
Conclusion
This study successfully developed a machine learning-based method for constructing high-quality rPPG signals from video data, outperforming existing methods. Future work should explore the estimation of additional physiological parameters (e.g., beat-to-beat HRV, oxygen saturation) and investigate the model’s robustness under various recording conditions (e.g., lighting, camera type). The proposed method presents a valuable tool for contact-free health monitoring with significant potential in applications such as telemedicine, driver monitoring, and biometric authentication.
Limitations
The model was primarily trained on the PURE dataset. While it showed good performance on out-of-distribution datasets, further evaluation with more diverse datasets is warranted to ensure robust generalization. The current evaluation focuses primarily on heart rate; future studies should explore other physiological parameters. The use of publicly available datasets limited control over data acquisition parameters. Additional analysis might be required to completely resolve the minor discrepancies observed between DTW and RMSE values in some cases.