Introduction
The proliferation of electronic health record (EHR) systems has created vast amounts of data suitable for machine learning (ML) applications in digital health. However, the privacy and sensitivity of EHR data pose significant challenges to traditional ML approaches that require centralizing data. Federated learning (FL) offers a promising alternative by enabling distributed model training without sharing raw data. While effective in principle, FL can degrade significantly on non-independent and identically distributed (non-IID) and unbalanced EHR data, reducing model effectiveness and weakening institutions' incentive to participate in FL training. This research examines the problem through an in-hospital mortality prediction task on a real-world multi-center ICU EHR database, preserving the original non-IID and unbalanced data distribution. The goals are to characterize the performance degradation of baseline FL in this setting and to propose a solution. The authors argue that a single unified model is unsuitable for such a heterogeneous data environment and instead propose a personalized approach to address the limitations of standard FL.
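The baseline aggregation scheme named later in the summary, FedAvg, illustrates how FL trains a shared model without moving raw data: clients train locally and the server averages their parameters, weighted by local dataset size. A minimal sketch, with hypothetical client data (parameter vectors are plain lists of floats for illustration):

```python
def fedavg_aggregate(client_params, client_sizes):
    """One FL aggregation round: weighted average of client parameter
    vectors, with weights proportional to local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Three hypothetical ICU sites with unbalanced (non-IID) dataset sizes.
params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 300, 600]
global_params = fedavg_aggregate(params, sizes)  # -> [4.0, 5.0]
```

Only parameter vectors and dataset sizes cross the network; the sites' EHR records never leave their institutions.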
Literature Review
The paper reviews existing FL optimization techniques, categorizing them into global optimization and local adaptation methods. Local adaptation methods, particularly personalized federated learning (PFL), are highlighted as effective for handling non-IID and unbalanced data. The authors discuss various PFL strategies, including model fine-tuning, local loss regularization, meta-learning, multi-task learning, transfer learning, and federated knowledge distillation (FKD). They also review FKD methods, including federated distillation (FD) and hybrid FD (HFD), noting their difficulty in balancing communication cost against model accuracy. Finally, the paper contrasts its approach with existing federated AutoML research, which focuses primarily on neural architecture search (NAS), emphasizing that their work is limited to neither NAS nor deep neural networks.
Methodology
The authors redefine the standard FL optimization problem to generate an optimal, unique model structure and parameter set for each individual participant (Equation 2). Their proposed method, POLA, is a two-step, one-shot PFL approach. Step 1 runs an adjusted FedAvg algorithm (Algorithm 3) for FL training, incorporating validation data to select a well-generalizing teacher model. Step 2 performs parallel local adaptation at each institution, using a genetic algorithm (GA) to optimize personalized model structures and hyperparameters. The local adaptation employs knowledge distillation, treating the shared model as the teacher and the personalized models as students; both output and feature distillation are used (Equations 3-7), combined with a binary cross-entropy loss on the hard targets. The GA uses the inverse of model validation error as its fitness function. Experiments use the eICU Collaborative Research Database (eICU-CRD v2.0), with preprocessing steps covering cohort selection, variable selection (Table 1), and variable preprocessing. The data is partitioned into IID, hospital-based non-IID, and unit-type-based non-IID distributions (Table 2, Figure 2) to simulate various non-IID scenarios. Multilayer perceptrons (MLPs) serve as the ML model, with a unified design for FedAvg and personalized designs for POLA (Table 3). The area under the receiver operating characteristic curve (AUROC) is the evaluation metric.
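The exact distillation losses (Equations 3-7) are not reproduced in this summary. The general idea of the output-distillation part, for a binary mortality prediction task, can be sketched as a hard-target binary cross-entropy term mixed with a soft-target term against the teacher's temperature-softened prediction; the mixing weight `alpha` and temperature below are hypothetical settings, not values from the paper:

```python
import math

def sigmoid(z, temperature=1.0):
    """Logistic output, optionally softened by a temperature > 1."""
    return 1.0 / (1.0 + math.exp(-z / temperature))

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between a predicted probability and a target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def kd_loss(student_logit, teacher_logit, hard_label,
            alpha=0.5, temperature=2.0):
    """Output-distillation sketch: BCE against the ground-truth hard label,
    mixed with BCE against the teacher's temperature-softened prediction."""
    hard_term = bce(sigmoid(student_logit), hard_label)
    soft_term = bce(sigmoid(student_logit, temperature),
                    sigmoid(teacher_logit, temperature))
    return alpha * hard_term + (1 - alpha) * soft_term
```

A student that agrees with both the label and the teacher incurs a low loss; disagreeing with either raises the corresponding term. The paper's feature distillation additionally matches intermediate representations, which this sketch omits.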
Key Findings
Experiments comparing POLA with baseline FL (FedAvg) and two other PFL methods (FT-FedAvg and pFedMe) showed that POLA outperforms the alternatives in prediction accuracy (AUROC) and convergence speed across the tested data distributions (Figure 3, Figure 4, Table 4). The impact of data distribution on the baseline FL algorithm was also demonstrated: FL outperformed local training on less skewed non-IID data but failed to converge on highly skewed non-IID data (Figure 3). POLA's superior performance is attributed to its ability to balance global generalization knowledge against local data specificity. Analysis of individual ICU centers (Figure 5) shows that POLA is particularly effective when local data is sufficient and non-IID skewness is high, improving model performance at all unit-type-based centers but only a majority of hospital-based centers. Compared with pFedMe, POLA achieves comparable accuracy with significantly fewer communication rounds, improving computational and communication efficiency.
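AUROC, the metric behind these comparisons, can be computed directly from predicted risks and binary outcomes. A minimal rank-based sketch (equivalent to the normalized Mann-Whitney U statistic), with hypothetical mortality risk scores:

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case is
    ranked above a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores for six ICU stays (1 = in-hospital death).
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
# Every positive outranks every negative here, so AUROC is 1.0.
```

An AUROC of 0.5 corresponds to random ranking, which is why a model that fails to converge under highly skewed non-IID data hovers near that value.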
Discussion
The effectiveness of POLA depends on the quality of the teacher model selected during FL training; a well-performing teacher is crucial for generating optimal personalized models. However, the study found that selecting a high-performing teacher alone is not sufficient: the teacher must also align with the parameter update direction of the student models. The authors also discuss POLA's compatibility and extensibility: it is potentially applicable to other cross-silo scenarios (e.g., biomedical, financial) and to other ML models, particularly deep learning models, where the potential gains in performance and reductions in communication overhead are even greater.
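The paper's full selection criterion is not detailed in this summary; the validation-score part of it can be sketched as picking, across FL rounds, the aggregated global model with the best held-out validation score. All names below are hypothetical, and the sketch deliberately omits the update-direction alignment the discussion identifies as also necessary:

```python
def select_teacher(round_models, round_val_scores):
    """Pick the aggregated global model with the highest validation
    score (e.g. AUROC) across FL rounds to serve as the KD teacher."""
    best = max(range(len(round_models)), key=lambda r: round_val_scores[r])
    return round_models[best]

# Placeholders standing in for per-round global parameter sets.
models = ["round0_params", "round1_params", "round2_params"]
val_scores = [0.71, 0.78, 0.75]
teacher = select_teacher(models, val_scores)  # -> "round1_params"
```

Note the best-validation round need not be the final round: under non-IID data, later aggregations can drift away from some clients, which motivates recording candidates throughout training.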
Conclusion
The paper concludes that POLA effectively addresses the performance degradation of FL under non-IID and unbalanced data, producing superior personalized models with reduced communication overhead in a real-world multi-center ICU setting. Future research could further optimize the teacher model selection process, broaden the range of applicable ML models, and apply POLA to other healthcare and non-healthcare domains.
Limitations
The study relies on a single EHR database (eICU-CRD) and on MLPs as the ML model; POLA's performance may vary with other datasets or model architectures, and the specific characteristics of eICU-CRD may limit the generalizability of the findings. In addition, POLA's hyperparameter settings were determined empirically, and further optimization may be possible.