
Dynamic memory to alleviate catastrophic forgetting in continual learning with medical imaging
M. Perkonigg, J. Hofmanninger, et al.
Matthias Perkonigg, Johannes Hofmanninger, Christian J. Herold, James A. Brink, Oleg Pianykh, Helmut Prosch, and Georg Langs present a dynamic memory approach to continual learning in medical imaging, maintaining model performance as acquisition technology evolves. The paper shows how to overcome domain shifts in a continuous imaging stream, demonstrated on cardiac segmentation and lung nodule detection.
Introduction
Deep learning (DL) algorithms are transforming medical imaging, reaching expert-level performance in tasks such as segmentation and classification. However, the dynamic nature of clinical imaging, with continuous advances in technology, scanner variations, and evolving protocols, poses a significant challenge to the sustainability of these DL models. Once deployed, a model's input distribution gradually diverges from the distribution it was trained on, a phenomenon known as dataset shift, and its performance degrades. This limitation severely restricts the practical applicability of DL in clinical settings.
Dataset shifts, particularly domain shifts (or acquisition shifts), arise from differences in training and inference data distributions. In medical imaging, these shifts frequently occur due to variations in scanner technology, manufacturer, generation, or imaging protocols. The data acquired often originates from diverse sources, creating a heterogeneous data stream. To maintain the relevance of deployed DL models, continual learning methods are essential, allowing models to adapt to these shifts while avoiding catastrophic forgetting – the loss of previously learned knowledge when adapting to new data.
Continual learning, also referred to as lifelong learning, focuses on techniques that enable models to learn new tasks (or domains) incrementally without compromising performance on previously learned tasks. A core challenge in continual learning is to prevent catastrophic forgetting, where the acquisition of new knowledge overwrites existing knowledge, leading to significant performance decline on past tasks. Ideally, continual learning should lead to positive backward transfer – improved performance on older tasks due to the increased variety of training examples encountered.
This research introduces dynamic memory (DM) as a novel continual learning approach to handle the emergence of new data sources at unknown time points in a continuous stream of medical images. DM is a rehearsal method that maintains a small, diverse subset of the data stream in memory to mitigate catastrophic forgetting. The method employs a style metric to ensure that the remembered data maintains a variety of styles observed in the continuous data stream. Optionally, a pseudo-domain (PD) model is used to detect clusters of similar style within the data stream. These pseudo-domains act as proxies for the unknown real domains, enabling a more balanced memory and training process, leading to improved adaptation and knowledge retention. The robustness and generalizability of this approach are demonstrated through its application to two distinct tasks: cardiac segmentation in magnetic resonance imaging (MRI) and lung nodule detection in computed tomography (CT). The focus is not on achieving state-of-the-art results for each specific task, but rather on showcasing the effectiveness of the continual learning method in adapting to a continuous stream of imaging data with domain shifts, without relying on explicit domain knowledge.
Literature Review
The field of continual learning has seen several approaches to address the problem of catastrophic forgetting. These can be broadly categorized into rehearsal-based methods, regularization-based approaches, and parameter isolation methods. Rehearsal methods, like experience replay, store and periodically revisit samples from previous tasks to reinforce learning. Regularization-based methods, such as Elastic Weight Consolidation (EWC), constrain changes to model parameters during learning of new tasks, preserving the knowledge acquired for previous ones. Parameter isolation methods prevent interference between tasks by dedicating separate parameters or sub-networks to each task, for example by growing the architecture as new tasks arrive.
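To make the regularization idea concrete, the following is a minimal EWC-style penalty in PyTorch. It is a sketch rather than a reference implementation: `fisher` (per-parameter importance estimates) and `old_params` (a parameter snapshot taken after the previous task) are assumed to be precomputed, and `lam` is an illustrative weighting.

```python
import torch

def ewc_penalty(model: torch.nn.Module,
                fisher: dict[str, torch.Tensor],
                old_params: dict[str, torch.Tensor],
                lam: float = 1.0) -> torch.Tensor:
    """Quadratic penalty anchoring parameters that were important for
    previous tasks; importance is estimated via the Fisher information."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# During training on a new task, the penalty is added to the task loss:
# loss = task_loss + ewc_penalty(model, fisher, old_params, lam=100.0)
```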
Existing methods often assume incremental task learning – learning new tasks sequentially with known boundaries. However, in real-world medical image analysis, domain shifts occur unpredictably, with unknown domain memberships. This is where the proposed dynamic memory stands out, as it operates without explicit domain knowledge. Methods like Gradient Episodic Memory (GEM) and experience replay with Maximally Interfered Retrieval (ER-MIR) require domain label information, making them unsuitable for the continuous data stream scenario addressed in this study. Domain adaptation (DA) techniques also address domain shifts, but they typically assume access to both source and target domains simultaneously, contrasting with the continual learning paradigm where new data arrives sequentially. Transfer learning, while related, optimizes a model solely for the new domain, neglecting performance on previous domains. This paper addresses a gap in the existing literature by proposing a continual learning method suitable for the realistic scenario of continuously evolving data streams in medical imaging, without explicit domain knowledge.
Methodology
The core of the proposed method is the dynamic memory (DM), a rehearsal-based continual learning approach. DM maintains a fixed-size memory M of image-target pairs. When a new image-target pair arrives from the data stream, it replaces the existing element of M that is most similar in style to it, so that the memory keeps covering the range of styles observed so far. The style metric leverages Gram matrices computed from the activations of a fixed pre-trained style network (a ResNet-50 pre-trained on ImageNet): the Gram matrix captures high-level style information of an image, and the distance between the Gram matrices of two images measures their stylistic similarity. Replacing the element with the smallest Gram distance to the incoming sample preserves style diversity in the memory.
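A minimal sketch of this replacement rule, assuming PyTorch and torchvision; the choice of `layer1` as the style layer and the mean-squared Gram distance are illustrative assumptions, not necessarily the paper's exact configuration:

```python
import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

# Frozen style network: ResNet-50 pre-trained on ImageNet, as in the paper.
style_net = create_feature_extractor(
    torchvision.models.resnet50(weights="IMAGENET1K_V1").eval(),
    return_nodes={"layer1": "style"},  # illustrative choice of style layer
)

@torch.no_grad()
def gram_matrix(image: torch.Tensor) -> torch.Tensor:
    """Normalized Gram matrix of the style activations of one (3, H, W) image;
    grayscale slices would be replicated across the three channels."""
    feats = style_net(image.unsqueeze(0))["style"][0]  # (C, H', W')
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)

def gram_distance(g_a: torch.Tensor, g_b: torch.Tensor) -> float:
    """Stylistic dissimilarity: mean squared difference of Gram matrices."""
    return torch.mean((g_a - g_b) ** 2).item()

def insert_into_memory(memory: list, image, target) -> None:
    """Replace the element of the full, fixed-size memory M that is
    stylistically closest to the new sample, preserving style diversity.
    memory: list of (image, target, gram) triples."""
    g_new = gram_matrix(image)
    closest = min(range(len(memory)),
                  key=lambda i: gram_distance(g_new, memory[i][2]))
    memory[closest] = (image, target, g_new)
```

Replacing the nearest neighbor in style space, rather than a random element, is what keeps rarely seen styles represented even when the stream is dominated by a single scanner.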
To improve the balance and representation of data in both the memory and the training process, an optional pseudo-domain (PD) module is incorporated. The PD module employs Isolation Forests (IF) as one-class anomaly detectors. Gram matrices are first embedded into a lower-dimensional space using Sparse Random Projection (SRP) for efficient computation. Each IF models one pseudo-domain, a cluster of images with similar style. An incoming sample is assigned to the pseudo-domain whose IF yields the highest decision-function value. Samples that do not fit any existing pseudo-domain are stored in an outlier memory O, and a new pseudo-domain is created once a sufficient number of outliers cluster together. This yields a more balanced representation of styles in both the memory and the training batches.
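A sketch of this bookkeeping with scikit-learn; `EMBED_DIM` and `MIN_OUTLIERS` are placeholder hyper-parameters, and the paper's additional check that the accumulated outliers actually cluster together before a new pseudo-domain is created is omitted for brevity:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.random_projection import SparseRandomProjection

EMBED_DIM = 30       # target dimension of the random projection (assumed)
MIN_OUTLIERS = 10    # outliers needed before a new pseudo-domain (assumed)

projector = SparseRandomProjection(n_components=EMBED_DIM)
pseudo_domains: list[IsolationForest] = []   # one IF per pseudo-domain
outlier_memory: list[np.ndarray] = []        # the outlier memory O

def embed(gram_vec: np.ndarray) -> np.ndarray:
    """Project a flattened Gram matrix into a low-dimensional style space."""
    x = gram_vec.reshape(1, -1)
    if not hasattr(projector, "components_"):
        projector.fit(x)   # the random matrix depends only on input size
    return projector.transform(x)[0]

def assign_pseudo_domain(gram_vec: np.ndarray) -> int:
    """Return the best-fitting pseudo-domain index, or -1 for an outlier."""
    z = embed(gram_vec)
    scores = [pd.decision_function(z[None, :])[0] for pd in pseudo_domains]
    if scores and max(scores) >= 0:          # non-negative score: inlier
        return int(np.argmax(scores))
    outlier_memory.append(z)                 # no fit: remember as outlier
    if len(outlier_memory) >= MIN_OUTLIERS:  # enough outliers: new domain
        pseudo_domains.append(IsolationForest().fit(np.stack(outlier_memory)))
        outlier_memory.clear()
        return len(pseudo_domains) - 1
    return -1
```

During training, mini-batches can then be drawn so that each pseudo-domain contributes roughly equally, counteracting the over-representation of dominant scanners in the stream.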
Two experiments were conducted: cardiac segmentation and lung nodule detection. For cardiac segmentation, a fully convolutional ResNet-50 network was used, and for lung nodule detection, Faster R-CNN with a ResNet-50 backbone was employed. In both experiments, performance was evaluated using appropriate metrics (Dice score for segmentation and average precision for detection). The performance of DM and DM-PD was compared against several baseline methods, including naive continual learning, random memory replacement, EWC, GEM, and ER-MIR. Comparisons also included a joint model (trained on all data) and domain-specific models (trained separately on each domain) for benchmarking. Backward transfer (BWT) and forward transfer (FWT) were used to quantify the impact of learning new domains on the performance of previously learned domains, helping to assess the degree of catastrophic forgetting and knowledge transfer.
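BWT and FWT can be computed from a results matrix in the style of the Gradient Episodic Memory evaluation protocol. A minimal sketch, assuming `R[i, j]` holds the test performance on domain `j` after training through domain `i`, and `b[j]` is the performance on domain `j` of a model that never saw it (e.g., randomly initialized):

```python
import numpy as np

def backward_transfer(R: np.ndarray) -> float:
    """BWT: how learning later domains changed performance on earlier ones.
    Negative values indicate catastrophic forgetting; positive values
    indicate positive backward transfer."""
    T = R.shape[0]
    return float(np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)]))

def forward_transfer(R: np.ndarray, b: np.ndarray) -> float:
    """FWT: how much previously learned domains help on a new domain before
    it is trained on, relative to the baseline performance b."""
    T = R.shape[0]
    return float(np.mean([R[j - 1, j] - b[j] for j in range(1, T)]))
```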
Datasets: Cardiac segmentation used a multi-center, multi-vendor dataset with data from four vendors (Siemens, GE, Philips, Canon), each considered a domain. Lung nodule detection utilized data from the LIDC-IDRI database and the LNDb challenge, creating four domains based on scanner vendor and reconstruction kernel combinations.
Key Findings
The results of the cardiac segmentation experiment showed that both DM and DM-PD significantly outperformed the baseline methods (naive continual learning, random memory replacement, and EWC). DM and DM-PD exhibited negligible catastrophic forgetting (BWT values close to zero), while the other methods showed clear forgetting (negative BWT values). Performance on under-represented domains was also considerably better for DM and DM-PD, and qualitative analysis confirmed their superior segmentation results.
In the lung nodule detection experiment, DM and DM-PD again outperformed the naive approach and random memory replacement. DM-PD surpassed DM, achieving the highest average precision and FWT and highlighting the benefit of the pseudo-domain balancing mechanism. Both variants maintained good performance across domains, while the naive approach suffered from catastrophic forgetting. Precision-recall curves confirmed the superior performance of DM and DM-PD, with fewer false positives than the naive method.
Analysis of the memory composition revealed that DM-PD maintains a more balanced representation of the different domains than DM. A t-SNE visualization of the Gram matrix embeddings showed a more uniform distribution of memory elements across domains for DM-PD, confirming that the pseudo-domain module successfully balances the memory content.
Discussion
The results consistently demonstrated the effectiveness of the dynamic memory (DM) approach in alleviating catastrophic forgetting and enabling continual learning in medical image analysis. The proposed method effectively adapts to changing data distributions caused by domain shifts while retaining the ability to process data from previously encountered domains. The incorporation of a pseudo-domain detection module (DM-PD) further enhanced performance by ensuring a more balanced and representative memory, leading to improved overall accuracy and reduced false positives in the detection task.
The findings address a critical limitation of existing deep learning models in medical imaging – their inability to adapt to continuous technological advancements and varying data acquisition protocols. The proposed approach moves towards building more sustainable and robust DL systems for clinical applications. The ability of the model to transfer knowledge across scanners, evident in both forward and backward transfer metrics, highlights the generalizability and robustness of the learned feature representations.
Conclusion
This paper introduces dynamic memory (DM), a novel continual learning method that effectively addresses the challenge of catastrophic forgetting in medical imaging by maintaining a diverse rehearsal memory. The optional incorporation of a pseudo-domain detection module further enhances performance by balancing the memory and training process. The results, consistent across two different tasks and modalities, show that DM offers a significant improvement over baseline methods. This work represents a substantial step toward creating more adaptable and sustainable deep learning systems for real-world medical image analysis. Future work could explore the scalability of DM to a larger number of domains and investigate the integration of active learning to optimize annotation efforts in clinical settings.
Limitations
While the proposed DM method shows promise, several limitations exist. First, further research is needed to rigorously evaluate the scalability of the method to a significantly larger number of domains, ensuring the absence of catastrophic forgetting as the number of scanners and variations increases. Second, the method requires storing a subset of images in memory, which, while significantly smaller than the entire dataset, could pose storage or privacy concerns in practical clinical deployments. Third, the study assumes the availability of labeled data for all samples in the continuous stream; in real-world clinical settings, an active learning approach may be necessary to efficiently acquire annotations for new domains.