Dynamic memory to alleviate catastrophic forgetting in continual learning with medical imaging

Medicine and Health

M. Perkonigg, J. Hofmanninger, et al.

Matthias Perkonigg, Johannes Hofmanninger, Christian J. Herold, James A. Brink, Oleg Pianykh, Helmut Prosch, and Georg Langs propose a dynamic memory approach to continual learning for medical imaging that maintains performance as scanner technologies evolve, addressing domain shifts in cardiac MRI segmentation and CT lung nodule detection.
Introduction

The study addresses how to sustain and adapt deep learning models for medical imaging in the presence of domain shifts that arise from evolving scanners, acquisition protocols, and diagnostic workflows. Traditional static training leads to performance deterioration when new data distributions appear, while naive continual training risks catastrophic forgetting of earlier domains. The research question is whether a domain-agnostic continual learning strategy can adapt to unknown, time-varying domains encountered in a continuous data stream, while preserving performance on previously seen domains. The purpose is to develop and evaluate a memory-based rehearsal approach that detects and maintains diversity across emerging styles, enabling robust performance across multiple scanners without requiring explicit domain labels. This is important for clinical deployment where metadata is inconsistent and data arrives sequentially.

Literature Review

The paper situates the work within continual learning research aimed at mitigating catastrophic forgetting through three broad families: rehearsal and pseudo-rehearsal, regularization-based methods (e.g., EWC), and parameter isolation approaches. Prior continual learning efforts often focus on incremental task learning, whereas here the emphasis is on domain shifts. Related areas include domain adaptation, which typically assumes simultaneous access to source and target domains and known domain labels, assumptions misaligned with continuous clinical data streams. Transfer learning focuses on adapting to a new domain/task via fine-tuning, usually disregarding performance on the initial domain. In medical imaging, scanner-induced variability has been shown to degrade radiomics and ML features, and harmonization methods exist but assume batch availability of multi-scanner data. The study builds on rehearsal-based methods and introduces style-based selection to handle domain shifts without domain labels.

Methodology

Problem setup: A task model is base-trained on images from a single domain (scanner) and then updated continuously as a data stream introduces unseen domains at unknown times. The goal is to adapt to new domains while preventing catastrophic forgetting of earlier domains.

Dynamic Memory (DM): A fixed-size memory M of image–target pairs is maintained to enable rehearsal. At each stream step, a new input mini-batch B is observed, and each new sample is inserted into M by replacing the most stylistically similar existing memory element, which keeps the memory diverse. Style similarity is computed via Gram matrices from a fixed, ImageNet-pretrained ResNet-50 style network: for an image x, Gram matrices are computed from selected convolutional layers, and a Gram distance δ(x, y) aggregates the layer-wise differences. Replacement rule: for a new pair (b, c), find the index i = argmin_j δ(b, m_j) and replace m_i. After updating the memory, a training mini-batch is constructed from hard current samples (those on which the task model performs poorly according to the task metric) and randomly sampled memory elements until the training batch size is reached, and the model parameters are then updated on this batch. (A code sketch of this update follows the data and evaluation description below.)

Pseudo-domain (PD) module (optional; DM-PD): Detects style-coherent clusters (pseudo-domains) to balance memory and training. Gram matrices are reduced with Sparse Random Projection to form embeddings e(x), and a set of Isolation Forest detectors D = {IF_d}, one per identified pseudo-domain, is maintained. A sample is assigned p(x) = argmax_d IF_d(e(x)) if the maximum score is positive; otherwise it is labelled an outlier and stored in an outlier memory O. When |O| reaches a threshold, pairwise distances among the outliers are computed; if a dense subset exists (distances below τ), a new Isolation Forest is fitted to that subset to define a new pseudo-domain, its samples are transferred to M, and M is balanced so that each pseudo-domain occupies at least ρ slots. Memory update with pseudo-domains: if |M_d| < ρ for the assigned pseudo-domain d, a random element from an overrepresented pseudo-domain is replaced; otherwise, style-based replacement is applied within M_d. (A sketch of this module also follows below.)

Architectures and implementation: Style network: ResNet-50 pretrained on ImageNet (fixed). Cardiac segmentation task model: fully convolutional network (FCN) with a ResNet-50 backbone. Lung nodule detection: Faster R-CNN with a ResNet-50 backbone. Implemented in Python 3.6 with PyTorch 1.6.0 and torchvision.

Data and continual protocol:

  • Cardiac MRI segmentation (M&Ms challenge): multi-vendor domains Siemens (A), GE (B), Philips (C), Canon (D). Base training on scanner A, then continual stream across A→D. Labels: LV, RV, MYO on 2D slices. Splits per Table 1a.
  • Lung nodule detection: LIDC-IDRI (with LUNA16 annotations) and LNDb. Domains: GE/low-frequency (E), GE/high-frequency (F), Siemens B30f (G), LNDb (H). 2D slices with bounding boxes, nodules ≥3 mm. Splits per Table 1b.

Baselines and comparators: Naive continual training (no rehearsal); Random memory replacement; EWC (segmentation only); GEM and ER-MIR (both require domain labels); Joint model (JModel; static training on all data, hypothetically available at once); Domain-specific models (DSM) trained per domain; Base model trained only on the initial domain. Memory sizes M of 64–1024 were evaluated; a full-stream-sized memory was also examined.

Evaluation metrics: Segmentation performance via Dice score (averaged over LV, RV, MYO), plus backward transfer (BWT) and forward transfer (FWT). Detection via Average Precision (VOC 11-point, IoU ≥ 0.3), with BWT and FWT analogs to assess transfer effects. Validation curves were tracked over training to visualize forgetting and adaptation dynamics.
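The following is a minimal sketch of the style-based dynamic memory update described above, not the authors' released implementation. It assumes grayscale slices replicated to three channels, a hypothetical selection of ResNet-50 stages (layer1–layer3) for the Gram matrices, and mean-squared layer-wise differences as the aggregation in δ; the paper's exact layer set and aggregation may differ.

```python
# Illustrative sketch of the dynamic memory (DM) update, under the assumptions
# stated above. Layer selection and distance aggregation are placeholders,
# not the paper's exact configuration.
import random
import torch
import torchvision


class StyleNetwork:
    """Fixed ImageNet-pretrained ResNet-50 used only to extract Gram matrices."""

    def __init__(self, device="cpu"):
        # torchvision 0.7-style API, matching the PyTorch 1.6 setup in the paper;
        # newer torchvision versions use the `weights=` argument instead.
        self.model = torchvision.models.resnet50(pretrained=True).to(device).eval()
        self.features = {}
        for name in ["layer1", "layer2", "layer3"]:  # assumed layer selection
            getattr(self.model, name).register_forward_hook(self._hook(name))

    def _hook(self, name):
        def fn(module, inputs, output):
            self.features[name] = output.detach()
        return fn

    @torch.no_grad()
    def gram_matrices(self, x):
        """Gram matrices of the hooked layers for one image x of shape (1, 3, H, W)."""
        self.features = {}
        self.model(x)
        grams = {}
        for name, feat in self.features.items():
            n, c, h, w = feat.shape
            f = feat.view(n, c, h * w)
            grams[name] = torch.bmm(f, f.transpose(1, 2)) / (c * h * w)
        return grams


def gram_distance(g_a, g_b):
    """Aggregate layer-wise differences between two sets of Gram matrices (δ)."""
    return sum(torch.mean((g_a[k] - g_b[k]) ** 2).item() for k in g_a)


class DynamicMemory:
    """Fixed-size rehearsal memory with style-based replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # (image, target, gram_dict) triples

    def insert(self, image, target, grams):
        if len(self.items) < self.capacity:
            self.items.append((image, target, grams))
            return
        # Replace the stylistically most similar element: i = argmin_j δ(b, m_j).
        distances = [gram_distance(grams, m[2]) for m in self.items]
        i = min(range(len(distances)), key=distances.__getitem__)
        self.items[i] = (image, target, grams)

    def sample(self, k):
        """Randomly draw k stored triples for rehearsal."""
        return random.sample(self.items, min(k, len(self.items)))
```

In use, each incoming stream sample would be passed through StyleNetwork.gram_matrices and inserted with DynamicMemory.insert; the training mini-batch would then combine hard current samples with DynamicMemory.sample(k) before a standard optimizer step on the task model.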
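The pseudo-domain module can be sketched in a similar spirit, here with scikit-learn's SparseRandomProjection and IsolationForest standing in for the detectors described above. The embedding dimension, outlier threshold, dense-subset criterion, and lazy fitting of the projector are illustrative assumptions, and the balancing of memory slots per pseudo-domain (the ρ quota) is omitted for brevity.

```python
# Illustrative sketch of the pseudo-domain (PD) module, under the assumptions
# stated above. Thresholds and the dense-subset test are placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.random_projection import SparseRandomProjection


class PseudoDomainModule:
    def __init__(self, embedding_dim=128, outlier_threshold=16, tau=1.0):
        self.embedding_dim = embedding_dim
        self.outlier_threshold = outlier_threshold  # |O| needed before clustering
        self.tau = tau                              # pairwise-distance threshold
        self.projector = None                       # fitted lazily on the first sample
        self.detectors = []                         # one IsolationForest per pseudo-domain
        self.outliers = []                          # embeddings not assigned to any pseudo-domain

    def embed(self, gram_vector):
        """Reduce a flattened Gram-matrix vector to a low-dimensional embedding e(x)."""
        v = np.asarray(gram_vector, dtype=np.float64).reshape(1, -1)
        if self.projector is None:
            self.projector = SparseRandomProjection(n_components=self.embedding_dim)
            self.projector.fit(v)  # data-independent; only the input dimension matters
        return self.projector.transform(v)[0]

    def assign(self, e):
        """Return the pseudo-domain index p(x) for embedding e, or -1 for an outlier."""
        if self.detectors:
            scores = [d.decision_function(e.reshape(1, -1))[0] for d in self.detectors]
            best = int(np.argmax(scores))
            if scores[best] > 0:  # positive decision value: inlier of that pseudo-domain
                return best
        self.outliers.append(e)
        self._maybe_create_pseudo_domain()
        return -1

    def _maybe_create_pseudo_domain(self):
        if len(self.outliers) < self.outlier_threshold:
            return
        X = np.stack(self.outliers)
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        # Simplified density test: outliers whose mean distance to the rest is below tau.
        dense = np.where(dists.mean(axis=1) < self.tau)[0]
        if len(dense) >= self.outlier_threshold // 2:
            self.detectors.append(IsolationForest(random_state=0).fit(X[dense]))
            keep = set(range(len(self.outliers))) - set(dense.tolist())
            self.outliers = [self.outliers[i] for i in sorted(keep)]
```

A sample assigned to an existing pseudo-domain would then trigger the quota-aware memory update described above, while repeated outliers eventually seed a new detector once a sufficiently dense cluster of styles has accumulated.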
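The BWT and FWT values quoted in the findings are not defined explicitly in this summary. Assuming the standard definitions of Lopez-Paz and Ranzato (introduced with GEM), with R_{i,j} the test performance on domain j after training through the i-th of T domains and b̄_j the performance of the base model on domain j before continual training, they would read:

```latex
\mathrm{BWT} = \frac{1}{T-1}\sum_{i=1}^{T-1}\left(R_{T,i} - R_{i,i}\right),
\qquad
\mathrm{FWT} = \frac{1}{T-1}\sum_{i=2}^{T}\left(R_{i-1,i} - \bar{b}_i\right)
```

Under these definitions, negative BWT indicates forgetting of earlier domains, while positive FWT indicates that training on earlier domains already improves performance on domains not yet encountered, consistent with how the values are interpreted in the results.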
Key Findings

Cardiac MRI segmentation (M=128):

  • DM and DM-PD outperformed domain-agnostic baselines (Naive, Random, EWC) and achieved performance comparable to domain-label methods (GEM, ER-MIR) while not requiring domain labels.
  • Test Dice (averaged over LV, RV, MYO) per domain, with BWT and FWT:
      • DM: A 0.802±0.005, B 0.762±0.002, C 0.807±0.004, D 0.840±0.009; BWT 0.000±0.002; FWT 0.032±0.004.
      • DM-PD: A 0.799±0.010, B 0.763±0.004, C 0.809±0.005, D 0.844±0.010; BWT 0.003±0.004; FWT 0.031±0.005.
      • Random: A 0.786±0.008, B 0.746±0.008, C 0.792±0.007, D 0.850±0.005; BWT −0.011±0.007; FWT 0.033±0.003.
      • EWC: A 0.786±0.015, B 0.738±0.014, C 0.797±0.005, D 0.847±0.003; BWT −0.014±0.007; FWT 0.032±0.004.
      • Naive: A 0.781±0.005, B 0.726±0.007, C 0.789±0.003, D 0.848±0.001; BWT −0.018±0.123; FWT 0.033±0.002.
      • GEM: A 0.798±0.013, B 0.763±0.026, C 0.804±0.011, D 0.846±0.002; BWT −0.005±0.004; FWT 0.032±0.003.
      • ER-MIR: A 0.798±0.000, B 0.761±0.008, C 0.808±0.002, D 0.847±0.003; BWT −0.004±0.004; FWT 0.036±0.003.
  • EWC achieved high Dice on the last domain (D=0.847±0.003) but with notable forgetting (negative BWT). DM/DM-PD exhibited neutral/positive BWT, indicating no forgetting.
  • Scanner B (limited training samples) showed that DM/DM-PD retained performance without domain labels, outperforming Naive/Random/EWC. Increasing memory size improved performance; too-small memory (M=64) led to forgetting (BWT ≈ −0.005).
  • Training dynamics (validation curves) showed DM/DM-PD stabilized performance across domain transitions, unlike Naive and Random which exhibited drops indicative of forgetting.

CT lung nodule detection (AP, M=128):

  • DM-PD and DM outperformed Naive and mitigated forgetting; performance was comparable to domain-label methods (GEM, ER-MIR) without requiring domain labels. AP per domain, with BWT and FWT:
      • DM: E 0.722±0.020, F 0.526±0.021, G 0.592±0.041, H 0.330±0.015; BWT 0.030±0.018; FWT 0.063±0.016.
      • DM-PD: E 0.750±0.006, F 0.565±0.067, G 0.624±0.024, H 0.355±0.038; BWT 0.028±0.019; FWT 0.066±0.030.
      • Random: E 0.752±0.019, F 0.514±0.021, G 0.600±0.021, H 0.394±0.013; BWT 0.007±0.016; FWT 0.084±0.026.
      • Naive: E 0.682±0.014, F 0.506±0.017, G 0.561±0.020, H 0.369±0.008; BWT 0.000±0.008; FWT 0.091±0.027.
      • GEM: E 0.754±0.012, F 0.568±0.022, G 0.622±0.038, H 0.366±0.024; BWT 0.034±0.016; FWT 0.067±0.018.
      • ER-MIR: E 0.754±0.012, F 0.588±0.038, G 0.611±0.039, H 0.363±0.027; BWT 0.031±0.016; FWT 0.075±0.016.
      • DSM: E 0.653±0.047, F 0.441±0.074, G 0.643±0.067, H 0.454±0.096.
      • JModel: E 0.716±0.063, F 0.522±0.114, G 0.711±0.058, H 0.419±0.087.
      • Base: E 0.645, F 0.372, G 0.509, H 0.136.
  • All methods showed a performance drop on Scanner H due to a population shift (smaller mean nodule diameter, ~5.99 mm vs 8.29 mm in LIDC). DM-PD balanced memory across style clusters, improving stability and transfer (best/among best BWT and FWT) and reducing false positives compared to Naive.
  • Visualization (t-SNE of Gram embeddings) showed PD balancing yields memory elements more evenly distributed across the training style space, reducing overrepresentation of early domains.
Discussion

The proposed dynamic memory approach directly addresses continual adaptation to unknown domain shifts in clinical imaging streams without requiring domain labels. By using a style-based metric (Gram matrices from a fixed CNN) to guide memory replacement, the rehearsal buffer remains diverse, enabling the model to learn new imaging characteristics while retaining prior knowledge. The pseudo-domain module further improves performance by discovering style-coherent clusters and balancing both memory and training across them, which is especially beneficial when memory capacity is limited and domains are imbalanced. Empirically, the method mitigates catastrophic forgetting (neutral/positive BWT) and provides positive transfer across scanners (improved FWT), outperforming domain-agnostic baselines and approaching the performance of domain-label methods. Results generalize across modalities (MRI, CT) and tasks (segmentation, detection), demonstrating that maintaining a diverse rehearsal set is an effective and practical strategy for sustained model performance in dynamic clinical environments.

Conclusion

This work introduces a domain-agnostic continual learning strategy based on dynamic memory with a style-aware replacement policy and an optional pseudo-domain discovery and balancing module. Across cardiac MRI segmentation and CT lung nodule detection, the approach consistently reduces catastrophic forgetting, improves transfer, and maintains robust performance across evolving scanners without explicit domain labels. The method provides a practical path toward sustainable clinical deployment of imaging AI systems that must adapt to ongoing acquisition changes. Future work should address scalability and theoretical guarantees against forgetting with many domains, integrate privacy-preserving or compressed rehearsal mechanisms, and incorporate human-in-the-loop strategies (e.g., active learning) to manage annotation costs and adapt to population shifts.

Limitations

The approach requires storing a subset of images for rehearsal, which may raise privacy or storage concerns despite being much smaller than the full dataset. The method does not provide formal guarantees against catastrophic forgetting as the number of future domains scales substantially. The experiments assume access to labels for continual training; in practice, annotation costs and availability can be limiting, necessitating strategies such as active learning. Additionally, the method focuses on imaging style shifts and may adapt less efficiently to population shifts (e.g., changes in lesion characteristics).

Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny