
Computer Science
EPicker: An exemplar-based continual learning approach for knowledge accumulation in cryoEM particle picking
X. Zhang, T. Zhao, et al.
EPicker, developed by Xinyu Zhang and colleagues, applies exemplar-based continual learning to cryo-electron microscopy particle picking. The approach adapts to new datasets while preserving previously learned knowledge, enabling accurate picking of proteins, vesicles, fibers, and other biological objects.
Introduction
The paper addresses the challenge of robust, automated particle picking in single-particle cryo-EM, where micrographs contain heterogeneous content (target particles, degraded proteins, impurities, ice contamination) and where both precision and automation are required. Traditional template-based or feature-engineered methods depend on user-provided templates/features and can be biased. Modern CNN-based supervised approaches learn features from labeled data but face two key issues: (1) unpredictable generalization to unseen datasets with differing features, and (2) catastrophic forgetting when adapting models to new data via fine-tuning, which erodes performance on previously learned particles. Joint training on many datasets can broaden applicability but is computationally expensive and storage-intensive. The study proposes an exemplar-based continual learning approach (EPicker) to continually accumulate knowledge from new datasets while maintaining old knowledge, thereby supporting a generalizable, automated cryo-EM pipeline. The goal is to build a model that learns incrementally from a few new samples without catastrophic forgetting and that can pick diverse biological objects (proteins, vesicles, fibers).
Literature Review
Prior particle picking methods include template or specific-feature matching (FindEM, SIGNATURE, DoGpicker, gAutoMatch, EMAN, RELION), which require user-prepared templates and are prone to bias. Unsupervised or clustering-based methods (DeepCryoPicker, DRPNet) avoid templates but have limitations. CNN-based methods (DeepPicker, DeepEM, Warp, Topaz, crYOLO) demonstrate strong performance and generalization when jointly trained on many datasets (e.g., crYOLO used 53 datasets; Warp recommends centralized data and periodic training). However, joint training is computationally costly and requires large storage. Fine-tuning adapts quickly to new data but yields task-specific models that forget older knowledge (catastrophic forgetting). Continual/incremental learning with knowledge distillation has been effective in natural image object detection. The paper builds on CenterNet (anchor-free detector) and continual learning strategies (e.g., knowledge distillation) to create an efficient, generalizable particle picker for cryo-EM.
Methodology
EPicker architecture and continual learning: EPicker implements an exemplar-based continual learning framework using a dual-path network built upon CenterNet. Two identical branches (A and B) are initialized from the old model. Branch A is frozen (reference for old knowledge); Branch B is trained to become the new model while distilling knowledge from Branch A to avoid forgetting. Training uses both a small exemplar dataset (subset of old datasets) and the new dataset. Exemplar construction: ~200 labeled particles per prior dataset (from one or multiple micrographs), with data augmentation (random flip, random crop). After training, only Branch B parameters are saved.
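A minimal PyTorch-style sketch of this dual-path setup is given below; `build_centernet`, `old_ckpt_path`, and the simple data-loader mixing are hypothetical placeholders rather than EPicker's actual API.

```python
# Sketch only: build_centernet and old_ckpt_path are hypothetical placeholders.
import copy
from itertools import chain

import torch

def init_dual_path(old_ckpt_path, build_centernet):
    """Create frozen reference Branch A and trainable Branch B from the old model."""
    state = torch.load(old_ckpt_path, map_location="cpu")

    branch_a = build_centernet()            # reference branch holding old knowledge
    branch_a.load_state_dict(state)
    branch_a.eval()
    for p in branch_a.parameters():
        p.requires_grad_(False)             # Branch A stays frozen during training

    branch_b = copy.deepcopy(branch_a)      # new model, initialized identically
    branch_b.train()
    for p in branch_b.parameters():
        p.requires_grad_(True)
    return branch_a, branch_b

def mixed_batches(exemplar_loader, new_loader):
    """Iterate over exemplar micrographs and the new dataset in one training pass."""
    yield from chain(exemplar_loader, new_loader)
```

After training, only Branch B's parameters would be saved, consistent with the description above.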
Loss functions: For continual learning, total loss L_Total = L_OD + λ1·L_Dis + λ2·L_Reg with λ1=0.1 and λ2=0.01 (empirical). L_OD is the CenterNet object detection loss: L_k (focal loss for center heatmap) + λ_off·L_off (offset regression, λ_off=1) + λ_size·L_size (size regression, optional; λ_size=0.1). L_Dis (knowledge distillation) minimizes L2 differences between feature maps and predicted heatmaps from Branch B and frozen Branch A on exemplar data. L_Reg penalizes large parameter deviations between new and old models to prevent overfitting to exemplars. Joint training and fine-tuning use only L_OD.
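The loss composition can be sketched as follows, assuming Branch A and Branch B expose their backbone feature maps and predicted heatmaps; the argument names and the exact form of L_Reg are assumptions, while the weights λ1 = 0.1 and λ2 = 0.01 come from the text.

```python
import torch
import torch.nn.functional as F

LAMBDA_DIS, LAMBDA_REG = 0.1, 0.01   # reported weights for L_Dis and L_Reg

def distillation_loss(feat_b, heat_b, feat_a, heat_a):
    """L_Dis: L2 differences between Branch B and frozen Branch A on exemplar data."""
    return F.mse_loss(feat_b, feat_a) + F.mse_loss(heat_b, heat_a)

def parameter_regularization(branch_b, branch_a):
    """L_Reg: penalize large deviations of the new parameters from the old ones."""
    reg = 0.0
    for p_new, p_old in zip(branch_b.parameters(), branch_a.parameters()):
        reg = reg + torch.sum((p_new - p_old.detach()) ** 2)
    return reg

def total_loss(l_od, feat_b, heat_b, feat_a, heat_a, branch_b, branch_a):
    """L_Total = L_OD + lambda1 * L_Dis + lambda2 * L_Reg (continual-learning mode only)."""
    l_dis = distillation_loss(feat_b, heat_b, feat_a.detach(), heat_a.detach())
    l_reg = parameter_regularization(branch_b, branch_a)
    return l_od + LAMBDA_DIS * l_dis + LAMBDA_REG * l_reg
```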
Base detector and feature extractor: CenterNet (anchor-free, keypoint-based) regresses particle center and optionally size. EPicker uses DLA-34 as feature extractor (outperformed ResNet in tests) with upsampling, followed by an object location sub-network producing heatmaps for center, local offset, and size (output stride R=4). For protein particles, size prediction is often disabled (only centers regressed) to reduce compute and improve localization; for size-sensitive objects (e.g., vesicles), size prediction is enabled (radius estimation).
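A simplified sketch of the object-location sub-network is shown below; the channel widths and head layout are assumptions, and the DLA-34 backbone with upsampling (producing features at output stride R = 4) is taken as given.

```python
import torch
import torch.nn as nn

class LocationHead(nn.Module):
    """Center / offset / (optional) size heads on top of upsampled backbone features.

    Assumes the DLA-34 backbone plus upsampling already yield a feature map at
    1/4 of the input resolution (output stride R = 4). Channel counts are assumed.
    """
    def __init__(self, in_channels: int = 64, predict_size: bool = True):
        super().__init__()

        def head(out_channels: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(256, out_channels, kernel_size=1),
            )

        self.center = head(1)                           # particle-center heatmap (single class)
        self.offset = head(2)                           # sub-pixel offset (dx, dy)
        self.size = head(1) if predict_size else None   # e.g. vesicle radius; disabled for proteins

    def forward(self, feat: torch.Tensor):
        out = {
            "heatmap": torch.sigmoid(self.center(feat)),
            "offset": self.offset(feat),
        }
        if self.size is not None:
            out["size"] = self.size(feat)
        return out
```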
Preprocessing and efficiency: Input micrographs are downsampled to a width of 1024 px (aspect ratio preserved), histogram equalized, and converted to 8-bit. Typical picking speed is <0.3 s per micrograph. For typical 10–30 nm particles, this downsampling maintains acceptable centering accuracy (a tolerance of under 1 nm, corresponding to several pixels).
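A rough preprocessing sketch under these settings, using OpenCV for resizing and histogram equalization (the exact order and implementation in EPicker may differ):

```python
import cv2
import numpy as np

def preprocess_micrograph(mic: np.ndarray, target_width: int = 1024) -> np.ndarray:
    """Downsample to 1024 px width (aspect ratio kept), rescale to 8-bit, equalize histogram."""
    mic = mic.astype(np.float32)

    # Downsample so the width becomes 1024 px while preserving the aspect ratio.
    h, w = mic.shape
    scale = target_width / float(w)
    resized = cv2.resize(mic, (target_width, int(round(h * scale))),
                         interpolation=cv2.INTER_AREA)

    # Rescale intensities to 0-255 and convert to 8-bit.
    lo, hi = float(resized.min()), float(resized.max())
    img8 = np.clip((resized - lo) / max(hi - lo, 1e-6) * 255.0, 0, 255).astype(np.uint8)

    # Histogram equalization (cv2.equalizeHist requires a single-channel 8-bit image).
    return cv2.equalizeHist(img8)
```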
Sparse annotation handling: To reduce labeling burden when many positives are left unlabeled, EPicker modifies the center loss to (i) reduce the penalty on high-confidence "negatives" that are likely unlabeled positives and (ii) promote very high-confidence predictions to pseudo-labels. Thresholds: predictions above r1 = 0.7 are promoted to pseudo-labels (treated as unlabeled positives), and predictions above r2 = 0.5 receive a reduced negative penalty.
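One plausible reading of this modified center loss is sketched below as a CenterNet-style focal loss; the exponents ALPHA/BETA and the `down_weight` factor are assumptions, and EPicker's exact pseudo-label handling may differ.

```python
import torch

R1, R2 = 0.7, 0.5      # thresholds reported in the paper
ALPHA, BETA = 2, 4     # standard CenterNet focal-loss exponents (assumed here)

def sparse_center_loss(pred: torch.Tensor, gt: torch.Tensor, down_weight: float = 0.1):
    """Focal center loss that tolerates unlabeled positives under sparse annotation.

    pred, gt: heatmaps in [0, 1] with identical shape. `down_weight` is a
    hypothetical factor controlling how much the penalty is reduced.
    """
    pred = pred.clamp(1e-6, 1 - 1e-6)
    pos = gt.eq(1).float()
    neg = 1.0 - pos

    # Standard positive (labeled-center) term.
    pos_loss = -((1 - pred) ** ALPHA) * torch.log(pred) * pos

    # Negative term, modulated by prediction confidence:
    #  - confidence > R1: treated as a pseudo positive, so no negative penalty
    #  - R2 < confidence <= R1: likely an unlabeled positive, so reduced penalty
    neg_weight = torch.ones_like(pred)
    neg_weight = torch.where(pred > R2, torch.full_like(pred, down_weight), neg_weight)
    neg_weight = torch.where(pred > R1, torch.zeros_like(pred), neg_weight)

    neg_loss = -((1 - gt) ** BETA) * (pred ** ALPHA) * torch.log(1 - pred) * neg * neg_weight

    num_pos = pos.sum().clamp(min=1.0)
    return (pos_loss.sum() + neg_loss.sum()) / num_pos
```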
Fiber picking and tracing: Fibers are initially detected as discrete points (like particles). A line-tracing algorithm then links the points into fibers under constraints on maximum curvature (an angle threshold) and a neighborhood radius of r = 100 px (at the downsampled 1024 px width). The algorithm iteratively connects the nearest candidate points that respect the angle constraint; a smoothing step removes points where adjacent segments form angles of less than 0.1 rad.
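A greedy tracing sketch consistent with this description is shown below; the maximum-turn angle `MAX_TURN` is a hypothetical value, since only the 0.1 rad smoothing threshold and the r = 100 px radius are stated.

```python
import numpy as np

RADIUS = 100.0       # neighborhood radius in pixels at 1024 px width
MAX_TURN = 0.5       # hypothetical maximum turn angle (rad) between successive segments
SMOOTH_TURN = 0.1    # points bending by less than this are dropped as redundant

def turn_angle(p0, p1, p2):
    """Angle between segments p0->p1 and p1->p2 (0 means perfectly straight)."""
    v1, v2 = p1 - p0, p2 - p1
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

def trace_fiber(points: np.ndarray, start: int):
    """Greedily link detected center points (N x 2 array) into one fiber trace."""
    remaining = set(range(len(points))) - {start}
    trace = [start]
    while True:
        cur = points[trace[-1]]
        # Candidates inside the neighborhood radius.
        cands = [i for i in remaining if np.linalg.norm(points[i] - cur) <= RADIUS]
        # Enforce the curvature constraint once the trace has a direction.
        if len(trace) >= 2:
            prev = points[trace[-2]]
            cands = [i for i in cands if turn_angle(prev, cur, points[i]) <= MAX_TURN]
        if not cands:
            break
        nxt = min(cands, key=lambda i: np.linalg.norm(points[i] - cur))
        trace.append(nxt)
        remaining.discard(nxt)
    return trace

def smooth_trace(trace, points):
    """Drop nearly collinear points (adjacent segments turning by < 0.1 rad)."""
    if len(trace) < 3:
        return list(trace)
    keep = [trace[0]]
    for a, b, c in zip(trace, trace[1:], trace[2:]):
        if turn_angle(points[a], points[b], points[c]) >= SMOOTH_TURN:
            keep.append(b)
    keep.append(trace[-1])
    return keep
```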
Datasets and evaluation: For incremental learning experiments, a base model was jointly trained on five datasets: 80S ribosome (EMPIAR-10028), 20S proteasome (EMPIAR-10025), apoferritin (EMPIAR-10146), TccA1 (EMPIAR-10089), Noda-virus (EMPIAR-10203). New datasets: β-galactosidase (EMPIAR-10017), influenza hemagglutinin (EMPIAR-10097), phage MS2 (EMPIAR-10075), CNG (EMPIAR-10081), phosphodiesterase (EMPIAR-10228). From each dataset, 15 micrographs were selected (10 for training, 5 for testing), with manual picks as ground truth. Performance metrics: Average Precision (AP) and Average Recall (AR) at IoU = 0.5. The complexity of a new dataset is defined as C = 100/(AP+AR), computed from the old model's performance on that dataset; the forgetting rate is defined as the reduction in AP/AR on old datasets after training on new data.
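For reference, the two derived quantities can be written as small helper functions (assuming AP and AR are expressed as percentages; the paper may report the forgetting rate differently):

```python
def complexity(ap: float, ar: float) -> float:
    """Dataset complexity C = 100 / (AP + AR), using the old model's AP/AR on the new dataset."""
    return 100.0 / (ap + ar)

def forgetting_rate(score_before: float, score_after: float) -> float:
    """Drop in AP (or AR) on an old dataset after training on new data."""
    return score_before - score_after
```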
Training modes compared: (1) Joint training on multiple datasets from scratch; (2) Fine-tuning of a pre-trained model on new datasets (single-path, no reference); (3) Continual learning (dual-path with exemplars). Time/storage costs were measured for scaling to more datasets.
Key Findings
- Continual learning effectiveness: Incrementally adding new datasets to the base model led to only a small decrease in AP (typically 1–3%) and little to no change in AR compared to joint training on all datasets, indicating minimal forgetting and retained recall. Adding all five new datasets at once showed similar results to adding them successively.
- Catastrophic forgetting with fine-tuning: On an 80S ribosome micrograph (EMPIAR-10028) containing 118 particles, a fine-tuned model (adapted to β-galactosidase) missed 35% of ground-truth particles, whereas continual-learning and joint-training models detected ~96–97% of ground-truth particles, demonstrating that fine-tuning yields task-specific models that forget prior knowledge.
- Generalization and upper bound: Joint training across diverse datasets (molecular weights from ~100 kDa to several MDa, varied shapes/sizes) maintained high AP/AR across all tested sets, reflecting strong generalization of the feature extractor and serving as an upper-bound reference.
- Impact of feature dissimilarity: The authors quantified dataset complexity (C = 100/(AP+AR)) and found that greater dissimilarity between old and new features can reduce how effectively they are merged (causing some forgetting); in practice, however, the experiments showed no significant forgetting overall, and adding low-complexity datasets can even improve the model.
- Efficiency gains: Compared to joint training, continual learning substantially reduced time and storage costs for extending to new features. Example: when adding the 10th dataset sequentially, joint training required ~50 minutes versus ~20 minutes for continual learning; only 1–2 micrographs per dataset are stored as exemplars for future updates.
- Biased vs. unbiased picking: On 26S proteasome datasets, the continual-learning model (general) picked nearly all particle types (CP2RP, CP1RP, CP, including top- and side-views), whereas fine-tuned and from-scratch models picked primarily the targeted CP2RP side-views (more specific/biased) and missed many CP1RP and most CP particles. All three picked similar numbers of CP2RP particles, with fine-tuning most accurate for the specific target.
- General object detection: EPicker accurately detected fibers (curved/straight) via point picking plus line tracing and liposomes with center and radius estimation, including overlapped vesicles, extending beyond protein particles.
- Practical usability: Downsampling, histogram equalization, and 8-bit conversion enabled fast inference (<0.3 s per micrograph). Sparse annotation support reduced labeling effort; 5–10 micrographs per dataset were often sufficient for training.
Discussion
The study addresses the core challenge of maintaining and expanding particle picking capability as new datasets with different features are encountered. By employing a dual-path architecture with knowledge distillation on exemplar data and regularization, EPicker effectively integrates new knowledge while preserving performance on earlier tasks, directly mitigating catastrophic forgetting. Empirical results show that continual learning achieves performance comparable to joint training while dramatically reducing the computational and storage burden associated with retraining on all data. The approach is robust across diverse particle types and sizes, and its general model supports unbiased discovery of heterogeneous particles, which is valuable at early project stages. Conversely, fine-tuning can produce highly specific models when targeted picking is needed, though at the cost of forgetting. The framework also extends to non-particle biological objects (fibers, vesicles) and accommodates sparse annotations, increasing practicality for cryo-EM pipelines with continuous data flow. Overall, EPicker’s continual learning paradigm enhances generalization and scalability, aligning with the needs of automated and evolving cryo-EM workflows.
Conclusion
The paper introduces EPicker, an exemplar-based continual learning system for cryo-EM particle picking built on CenterNet with a dual-path distillation architecture. It supports joint training, fine-tuning, and continual learning, with the latter enabling efficient knowledge accumulation from few new samples while preserving prior capabilities. Experiments demonstrate near–joint-training performance, minimal forgetting, significant reductions in training time and storage, and applicability to diverse biological objects (proteins, fibers, vesicles with size estimation). The method integrates seamlessly into automated pipelines, supports sparse annotations, and delivers fast inference. Code and pretrained models are publicly available, facilitating adoption in real-world cryo-EM workflows.
Limitations
- The authors note that dissimilarity between old and new datasets can affect the effectiveness of merging features, potentially leading to some forgetting, although experiments observed no significant forgetting overall.
- The performance of a general model on completely unseen particles is not guaranteed and depends on the match between accumulated knowledge and new particle features (as discussed in the context of general models).