Topaz-Denoise: general deep denoising models for cryoEM and cryoET

T. Bepler, K. Kelley, et al.

Topaz-Denoise, developed by Tristan Bepler, Kotaro Kelley, Alex J. Noble, and Bonnie Berger, is a deep learning denoising method that improves the signal-to-noise ratio of cryoEM images, yielding clearer, more interpretable micrographs, supporting faster data collection, and aiding the determination of challenging 3D structures.
Introduction

The study addresses the pervasive challenge of extremely low SNR in cryoEM micrographs, which hampers particle visualization, orientation completeness, and downstream processing, especially for small or non-globular proteins. Conventional filtering methods (downsampling, bandpass, Wiener filtering) do not model the true noise and often yield limited interpretability. The authors aim to develop a robust, general-purpose denoising approach that learns directly from real cryoEM data without requiring ground truth. Leveraging the Noise2Noise paradigm and the fact that movie frames provide independent noisy observations of the same signal, they train deep models to improve SNR and interpretability in cryoEM and cryoET, with the goals of enabling more complete particle picking, improving reconstructions, and reducing electron dose to increase data collection throughput.

Literature Review

Classical contrast enhancement in cryoEM relies on downsampling, bandpass, and Wiener filtering, which neglect the complex image formation and noise properties. Deep learning-based denoisers have shown strong restoration performance but typically require clean ground truth images, limiting applicability in domains like cryoEM. Noise2Noise enables training denoisers from paired noisy images without clean targets, spurring blind/self-supervised approaches (e.g., Noise2Self, Noise2Void). In cryoEM/ET, early neural denoising tools emerged for tomograms and single particle micrographs, but lacked systematic evaluation and pre-trained general models. Related software such as Warp introduced real-time preprocessing, and Topaz PU-learning improved particle picking. The study fills a gap by providing rigorously evaluated, pre-trained general denoisers for cryoEM and cryoET.

Methodology

Data preparation and pairing: The authors compiled thousands of micrograph frames from public EMPIAR datasets and internal NYSBC datasets spanning diverse instruments (FEI Krios, Talos Arctica, JEOL CRYOARM300) and DDD cameras (Gatan K2, FEI Falcon II/III) in super-resolution and counting modes across many defocus values. They formed two aggregate training sets: Large (randomly up to 200 micrographs per dataset; 3439 paired micrographs) and Small (selected by eye for strong denoising performance; 1929 pairs). For Noise2Noise supervision, movie frames were split into even/odd sets, independently aligned (MotionCor2, 5×5 patches, b-factor 100) and summed to create paired observations.
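The even/odd pairing step can be sketched as follows. This is a minimal numpy sketch under simplifying assumptions: `frames` is a hypothetical pre-aligned movie stack, whereas the real pipeline aligns each half independently with MotionCor2 before summing.

```python
import numpy as np

def split_even_odd(frames):
    """Split a movie stack into two independent noisy observations.

    frames: array of shape (n_frames, H, W). Summing assumes the frames
    are already aligned; the paper's pipeline motion-corrects each half
    separately (MotionCor2) before summation.
    """
    even = frames[0::2].sum(axis=0)  # frames 0, 2, 4, ...
    odd = frames[1::2].sum(axis=0)   # frames 1, 3, 5, ...
    return even, odd

# toy example: 8 frames of 4x4 noise
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 4, 4))
even, odd = split_even_odd(frames)
```

Because the two halves see the same underlying signal but independent noise, they can serve directly as Noise2Noise training pairs.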

Models: Several architectures were explored: (1) U-net with one 11×11 initial convolution, five downsampling blocks with max pooling and five upsampling blocks with nearest-neighbor upsampling plus skip connections; (2) a smaller U-net with three down/up blocks; (3) an FCNN with three 11×11 conv layers (64 filters, leaky ReLU); (4) an affine model with a single 31×31 convolution (equivalent to Wiener filter under L2). Losses included L1 and L2 within the Noise2Noise objective, training to minimize error between denoised odd and raw even images (and vice versa), capturing median-seeking (L1) or mean-seeking (L2) behavior.
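The symmetrized Noise2Noise objective described above can be written compactly. This is an illustrative numpy sketch, not the authors' PyTorch implementation; `f` stands for any denoiser callable.

```python
import numpy as np

def noise2noise_loss(f, even, odd, norm="L2"):
    """Symmetric Noise2Noise objective.

    Trains f to map the odd half-sum toward the raw even half-sum and
    vice versa. norm="L1" gives median-seeking behavior, norm="L2"
    mean-seeking behavior, as in the paper.
    """
    a = f(odd) - even
    b = f(even) - odd
    if norm == "L1":
        return 0.5 * (np.abs(a).mean() + np.abs(b).mean())
    return 0.5 * ((a ** 2).mean() + (b ** 2).mean())
```

Since the noise in the two halves is independent, minimizing this objective is equivalent (in expectation) to regressing onto the shared clean signal, without ever observing a clean image.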

Training: Models were trained in PyTorch with Adagrad (lr=0.001) for 100 epochs on 800×800 random patches (minibatch 4) with 90°/180°/270° rotations and mirroring. Micrographs were normalized by subtracting mean and dividing by standard deviation. Training used a single NVIDIA V100 (32 GB VRAM), ~15 h per 2D model.
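The normalization and augmentation steps can be sketched as below (numpy stand-ins for the PyTorch training loop; in the real loop the same augmentation is applied identically to both halves of each Noise2Noise pair):

```python
import numpy as np

def normalize(mic):
    """Per-micrograph standardization: zero mean, unit variance."""
    return (mic - mic.mean()) / mic.std()

def augment(patch, rng):
    """Random 0/90/180/270-degree rotation plus optional mirror."""
    k = rng.integers(0, 4)
    patch = np.rot90(patch, k)
    if rng.integers(0, 2):
        patch = np.flipud(patch)
    return patch
```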

Inference: Full micrographs were denoised in 4000×4000-pixel patches with 500 pixels of padding to avoid edge artifacts. Denoising a 4k×4k K2 image takes ~3 s on an NVIDIA GTX 1080 Ti. For visualization, intensities were scaled relative to 16×-low-pass-filtered micrographs and binned over [−4, 4] into 256 values for 8-bit display.
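The overlapped-patch inference scheme can be sketched as follows. This is a simplified numpy version under the assumption that `denoise` is any callable on 2D arrays; with a linear or identity operator the stitched output exactly matches whole-image processing, which is the point of the padding.

```python
import numpy as np

def denoise_tiled(mic, denoise, patch=4000, pad=500):
    """Denoise a large micrograph in overlapping patches.

    Each patch is extracted with `pad` pixels of surrounding context and
    only the interior is written to the output, suppressing artifacts
    from the network's receptive field at patch edges.
    """
    H, W = mic.shape
    out = np.zeros_like(mic)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            i0, j0 = max(i - pad, 0), max(j - pad, 0)
            i1, j1 = min(i + patch + pad, H), min(j + patch + pad, W)
            tile = denoise(mic[i0:i1, j0:j1])
            h, w = min(patch, H - i), min(patch, W - j)
            out[i:i + h, j:j + w] = tile[i - i0:i - i0 + h,
                                         j - j0:j - j0 + w]
    return out
```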

SNR quantification: Two approaches were used. (1) Region-based: manual labeling of N pairs of signal and nearby background regions (up to 10 micrographs per dataset). SNR (dB) = 10 log10(Σ s_i / Σ v_i^b), where s_i = (μ_i^s − μ_i^b)^2 and v_i^b is background variance. (2) Split-frames correlation-based (Frank & Al-Ali): estimate SNR = p/(1−p), p = cross-correlation between independent measurements; report in dB as 10 log10(p/(1−p)). For denoised micrographs, correlation between denoised odd and raw even halves was computed. Tomograms used the second method on ~1000×1000×150 sub-volumes.
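Both SNR estimators translate directly into code (a numpy sketch; region means/variances are assumed to come from the manual labels described above):

```python
import numpy as np

def snr_db_regions(signal_means, background_means, background_vars):
    """Region-based SNR in dB: s_i = (mu_i^s - mu_i^b)^2 over background variance."""
    s = (np.asarray(signal_means) - np.asarray(background_means)) ** 2
    return 10 * np.log10(s.sum() / np.sum(background_vars))

def snr_db_split(x, y):
    """Frank & Al-Ali estimator from two independent noisy observations.

    p is the cross-correlation coefficient between the halves;
    SNR = p / (1 - p), reported in dB.
    """
    p = np.corrcoef(x.ravel(), y.ravel())[0, 1]
    return 10 * np.log10(p / (1 - p))
```

For example, two observations sharing a unit-variance signal with unit-variance independent noise have true SNR 1 (0 dB), and the split-frame estimator recovers this up to sampling error.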

Short-exposure experiments: To simulate reduced dose, frame stacks from four datasets (EMPIAR-10234, 18sep08d, 19jan04d, 19may10e; full doses 67.12, 39.6, 69.46, 64.44 e−/Ų) were truncated to 10%, 25%, 50%, 75%, 100%, aligned (MotionCor2, dose weighting), denoised with the general U-net, and evaluated via SNR and visual inspection. Additional 19jan04d titrations used 100 micrographs per dose fraction without dose weighting for downstream reconstruction tests (CTFFIND4, CryoSPARC workflows).
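The dose-fraction truncation amounts to keeping a leading fraction of the frame stack (a trivial numpy sketch; alignment and dose weighting happen afterward in the real pipeline):

```python
import numpy as np

def truncate_dose(frames, fraction):
    """Keep the first `fraction` of movie frames to simulate reduced exposure."""
    n = max(1, int(round(len(frames) * fraction)))
    return frames[:n]
```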

Throughput testing: On Titan Krios with Gatan K2 and K3, Leginon sessions compared normal dose (~66 e−/Ų) versus optimized low dose (~17 e−/Ų), with 1 or 4 exposures per hole, measuring exposures/hour to quantify throughput.

3D cryoET denoising: Noise2Noise was extended to 3D with 3D convolutions and a 7-voxel initial kernel. Thirty-two aligned K2 BioQuantum tilt-series (average pixel size 5.2 Å; defocus ~9 μm) were split into even/odd tilt-series, reconstructed (Appion-Protomo alignment, Tomo3D WBP), and used to train two general models: Unet-3d-10a (binned by 2; ~10 Å voxels) and Unet-3d-20a (binned by 4; ~20 Å). Training ran for over a month (10a) and 10 days (20a) across seven GTX 1080 GPUs. Inference on a 3 GB binned-by-two tomogram takes ~5 minutes using two RTX GPUs in parallel.

Key Findings
  • General 2D denoising performance: The best overall model, a full U-net trained on the Large dataset, yielded the highest SNR across diverse conditions. It outperformed conventional low-pass filtering on all but one dataset (where performance was equivalent) and improved SNR by more than 2 dB on average over low-pass filtering and by roughly 20 dB (~100×) over raw micrographs.
  • Architecture and training comparisons: Minor differences between L1 and L2 losses (L1 slightly favored overall). Dataset size/quality mattered: smaller architectures sometimes performed better on the Small dataset, but the full U-net trained on the Large dataset was best overall.
  • Generalization: The pre-trained U-net generalized across DDDs (K2, Falcon II, Falcon III), modes (super-resolution and counting), and even to non-DDD screening cameras with larger pixel sizes and hardware binning, visibly improving protein contrast while smoothing background; minor artifacts were noted in some cases (e.g., glutamate dehydrogenase) but interpretability improved.
  • Particle picking and reconstruction (clustered protocadherin, EMPIAR-10234): Denoising revealed low-SNR top-views, enabling more complete picking. Training Topaz on 1,023 manual picks from denoised micrographs led to 23,695 particles after classification (vs. 10,010 from 1,540 picks on raw), a ~2.4× increase in particles used for 3D reconstructions. Two conformations were resolved: closed (13,392 particles) matching prior subtomogram alignment (EMD-9197) and a putative partially open state (8,134 particles) with a ~15 Å twist change. Single particle resolution improved to ~12 Å versus prior ~35 Å cryoET; ResLog analysis suggests millions of particles would be needed for near-atomic resolution. However, ab initio reconstruction from denoised particles was less reliable than from raw (only 1/6 correct vs. 4–5/6 correct with raw), cautioning against using denoised particles for reconstruction.
  • Low-dose imaging: Denoised micrographs from 10–25% of the exposure (≈4.0–16.7 e−/Ų) achieved SNR and visual interpretability comparable to full raw exposures. Reconstructions from low-dose stacks (≈16.7 e−/Ų) supported accurate CTF estimation and in some cases surpassed full-dose resolutions. At ~20 e−/Ų, U-net denoising achieved ≥1.5× SNR improvement over low-pass filtering. Low-pass filtering required roughly double the dose to match the neural denoiser’s SNR.
  • Throughput gains: On K2, optimized low dose increased exposures/hour by 65% (stage shift, 1 exposure/hole; 178 vs. 108) and 57% (image shift, 4 exposures/hole; 190 vs. 121). On K3, gains were 25% (242 vs. 195) and 15% (273 vs. 237), respectively, equating to on the order of 1,000 more exposures per day.
  • 3D cryoET denoising: The Unet-3d-10a model improved tomogram SNR by >3 dB over raw and by ~1 dB over the best low-pass. Self-trained models offered only marginal additional SNR benefit. Visual comparisons showed marked contrast/detail improvements for cellular structures (ribosomes, membranes, mitochondrial proteins) and single-particle tomograms (80S ribosomes at 2.17 Å pixel size, 4 μm defocus), while appropriately flattening background. Applying a Gaussian filter post-denoising further boosted contrast at the expense of high-frequency detail.
  • Practicality and integration: Denoising is fast (seconds per micrograph; minutes per tomogram), integrated into Topaz, and available across common cryoEM software (CryoSPARC, Relion, Appion, Scipion).

Discussion

The findings demonstrate that a Noise2Noise-trained U-net can learn an effective, general denoising transformation directly from real cryoEM data, improving SNR and interpretability beyond conventional filters while preserving structural features and smoothing background. This directly addresses the low-SNR bottleneck in particle visualization and picking, enabling more complete orientation coverage and larger, cleaner particle sets, as evidenced by the clustered protocadherin reconstructions and the discovery of a putative partially open conformation. By enhancing SNR at substantially reduced doses, Topaz-Denoise can shorten exposure times without compromising downstream processing, leading to significant gains in data collection throughput. The approach scales to 3D, where general tomogram denoisers improve contrast and SNR, aiding interpretation and potentially facilitating segmentation and analysis in crowded cellular contexts. Conceptually, the Noise2Noise objective aligns with maximizing cross-correlation between independent measurements, linking L2 loss minimization to SNR increases. Practically, models generalize across detectors and conditions, especially those represented in training, and enable real-time denoising during acquisition. However, denoised particles should not be used directly for reconstruction due to potential hallucinations and mismatched noise assumptions in downstream software; denoising should assist visualization and object identification, after which raw data should be used for refinement.

Conclusion

Topaz-Denoise delivers pre-trained, general-purpose 2D and 3D denoising models for cryoEM and cryoET that substantially enhance SNR and visual interpretability across diverse datasets and instruments. The models enable more complete particle picking, improved reconstructions for challenging targets (e.g., clustered protocadherin with an additional putative conformation), and practical low-dose acquisition that increases microscope throughput. The 3D denoiser enhances tomogram contrast and detail with performance comparable to self-trained models. The methods are fast, modular, and integrated into widely used cryoEM pipelines. Future directions include leveraging 3D denoising during iterative alignment (e.g., via half-map training for local B-factor-like effects), broader training to encompass more detectors and conditions, improving robustness against hallucinations, and integrating denoising with automated segmentation and particle picking for end-to-end workflows.

Limitations
  • Potential hallucination: Neural denoisers may introduce subtle, undetectable artifacts or imprints from training data. Denoised particles should not be used for reconstruction; instead, use denoised images for visualization and picking, then process raw data.
  • Reconstruction software assumptions: Current refinement tools assume raw-data noise statistics; denoised inputs can degrade ab initio outcomes, as observed with fewer correct reconstructions from denoised particles.
  • Generalization scope: Best visual results are achieved on detectors and conditions represented in training (K2, Falcon II/III). While non-DDD cameras showed good results, occasional artifacts occurred and performance may vary.
  • SNR estimation limits: Without ground truth, SNR metrics are estimates, relying on region labeling or split-frame correlations that require paired frames and may not capture full-micrograph statistics.
  • Dose thresholds: Although low-dose denoising enables effective visualization, accurate CTF estimation required ~16.7 e−/Ų in tested cases, implying practical lower bounds for certain downstream steps.