
Streamlined structure determination by cryo-electron tomography and subtomogram averaging using TomoBEAR
N. Balyschew, A. Yushkevich, et al.
TomoBEAR is a workflow engine that streamlines cryo-electron tomography (cryo-ET) data processing for subtomogram averaging (StA). It was developed by Nikita Balyschew, Artsemi Yushkevich, Vasilii Mikirtumov, Ricardo M. Sanchez, Thiemo Sprink, and Mikhail Kudryashev to support high-resolution structural biology research.
Introduction
Cryo-electron tomography (cryo-ET) enables visualization of macromolecules in the native cellular context. When combined with subtomogram averaging (StA), cryo-ET can achieve near-angstrom to a few-angstrom resolutions for diverse complexes, revealing functional insights in situ. Despite advances in hardware and software that improved resolution and broadened applicability, several hurdles impede mainstream adoption of StA: workflows span multiple specialized packages that are difficult to interface; tilt-series alignment and particle identification often require manual intervention; and storage and compute needs are substantial due to large 3D volumes and intermediate data. Moreover, many macromolecules are present at low copy numbers, requiring acquisition and processing of large numbers of tomograms. To address these challenges, the authors introduce TomoBEAR, an open-source, modular, configurable workflow engine designed for mostly automated, scalable processing of cryo-ET data for StA. TomoBEAR integrates commonly used cryo-EM tools with sensible defaults, supports live data processing, and provides a transparent, "white-box" approach to data management and processing. The study demonstrates the pipeline on purified targets, a membrane protein (RyR1), and plasma FIB-milled lamellae, aiming to minimize manual steps while achieving high-resolution structures.
Literature Review
The paper situates TomoBEAR within ongoing developments in cryo-ET and StA. Prior advances in instrumentation and algorithms have enabled higher-resolution reconstructions and correction of non-linear sample motions and electron optical distortions. Several established workflows and software ecosystems exist, including IMOD with BatchRunTomo, emClarity, tomoAuto, Dynamo, EMAN2, M combined with IMOD and RELION, and ScipionTomo. These systems address aspects of alignment, CTF correction, reconstruction, particle picking, and averaging, often requiring substantial user expertise and manual orchestration. TomoBEAR is designed to reduce user intervention, streamline high-resolution StA, and maintain flexibility while limiting external dependencies to ease maintenance. The discussion further references neural network-based particle pickers and automated tomogram annotation tools that can be incorporated, as well as pipelines like Warp–RELION–M and CryoSPARC for hybrid processing or downstream refinement.
Methodology
TomoBEAR is a MATLAB-based, modular pipeline runner that executes one module per tilt stack, tomogram, or particle set, with near-automated execution up to tomographic reconstruction and configurable user interventions. Configuration is provided via JSON files with global and tool-specific parameters, supported by defaults.json. Execution creates module-specific folders with SUCCESS flags to enable selective re-runs.
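The configuration and re-run behaviour described above can be illustrated with a short Python sketch. It is not TomoBEAR code (TomoBEAR itself is written in MATLAB). The section names, parameter names, and the helpers `merge_config`, `pending_modules`, and `mark_success` are hypothetical, and the assumption that a module counts as finished when its folder contains a file named SUCCESS is one reading of the description.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical section and parameter names, for illustration only.
DEFAULTS_JSON = '{"general": {"gpu": 0}, "MotionCor2": {"patches": 5}}'
PROJECT_JSON = '{"general": {"gpu": 1}, "BatchRunTomo": {"binning": 8}}'


def merge_config(defaults, project):
    """Overlay project settings on the defaults, one section at a time."""
    merged = {name: dict(params) for name, params in defaults.items()}
    for name, params in project.items():
        merged.setdefault(name, {}).update(params)
    return merged


def pending_modules(output_dir, module_names):
    """Return the modules whose folder does not yet hold a SUCCESS flag."""
    return [m for m in module_names if not (output_dir / m / "SUCCESS").exists()]


def mark_success(output_dir, module_name):
    """Create the module folder and drop an empty SUCCESS flag file in it."""
    folder = output_dir / module_name
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "SUCCESS").touch()


if __name__ == "__main__":
    config = merge_config(json.loads(DEFAULTS_JSON), json.loads(PROJECT_JSON))
    modules = [name for name in config if name != "general"]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        mark_success(out, "MotionCor2")       # pretend this module already ran
        print(pending_modules(out, modules))  # only unfinished modules remain
```

Deleting a module's flag file would, under this assumption, be enough to schedule that module again without touching the others.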
Preprocessing and reconstruction: Starting from the movie frames recorded during data collection, TomoBEAR performs motion correction (MotionCor2), tilt-series assembly, tilt-series alignment (gold fiducial-based using IMOD BatchRunTomo or Dynamo-TSA; fiducial-less using IMOD patch tracking or AreTomo), defocus estimation (GCTF or CTFFIND4), 2D CTF correction (IMOD Ctfphaseflip), and tomogram reconstruction (IMOD). Dynamo-TSA is the default when gold fiducials are available and includes an automatic assessment of alignment success. A parallel IMOD project is maintained so that alignments can be inspected and refined. Post-processing options include denoising (nonlinear anisotropic diffusion) and CTF deconvolution/denoising/missing wedge filling via IsoNet.
Particle picking: TomoBEAR provides multiple strategies: GPU-accelerated Dynamo template matching (reimplemented for 12–15× speedup) with automatic template preparation from EMDB entries and CC-map post-processing (e.g., removing large islands or edges); deep learning with crYOLO (training and prediction integrated); and manual/semi-automated geometry-supported picking via the Dynamo Catalogue system. Picking is typically performed on highly binned tomograms, optionally filtered for visualization.
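To make the idea of CC-map post-processing concrete, the following Python sketch removes large islands from a cross-correlation map. It is an illustration under simplifying assumptions, not TomoBEAR's implementation: the map is 2D rather than 3D, islands are defined by 4-connectivity, and the function name `remove_large_islands` and its parameters are hypothetical. The rationale is that a true particle produces a compact correlation peak, whereas extended high-scoring regions tend to come from gold beads, edges, or similar features.

```python
from collections import deque


def remove_large_islands(cc_map, threshold, max_island_size):
    """Return (row, col, score) picks from a 2D cross-correlation map.

    Pixels scoring at or above `threshold` are grouped into 4-connected
    islands. Islands with more than `max_island_size` pixels are discarded;
    every remaining island contributes its highest-scoring pixel.
    """
    rows, cols = len(cc_map), len(cc_map[0])
    seen = [[False] * cols for _ in range(rows)]
    picks = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or cc_map[r][c] < threshold:
                continue
            # Breadth-first search collects all pixels of one island.
            island = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                island.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and cc_map[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(island) <= max_island_size:
                y, x = max(island, key=lambda p: cc_map[p[0]][p[1]])
                picks.append((y, x, cc_map[y][x]))
    return picks


# Toy map: one compact peak at (1, 1) and one 3x3 block of high scores.
CC_MAP = [
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.1, 0.9, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.9, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.9, 0.9, 0.9],
]

if __name__ == "__main__":
    print(remove_large_islands(CC_MAP, threshold=0.5, max_island_size=4))
```

With a threshold of 0.5 and a maximum island size of 4, only the compact peak survives; the 3x3 block is rejected even though its scores are higher.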
Particle extraction and CTF: Initial subtomograms are either extracted from tomograms or reconstructed directly from tilt substacks using SUSAN on GPUs, performing 2D per-particle CTF correction that accounts for particle height. Direct reconstruction avoids building large unbinned tomograms and saves storage.
Subtomogram alignment and classification: TomoBEAR can automatically generate and execute Dynamo multi-reference alignment (MRA) projects using templates and noise traps to separate true/false positives. Users can schedule multiple classification steps with progressive binning reduction; particles are re-centered at each extraction and regenerated at lower binning either by cropping or direct reconstruction from tilt series. Final particle sets can be exported to RELION (STAR files) and SUSAN for high-resolution refinement. Independent half-set processing and FSC-based resolution estimation are supported. For hybrid StA workflows, exported STAR files are compatible with RELION and other packages like CryoSPARC.
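As a brief illustration of FSC-based resolution estimation, the sketch below reads a resolution off a Fourier shell correlation curve computed between two independent half-set maps, using the 0.143 criterion quoted in the benchmark results. It assumes the FSC values per shell are already available and interpolates linearly between the two shells that bracket the threshold; the function name and the sample curve are made up for illustration and are not data from the paper.

```python
def resolution_at_threshold(spatial_freqs, fsc, threshold=0.143):
    """Resolution in angstroms where the FSC curve first crosses below `threshold`.

    `spatial_freqs` holds shell frequencies in 1/angstrom in ascending order and
    `fsc` the matching correlation values. The crossing point is found by linear
    interpolation between the two bracketing shells. Returns None when the
    curve never crosses the threshold from above.
    """
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            f0, f1 = spatial_freqs[i - 1], spatial_freqs[i]
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            return 1.0 / (f0 + t * (f1 - f0))
    return None


if __name__ == "__main__":
    # Made-up curve for demonstration only.
    freqs = [0.05, 0.10, 0.15, 0.20]
    curve = [1.0, 0.9, 0.5, 0.0]
    print(resolution_at_threshold(freqs, curve))
```

A return value of None signals that the curve stays above the threshold up to the last shell, in which case the estimate is limited by sampling rather than by the data.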
Utilities and live mode: TomoBEAR includes control points for manual interventions (StopPipeline), automatic handling of multiple gold bead sizes for robust alignment, gold and edge erasure to reduce false-positive picks, grid-edge eraser, and cleanup of temporary files. A live processing mode quickly reconstructs binned tomograms (skipping motion and CTF correction) for on-the-fly data quality assessment during acquisition, configured with expected images per tilt-series and a listening time threshold.
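The two live-mode settings mentioned above suggest a simple decision rule, sketched here in Python. This is one plausible reading rather than TomoBEAR's actual logic: a tilt series is treated as ready for reconstruction either when the expected number of images has arrived or when no new image has appeared for longer than the listening time. The function and parameter names are hypothetical.

```python
def tilt_series_ready(n_images, seconds_since_last_image,
                      expected_images, listening_time_s):
    """Decide whether a tilt series can be passed on to reconstruction.

    Assumed rule: the series is ready when it is complete, or when it has at
    least one image and acquisition appears to have stalled or moved on.
    """
    if n_images >= expected_images:
        return True
    return n_images > 0 and seconds_since_last_image >= listening_time_s


if __name__ == "__main__":
    # 41 expected images per tilt series and a 600 s listening time are
    # illustrative values, not TomoBEAR defaults.
    print(tilt_series_ready(41, 5, expected_images=41, listening_time_s=600))
    print(tilt_series_ready(30, 120, expected_images=41, listening_time_s=600))
    print(tilt_series_ready(30, 900, expected_images=41, listening_time_s=600))
```

The timeout branch matters in practice because a tilt series can end early, for example when acquisition at high tilt is aborted, and waiting for the full image count would then block the live preview.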
Benchmark/data processing details: The workflow was applied to four datasets.
• EMPIAR-10064 (80S ribosome): automated processing included Dynamo fiducial alignment with IMOD inspection, GCTF-based defocus and ctfphaseflip correction, gold erasure, bin-8 tomogram reconstruction, template matching with an EMD-3420-derived template, MRA in Dynamo, and SUSAN-based direct reconstruction for final refinement.
• EMPIAR-11543 (human apoferritin): fully automated preprocessing with MotionCor2 (7×5 patches), Dynamo fiducial alignment refined in Etomo, per-tilt GCTF and ctfphaseflip, gold removal, bin-8 reconstruction, and Dynamo template matching were followed by SUSAN initialization and RELION4 classification/refinement with polishing.
• EMPIAR-10452 (RyR1): automated preprocessing with MotionCor2 (patch-based for zero-tilt high-dose and global for tilted), IMOD patch tracking, per-tilt GCTF and ctfphaseflip, bin-8 reconstruction, template matching, and RELION4 classification/refinement with polishing were used.
• EMPIAR-11306 (plasma FIB-milled HeLa cell lamellae): automated preprocessing and reconstruction were performed with IMOD patch tracking or AreTomo local alignment, followed by ribosome picking via template matching and crYOLO and RELION4 classification/refinement.
Key Findings
- TomoBEAR provides a mostly automated, modular, and configurable pipeline for cryo-ET StA, integrating MotionCor2, IMOD, Dynamo, AreTomo, GCTF/CTFFIND4, SUSAN, and exports to RELION and CryoSPARC.
- Live processing mode makes tomograms available for visualization during data collection approximately three times faster than conventional offline processing, enabling rapid feedback on data quality.
- Benchmark results across four datasets demonstrate high-resolution outcomes with minimal manual intervention:
• EMPIAR-10064 (80S ribosome, purified): 11.0 Å global resolution (FSC 0.143), comparable to the original 11.2 Å. Preprocessing took ~1 h (starting from assembled stacks); template matching and StA took ~9 h.
• EMPIAR-11543 (human apoferritin, purified): 2.8 Å global resolution at counted pixel size (Nyquist 2.7 Å), after RELION4 refinement and polishing of ~20.8k particles; fully automated TomoBEAR preprocessing on 60 tomograms completed in ~1–1.5 days; final map obtained within two weeks including cluster queueing.
• EMPIAR-10452 (RyR1 in native membranes): 8.9 Å resolution after RELION4 refinement/polishing, slightly surpassing the prior 9.1 Å; tilted images contributed to improved resolution; total processing time reduced from months (original manual workflow) to about one week with TomoBEAR-assisted pipeline.
• EMPIAR-11306 (in situ 80S ribosome from plasma FIB-milled lamellae): best map at 6.2 Å overall resolution, obtained from template matching on IMOD patch-tracked tomograms; local resolution up to 4.6 Å. The lower resolution relative to the original study (4.9 Å) is attributed to thicker tomograms, more false positives, and a smaller final particle set (~7.6k vs ~17k).
- GPU-accelerated template matching in TomoBEAR yields 12–15× speedup; SUSAN-based direct reconstruction reduces storage needs and speeds extraction by avoiding large unbinned tomograms.
- The workflow reduces manual steps to limited interventions (e.g., occasional gold fiducial refinement and class selection) while preserving a white-box, auditable process.
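The relationship between pixel size, binning, and the Nyquist limit behind the apoferritin figure above can be worked through in a few lines of Python. The counted pixel size of 1.35 Å is not stated in this summary; it is inferred here from the quoted Nyquist limit of 2.7 Å, since Nyquist resolution is twice the pixel size. The sketch also assumes that binning factors are expressed relative to that counted pixel size.

```python
def nyquist_resolution(pixel_size_A, binning=1):
    """Best resolvable resolution in angstroms: twice the effective pixel size."""
    return 2.0 * pixel_size_A * binning


if __name__ == "__main__":
    counted_pixel_A = 2.7 / 2.0  # inferred from the quoted Nyquist limit
    for binning in (1, 2, 4, 8):
        limit = nyquist_resolution(counted_pixel_A, binning)
        print(f"bin {binning}: Nyquist limit {limit:.1f} angstroms")
```

Under these assumptions a bin-8 tomogram cannot carry information beyond about 21.6 Å, which is why highly binned tomograms serve for picking while refinement proceeds at progressively lower binning.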
Discussion
The study demonstrates that TomoBEAR addresses key barriers in subtomogram averaging by automating multi-package workflows, enabling scalable processing, and providing live feedback during data acquisition. Across diverse datasets (purified symmetric proteins, membrane proteins in native membranes, and in situ lamellae), TomoBEAR significantly reduced processing time and manual effort. The resolutions achieved were comparable to or better than the original reports, with the exception of the in situ lamellae (6.2 Å versus the original 4.9 Å). The integration of robust tilt-series alignment (including Dynamo-TSA and IMOD/AreTomo), per-tilt CTF estimation/correction, GPU-accelerated template matching, and export to high-resolution refinement tools (RELION, SUSAN) supports end-to-end structure determination. The findings emphasize that while the final refinement stages benefit from expert tuning (masks, filters, class selection), the automated pipeline reliably delivers high-quality particle stacks and reconstructions suitable for downstream optimization. The in situ lamellae case highlights common challenges (thickness-related degradation, false positives, and particle scarcity) and underscores the value of flexible picking strategies (template matching, crYOLO) within a streamlined framework. Overall, TomoBEAR lowers entry barriers, improves reproducibility, and helps shift the bottleneck from data collection to efficient processing in in situ structural biology.
Conclusion
TomoBEAR is an open-source, modular, and largely automated workflow that streamlines cryo-ET subtomogram averaging from raw movies to high-resolution reconstructions, integrating widely used tools with sensible defaults and transparent data management. Benchmarking shows that TomoBEAR delivers state-of-the-art structures (down to 2.8 Å for apoferritin) with minimal manual intervention and substantial time savings relative to traditional manual pipelines. The software’s white-box design, live processing capability, and exports to RELION/SUSAN/CryoSPARC make it broadly applicable across sample types, including challenging in situ lamellae. Future directions include deeper integration with Warp–RELION–M, incorporation of pre-trained neural network pickers and unsupervised tomogram annotation, expanded automation of final refinement steps, and potentially a web-based interface, further increasing throughput and accessibility.
Limitations
- Certain steps still benefit from manual intervention, notably occasional refinement of fiducial alignment and class selection during subtomogram classification.
- Automation at the final refinement stages has limited utility due to the need for expert tuning of masks, particle sets, and filters.
- Particle picking remains a bottleneck in challenging samples; template matching can yield false positives (e.g., gold beads, grid edges, reconstruction artifacts), especially in thick tomograms, necessitating careful post-processing and validation.
- Achieved resolution in in situ lamellae was limited by thickness, false positives, and reduced final particle counts compared to original studies.
- Running super-resolution StA and extensive RELION refinements can require substantial computational resources and queue times; live mode accelerates visualization but omits motion and CTF correction.
- Users must install and manage external dependencies (MotionCor2, IMOD, GCTF/CTFFIND4, Dynamo, AreTomo, SUSAN, RELION) compatible with their compute environments.