Interdisciplinary Studies
Data-driven fine-grained region discovery in the mouse brain with transformers
A. J. Lee, A. Dubuc, et al.
The study addresses the challenge of transforming large, high-dimensional spatial transcriptomics datasets into useful anatomical representations, particularly in the mouse brain. Hierarchical spatial organization is common in tissues, but existing computational methods often do not scale to multimillion-cell, multi-section datasets or require extensive prior neuroanatomical knowledge. The authors propose CellTransformer, a self-supervised transformer-based approach that learns latent neighborhood representations conditioned on cell type to discover spatial domains, aiming to produce data-driven, fine-grained brain parcellations comparable to and extending the Allen Mouse Brain Common Coordinate Framework (CCF). The work’s purpose is to enable organ-wide, scalable and interpretable domain discovery across multiple animals and modalities while minimizing dependence on manual annotations.
Existing spatial domain detection methods face scalability and resolution constraints. Approaches operating on whole tissue-section matrices often exceed GPU/RAM capacity due to large pairwise distance matrices, and Gaussian process-based methods scale cubically with observations. Some scalable methods (e.g., CellCharter, SPIRAL) either struggle to capture granular structure or require extensive batch correction and supervision. Other notable pipelines (e.g., scENVI, STACI, spaGCN, STAligner, STAGATE, GraphST) become infeasible at million-cell scales due to memory requirements (e.g., >60 TB for ~4M cells to store pairwise distances). Prior research has shown gene-expression-based clustering can parcellate brain regions, but integration across sections and discovery of fine-grained, spatially coherent domains remain challenging. The Allen Brain Cell Whole Mouse Brain (ABC-WMB) Atlas provides a multi-million cell MERFISH and scRNA-seq resource enabling evaluation of domain-detection tools against expert-derived CCF annotations at multiple levels (division, structure, substructure).
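For concreteness, a back-of-envelope sketch (in Python; dense float32 storage is an assumption) of why full pairwise distance matrices are infeasible at this scale:

```python
# Why dense pairwise distance matrices fail at atlas scale:
# an n x n matrix for n ~ 4 million cells.
n_cells = 4_000_000
bytes_per_entry = 4  # float32 (assumption)
total_tb = n_cells**2 * bytes_per_entry / 1e12
print(f"{total_tb:.0f} TB")  # -> 64 TB, consistent with the >60 TB figure above
```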
CellTransformer architecture: a graph transformer that learns latent representations of cellular neighborhoods by conditioning gene-expression prediction for a masked reference cell on neighborhood context and cell type. Neighborhood definition: cells within a fixed-size box around a reference cell (85 µm for MERFISH datasets; 50 µm for Slide-seqV2). Input tokens for each observed neighbor cell are constructed by concatenating embedded gene expression (a two-layer MLP with GELU mapping to 192-d) and an embedded cell type (a learned embedding to 192-d), yielding 384-d per-cell tokens. A register (learnable aggregation) token is appended to the neighbor tokens; its post-encoder output serves as the latent neighborhood representation, which is decoded, conditioned on the reference cell's type, to predict the masked cell's gene expression and is subsequently clustered to define spatial domains.
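A minimal sketch of this token construction, assuming a PyTorch-style implementation; the module name, input shapes, and panel sizes are hypothetical, while the 192-d embeddings, two-layer GELU MLP, and 384-d concatenated tokens come from the description above:

```python
import torch
import torch.nn as nn

class NeighborTokenizer(nn.Module):
    """Builds per-cell tokens: embedded expression || embedded cell type."""

    def __init__(self, n_genes: int, n_cell_types: int, d: int = 192):
        super().__init__()
        # Two-layer MLP with GELU mapping gene expression to 192-d.
        self.expr_mlp = nn.Sequential(
            nn.Linear(n_genes, d),
            nn.GELU(),
            nn.Linear(d, d),
        )
        # Learned 192-d cell-type embedding.
        self.type_emb = nn.Embedding(n_cell_types, d)

    def forward(self, expr: torch.Tensor, cell_type: torch.Tensor) -> torch.Tensor:
        # expr: (n_neighbors, n_genes); cell_type: (n_neighbors,) integer labels.
        # Concatenation yields 384-d tokens: (n_neighbors, 384).
        return torch.cat([self.expr_mlp(expr), self.type_emb(cell_type)], dim=-1)

# Illustrative usage with made-up sizes (not the paper's exact panel):
tokenizer = NeighborTokenizer(n_genes=500, n_cell_types=5274)
tokens = tokenizer(torch.randn(32, 500), torch.randint(0, 5274, (32,)))  # (32, 384)
```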
• CellTransformer discovers spatially coherent, biologically meaningful domains at multiple resolutions (k = 25/354/670 and finer k = 1300), reproducing cortical laminar structure and identifying a motor cortex Layer 4 consistent with recent evidence.
• Spatial homogeneity: at k = 670, CellTransformer achieves markedly higher single-cell neighborhood smoothness than CellCharter (+58.2%), SPIRAL (+4091.2%), k-means on cellular neighborhoods (+61.9%), and gene-expression-only k-means (+419.2%); smoothing has minimal effect, indicating the embeddings are inherently smooth.
• Discrete domain proportion: CellTransformer maintains high proportions of discrete domains across resolutions, whereas comparator methods' discreteness declines significantly at 354 and 670 domains.
• Similarity to CCF: at subclass-level composition, CellTransformer yields superior maximum and matched Pearson correlations versus the CCF across mid and fine resolutions. At cluster level (5274 types), the average Pearson correlation between CellTransformer and CCF domains is 0.853, with strong block structure (>0.9) in the correlation matrices. NMI versus the CCF at k = 670 improves over CellCharter by ~13.4% and over SPIRAL by ~46.7%, with similar gains without smoothing.
• Stability analysis identifies k ≈ 1300 as a practical resolution for fine-grained analyses, supporting hierarchical parcellations; the second-derivative crossing of the inertia and instability curves indicates k = 1300 (see the sketch after this list).
• Subiculum and prosubiculum mapping (k = 1300): CellTransformer recovers the dorsal subiculum's three strata (molecular, pyramidal, polymorphic) and its dorsal/ventral organization, matching Ding et al., and detects dorsal-ventral gene expression gradients (e.g., Bnc2, Six3, Pax5) consistent with the literature.
• Superior colliculus: identifies known sensory layers (zonal, superficial gray, optic) and unannotated subregions in the intermediate gray/white layers with layer-specific enriched cell types (e.g., SCsg Gabrr2 GABAergic, SCop Sln glutamatergic; rare types marked by Foxb1 and Pitx2), revealing laminar and medial-lateral gradients.
• Midbrain reticular nucleus: discovers four subregions absent from the CCF, with dorsal-ventral gradients; domain-level neurotransmitter composition shows correlations between type counts and proportions for glutamatergic (r = 0.89) and non-neuronal (r = 0.81) types, but not for GABAergic types (r = -0.64).
• Multi-animal integration (Zhuang 1–4; 1129 genes): high cross-animal consistency; at k ≈ 630, 93.3% of domains are common to the three largest datasets and 80.0% across all four; the average Pearson correlation to CCF regions is 0.805 when fitting across animals and >0.7 when fitting per animal.
• Linear probing indicates the embeddings encode donor and spatial information: donor classification accuracy is >94% for all animals, and the median absolute coordinate prediction error is ~151 µm; performance correlates with per-mouse cell counts.
• Slide-seqV2 generalization: with increased model capacity (10 encoder layers) and quality control, CellTransformer identifies coherent domains (e.g., cortical layers, midbrain, piriform cortex) at k = 50; higher k reduces cross-section integration, likely due to variable density and read depth.
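A hedged illustration of the domain readout and resolution-selection heuristic referenced above: k-means over learned neighborhood embeddings, NMI against CCF labels, and a second-difference check on the inertia curve. The arrays here are random stand-ins, and the paper's exact clustering and stability procedure may differ:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 128))     # stand-in for learned neighborhood embeddings
ccf_labels = rng.integers(0, 670, size=10_000)  # stand-in for per-cell CCF annotations

inertias = []
for k in (25, 354, 670, 1300):
    # Single init keeps the sketch fast; the real pipeline may use more.
    km = KMeans(n_clusters=k, n_init=1, random_state=0).fit(embeddings)
    inertias.append(km.inertia_)
    # NMI against expert CCF labels, as in the comparisons above.
    print(f"k={k}: NMI={normalized_mutual_info_score(ccf_labels, km.labels_):.3f}")

# Resolution selection: discrete second difference of the inertia curve;
# the paper's analogous criterion on inertia/instability curves points to k ~= 1300.
print(np.diff(inertias, n=2))
```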
CellTransformer provides a scalable, self-supervised, transformer-based framework that learns representations of fixed-size cellular neighborhoods for organ-wide domain discovery in spatial transcriptomics. It produces fine-grained, spatially coherent domains aligned with neuroanatomical ontologies (the CCF) and recapitulates known structures and gradients in the hippocampal formation and superior colliculus, while discovering plausible subregions in understudied areas such as the midbrain reticular nucleus (MRN). The approach integrates across tissue sections and animals without explicit batch modeling, suggesting robust learned features. Compared with recent scalable methods, CellTransformer maintains spatial coherence at high domain counts and shows higher correspondence to the CCF, indicating improved biological relevance. The transformer design permits future inclusion of additional modalities (cell-level neurophysiology, imaging connectomics, MRI) as tokens, offering a path toward richer anatomical maps and structure-function inference from multi-omics spatial data. The authors emphasize that discovered domains are operational rather than normative, and that brain organization may include gradients rather than strictly discrete regions; nonetheless, the method satisfies neuroanatomical conventions for discrete parcellations and provides a practical foundation for data-driven tissue mapping.
The study introduces CellTransformer, a self-supervised transformer architecture that scales to multi-million-cell, multi-animal spatial transcriptomics datasets and discovers fine-grained, spatially coherent tissue domains. It matches and extends CCF-like parcellations, robustly integrates across sections and donors, and generalizes to Slide-seqV2. The method recovers known laminar and regional organization and uncovers putative subregions with interpretable gene and cell-type signatures. Future directions include improved positional encoding (e.g., generalized Laplacian, rotational invariance), handling arbitrary gene sets, probabilistic domain models, and incorporating additional modalities to further enrich neighborhood representations and enable comprehensive structure-function mapping.
Discovered domains are not asserted as normative and may reflect discrete parcellations of underlying gradients. The approach requires a user-specified neighborhood radius and a choice of k for clustering; domain stability varies with these parameters. GPU resources are needed for timely training and clustering, limiting accessibility compared with lightweight methods, though memory demands are lower than those of many graph-based pipelines that require full pairwise matrices. Inaccuracies in registration to the CCF can depress ARI and NMI scores. Optional Gaussian smoothing may erode fine laminar boundaries. Some comparator pipelines could not be run at scale due to out-of-memory constraints, limiting the breadth of direct comparisons.