
COSMOS: a platform for real-time morphology-based, label-free cell sorting using deep learning
M. Salek, N. Li, et al.
COSMOS is a platform developed by researchers at Deepcell Inc and Stanford University that characterizes and sorts single cells through real-time deep learning analysis of high-resolution images. It enables purification of viable cells based on morphology alone, without labels or stains.
Introduction
The study addresses the gap between rapid advances in single-cell molecular profiling and limited tools for high-dimensional morphology-based cell assessment and sorting. Morphology has long been a diagnostic cornerstone and correlates with genomic and functional states, yet existing image-based sorting systems often rely on biomarker staining, deformability assays, or engineered features from low-resolution or reconstructed images, which can limit viability and information content. Integrating deep learning with real-time, high-resolution imaging for label-free sorting has remained challenging due to data, computational, and hardware constraints. The authors introduce COSMOS, a cloud-enabled platform that captures high-resolution brightfield images of cells in flow and uses deep CNNs to generate morphological embeddings for real-time classification and sorting. They hypothesize that deep morphological descriptors can distinguish cell types, states, and lineages, enabling label-free enrichment of viable cells and linking morphology to molecular readouts.
Literature Review
Prior image-based sorting approaches have demonstrated potential but are limited by dependence on fluorescent biomarkers or deformability assays that can affect cell viability and bias discovery by excluding marker-negative cells. Some methods depend on reconstructed images or hand-engineered low-dimensional features, constraining the breadth of morphology captured and requiring manual gating strategies tuned to specific applications. Deep learning has shown strong performance in pathology image analysis and in inferring molecular signals from label-free images, and hybrid systems combining shallow CNNs with sorting devices have been reported. However, real-time deep learning inference on high-resolution cell images to drive sorting decisions has been technically prohibitive, largely due to data scale and integration challenges between imaging, computation, and microfluidics. COSMOS builds on these insights by using actual high-resolution images, large-scale annotated datasets, and an optimized inference pipeline to enable label-free, morphology-based sorting.
Methodology
Platform overview: COSMOS combines microfluidics, high-speed brightfield imaging, deep learning, and pneumatic valve-based sorting. Cells in suspension are hydrodynamically and inertially focused into a narrow z-plane and lateral trajectory within a microfluidic cartridge (channel height 15–40 µm; ~10 µL/min flow). High-contrast brightfield images (two per cell) at high magnification (40X–100X; 0.044 µm² per pixel) are captured by an ultra-high-speed camera (up to 1.2 million frames/min). A laser-based tracking system and PMT signals assist timing and monitoring. Real-time classification results control pneumatic microvalves for gentle sorting into collection or waste. The instrument automatically aligns, focuses, and monitors run metrics (cell counts, purity, yield, focus, sync, pressures).
Models and training: CNNs based on Inception V3 (48 layers, ~24M parameters) were adapted to accept grayscale input and to output both class predictions (softmax) and high-dimensional morphological embeddings. The embeddings, taken from the fully connected layer, support visualization (UMAP/tSNE) and clustering; the softmax outputs drive sorting decisions and provide image quality metrics. Training data were drawn from the Deep Cell Atlas (DCA), which comprises >1.6 billion single-cell images overall; 25.7 million high-resolution images were collected for supervised training across PBMCs, fetal nucleated red blood cells (fnRBCs), multiple cancer cell lines (NSCLC, HCC, pancreatic carcinoma, ALL, AML), and control classes (out-of-focus, debris, clumps). Images were resized to 299×299 pixels.
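As a rough illustration of how softmax outputs could drive a sorting decision, the sketch below converts class logits into probabilities and flags a cell for sorting only when the top class is the target and its probability clears a threshold. The class list, the logits, the `sort_decision` helper, and the 0.9 confidence threshold are all assumptions made for the example; the summary does not state the decision rule COSMOS uses.

```python
import math

# Class names taken from the summary; their order here is arbitrary.
CLASSES = ["PBMC", "NSCLC", "debris", "out_of_focus"]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sort_decision(logits, class_names, target, min_confidence=0.9):
    """Return (predicted_class, confidence, sort_flag).

    min_confidence is a hypothetical threshold, not a value from the paper.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    predicted = class_names[best]
    confidence = probs[best]
    sort_flag = predicted == target and confidence >= min_confidence
    return predicted, confidence, sort_flag


# Illustrative logits only: a confident NSCLC call, then an ambiguous one.
print(sort_decision([0.1, 5.0, 0.2, 0.0], CLASSES, "NSCLC"))
print(sort_decision([1.0, 1.2, 0.0, 0.0], CLASSES, "NSCLC"))
```

Raising the threshold in a rule like this would be expected to favor purity over yield, since ambiguous cells are sent to waste rather than collected.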
Annotation pipeline: An AI-assisted annotation workflow used unsupervised and semi-supervised methods. Embeddings from ImageNet-pretrained and internal CNNs were clustered (agglomerative clustering) to present morphologically similar groups for rapid expert labeling. Subject metadata constrained permissible labels. Iterative hard-negative mining and active learning improved class balance and accuracy. Trained experts could annotate up to 6000 cells/min, with multi-round QC to control mismatch rates.
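The clustering step can be sketched with a naive agglomerative routine that repeatedly merges the two closest groups of embeddings, so that an annotator could label each group in one pass. The summary names agglomerative clustering but gives neither the linkage nor the distance metric; average linkage with Euclidean distance, the toy 2-D embeddings, and the function name are assumptions for illustration.

```python
import math


def agglomerative_clusters(points, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    Cubic-time in the number of points, so suitable only for a small demo.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        pairs = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)  # closest pair of clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy 2-D "embeddings": two visually obvious groups.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
for group in agglomerative_clusters(embeddings, n_clusters=2):
    print(sorted(group))
```

In a real workflow each resulting group would be shown to an expert as a batch of similar-looking cells, which is what makes bulk labeling fast.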
Data augmentation and robustness: To enhance generalization across instruments and imaging conditions, augmentations included flips, rotations, Gaussian noise, intensity scaling, salt-and-pepper noise, and custom augmentations simulating camera/framebuffer artifacts (e.g., vertical stripes), chip variability, and sample-correlated imaging artifacts. Images were captured across four instruments and varying focus levels.
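A few of the listed augmentations can be sketched on a grayscale image stored as a list of pixel rows. The function names, parameter values, and the simple additive stripe model are assumptions for illustration; the actual COSMOS augmentation code and its settings are not described in this summary.

```python
import random


def clip(value):
    """Round and clamp a pixel value to the 8-bit range 0..255."""
    return min(255, max(0, int(round(value))))


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def scale_intensity(img, factor):
    """Multiply every pixel by a brightness factor."""
    return [[clip(p * factor) for p in row] for row in img]


def salt_and_pepper(img, prob, rng):
    """Set each pixel to 0 or 255 with total probability prob."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            new_row.append(0 if r < prob / 2 else 255 if r < prob else p)
        out.append(new_row)
    return out


def vertical_stripes(img, period, delta):
    """Brighten every period-th column, a crude stand-in for a camera stripe artifact."""
    return [
        [clip(p + delta) if x % period == 0 else p for x, p in enumerate(row)]
        for row in img
    ]


rng = random.Random(0)  # fixed seed so the demo is repeatable
image = [[100, 120, 140, 160] for _ in range(4)]
augmented = vertical_stripes(gaussian_noise(rot90(hflip(image)), 5.0, rng), 2, 10)
print(augmented)
```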
Real-time computation: Workloads were distributed across CPUs (Intel Xeon E-2146G and Xeon 4108), an Nvidia Quadro P2000 GPU (TensorFlow 1.15 with TensorRT 7.0 optimization), and a DSP-based microcontroller for valve control. Images were cropped on CPU and batched to GPU for inference; the microcontroller synchronized valve actuation with cell arrival. Parallel pipelines allowed multiple cells to be processed at different stages concurrently. Training pipelines ran on Google Cloud with Apache Beam/Dataflow, Airflow orchestration, TPU Pods, and storage in PostgreSQL/BigQuery/Cloud Storage.
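The crop, batch, infer, and actuate flow described above can be sketched sequentially. This is a simplification: the real system runs these stages concurrently on CPU, GPU, and a microcontroller. The stub classifier, the `transit_ms` delay between imaging and the valve, and all function names are hypothetical.

```python
from itertools import islice


def crop_center(frame, size):
    """Cut a size x size window from the middle of a frame (list of rows)."""
    top = (len(frame) - size) // 2
    left = (len(frame[0]) - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def batched(items, batch_size):
    """Yield successive lists of up to batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_pipeline(events, classify_batch, batch_size, crop_size, transit_ms):
    """Return valve actuation times (ms) for cells classified as targets.

    events: iterable of (capture_time_ms, frame) pairs.
    classify_batch: stand-in for GPU inference; maps crops to booleans.
    transit_ms: assumed travel time from the imaging zone to the valve.
    """
    schedule = []
    for batch in batched(events, batch_size):
        crops = [crop_center(frame, crop_size) for _, frame in batch]  # CPU stage
        flags = classify_batch(crops)                                  # inference stage
        for (capture_ms, _), is_target in zip(batch, flags):
            if is_target:
                schedule.append(capture_ms + transit_ms)               # valve timing
    return schedule


def bright_is_target(crops):
    """Toy classifier: call a crop a target if its mean intensity exceeds 100."""
    return [sum(map(sum, c)) / (len(c) * len(c[0])) > 100 for c in crops]


bright = [[200] * 4 for _ in range(4)]
dark = [[20] * 4 for _ in range(4)]
events = [(0, bright), (10, dark), (20, bright)]
print(run_pipeline(events, bright_is_target, batch_size=2, crop_size=2, transit_ms=50))
```

The key constraint such a design implies is that cropping plus inference must finish within the transit time, otherwise the cell passes the valve before a decision is available.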
Classifiers: Two supervised models were developed: (1) a Circulating Cell Classifier (PBMCs, fnRBCs, NSCLC, HCC; with debris/out-of-focus filtering), and (2) a Lung Tumor Classifier (NSCLC vs stromal vs WBC) for dissociated tumor cells (DTCs). Validation used distinct samples, with ≥30% of images held out, and in silico mixtures to assess generalization and enrichment performance.
Sorting parameters: Sorting used paired pneumatic microvalves with tunable valve windows to trade off purity and yield based on cell rate. For example, at 3000 cells/min and ~15 ms valve window, ~60% purity at ~80% yield was achievable.
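One way to build intuition for the window and purity tradeoff is a back-of-the-envelope model, which is an assumption of this sketch and not the authors' model: cells arrive at random at a constant average rate, every cell passing during an open valve window is collected, and targets are rare enough that bystanders are almost all non-target cells. Under those assumptions the expected number of bystanders per sort event is rate × window, and purity is roughly 1 / (1 + rate × window).

```python
def expected_bystanders(cells_per_min, window_ms):
    """Expected number of other cells passing during one valve window."""
    cells_per_second = cells_per_min / 60.0
    return cells_per_second * (window_ms / 1000.0)


def approx_purity(cells_per_min, window_ms):
    """Approximate purity when every target brings along its bystanders.

    Assumes rare targets and perfect classification; ignores yield entirely.
    """
    return 1.0 / (1.0 + expected_bystanders(cells_per_min, window_ms))


# Operating point quoted in the text: 3000 cells/min with a ~15 ms window.
print(expected_bystanders(3000, 15))  # 0.75 bystanders per sort event
print(approx_purity(3000, 15))        # roughly 0.57

# A shorter window admits fewer bystanders, at the cost of missing targets
# whose arrival time is less precisely known (not modeled here).
print(approx_purity(3000, 5))
```

The estimate of about 57% is of the same order as the ~60% purity quoted above, which suggests co-sorted neighbors are a plausible main driver of impurity at this cell rate. The model says nothing about the ~80% yield figure.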
Biological samples and workflows: PBMCs, T cells (activation/differentiation), cancer cell lines (A549, H23, H522, HEPG2, HEP3B2, SNU182), and NSCLC DTCs were prepared per standard protocols. For spike-ins, A549 cells were added to whole blood (e.g., 40 or 400 cells/mL), followed by RBC lysis, fixation, CD45+ depletion, and COSMOS sorting. Downstream validations included SNP panels, targeted mutation assays, WGA/CNV profiling, and scRNA-Seq (targeted immune panel and WTA).
Key Findings
- Morphology-based embeddings separate cell types and lines: UMAPs of embeddings showed distinct clusters for NSCLC vs HCC, and further separation among cell lines (e.g., H23, H522, A549; HEPG2, HEP3B2, SNU182). PBMCs exhibited higher morphological heterogeneity. HEP3B2 and H23 (similar size) separated in embedding space, indicating features beyond size contribute.
- T cell state discrimination: tSNE projections differentiated naive vs activated T cells; a multi-day differentiation time course (Day 0–5) showed embeddings shifting with cell state changes.
- Circulating Cell Classifier accuracy (validation): fnRBC 87%, HCC 100%, NSCLC 92%, PBMC 100%.
- In silico enrichment performance (PBMC background):
• NSCLC AUC: 0.9842 (positive selection), 0.9996 (negative selection).
• HCC AUC: 0.9986 (positive), 0.9999 (negative).
• fnRBC AUC: 0.97 (positive).
• At a 1:100,000 ratio, precision was >70% with recall ~50% for HCC and fnRBC; NSCLC recall was ~15% at similar precision.
- Biological spike-in and sorting results:
• A549 and H522 spike-ins into PBMCs at 1:1000–1:100,000 showed strong enrichment by SNP-based purity estimates. At 1:100,000 spike-in: A549 20% purity (~13,904× enrichment), H522 30–33% purity (~30,000–32,500×).
• Targeted TP53 frameshift (c.572_572delC) in H522 detectable at 23% allele fraction at 1:100,000 spike-in.
• Whole blood spike-ins with pre-depletion: For A549 at 400 cells/mL, final purities 55% (>10,900×) and 80% (>29,000×); at 40 cells/mL, 43% (>33,500×) and 35% (>27,800×).
- Cell viability and transcriptomic integrity: COSMOS processing preserved viability and transcriptomes. scRNA-Seq of PBMCs showed high correlation between unprocessed and COSMOS-processed samples (targeted panel R² = 0.97; WTA R² = 0.983). Compared to FACS, COSMOS induced fewer differentially expressed genes and less activation of immune and neutrophil degranulation pathways.
- Lung Tumor Classifier performance and DTC application:
• Validation accuracy: NSCLC 82%, stromal 78%, WBC 96%.
• In three NSCLC DTC samples, COSMOS malignant cell fraction estimates matched scRNA-Seq-derived malignancy levels (e.g., low 2.2% vs 4.6%, medium 12% vs 16.8%, high 40% vs 46.7%).
• Mutation enrichment: KRAS and TP53 allele frequencies increased from <3% to ~20% and from 1–6% to ~80% after sorting, improving sensitivity for low-tumor-content samples.
• CNV sensitivity improved post-sorting; chr8q amplification detectable.
• scRNA-Seq identities preserved: EpCAM+/CD45− fraction increased from 6.71% pre-sort to 94.16% post-sort; gene expression correlation within EpCAM+/CD45− cluster R² = 0.98; stress/apoptosis gene sets unchanged.
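The rare-cell figures above are easier to interpret with two small calculations: the false positive rate a classifier must achieve to reach a given precision at a given prevalence, and fold enrichment as final fraction divided by initial fraction. The sketch treats a 1:100,000 ratio as a prevalence of 1e-5 and uses the standard precision formula; both the function names and that reading of the ratio are assumptions. Reported enrichment values may rest on measured rather than nominal input fractions, so they need not match a nominal calculation exactly.

```python
def precision(recall, false_positive_rate, prevalence):
    """Precision from recall, false positive rate, and target prevalence."""
    true_pos = recall * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def max_false_positive_rate(recall, target_precision, prevalence):
    """Largest false positive rate that still reaches target_precision."""
    return (recall * prevalence * (1.0 - target_precision)) / (
        target_precision * (1.0 - prevalence)
    )


def fold_enrichment(final_fraction, initial_fraction):
    """Ratio of target fraction after sorting to target fraction before."""
    return final_fraction / initial_fraction


# Precision of 70% at 50% recall with targets at 1 in 100,000:
fpr = max_false_positive_rate(recall=0.5, target_precision=0.7, prevalence=1e-5)
print(fpr)                        # on the order of 2 per million non-target cells
print(precision(0.5, fpr, 1e-5))  # recovers ~0.7

# EpCAM+/CD45- fraction rising from 6.71% to 94.16%:
print(fold_enrichment(0.9416, 0.0671))  # about 14-fold

# 30% final purity from a nominal 1:100,000 input:
print(fold_enrichment(0.30, 1e-5))      # about 30,000-fold
```

The first result shows why rare-cell sorting is demanding: at this prevalence, even a few misclassified background cells per million are enough to pull precision below 70%.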
Discussion
The findings demonstrate that high-resolution, label-free brightfield imaging coupled with deep learning embeddings can accurately distinguish cell types, states, and lineages and enable real-time sorting of viable cells. COSMOS overcomes prior limitations of fluorescence dependence and low-information reconstructed images, providing a generalizable, scalable approach that captures rich morphological traits. The platform’s ability to enrich rare cells (down to 1:100,000) and preserve transcriptomic integrity supports downstream molecular assays, including scRNA-Seq, targeted mutation detection, and CNV profiling, thus linking morphology directly to molecular phenotypes. The cloud-enabled image database allows retrospective reanalysis and discovery of additional phenotypes. Applications include purification of tumor cells from dissociated biopsies, enrichment of rare circulating or fetal cells, and selection of morphologically defined subpopulations for culture, functional assays, and drug testing, facilitating integration of morphology with multi-omics analyses.
Conclusion
COSMOS provides a practical and scalable platform for label-free, real-time morphology-based single-cell classification and sorting using deep learning on high-resolution images. It enables visualization of deep morphological structure across diverse samples, accurate enrichment of target cell populations (including rare cells), and preservation of viability and molecular profiles for downstream analyses. Future work aims to increase throughput, enhance interpretability (explainable AI linking embeddings to conventional morphological features), and leverage advances in optics to classify based on subcellular structures. Integration with other single-cell modalities could yield comprehensive multi-omic and morphologic atlases with translational and clinical impact.
Limitations
- Requirement for cells in suspension necessitates tissue dissociation for solid samples, potentially altering morphology; however, morphology post-dissociation still provided useful fingerprints for sorting.
- Throughput is lower than conventional sorters due to gentle pneumatic valve actuation and computational demands of high-resolution imaging and deep inference; current modeled maximum ~6000 cells/min with purity–yield tradeoffs. Sorting very rare live cells at extreme ratios may be time-limited.
- Explainability of deep morphological features is limited; translating embeddings into interpretable metrics (size, shape, texture) remains an open goal.
- Demonstrations of the most extreme enrichment (1:100,000) used fixed cells to allow longer runs; achieving similar performance for live cells at scale may be constrained by current throughput.
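The time constraint in the throughput and rare-cell limitations can be made concrete with simple arithmetic: the number of input cells needed to encounter a given number of targets, divided by the processing rate. The goal of 100 recovered target cells is a hypothetical number chosen for illustration; the 1:100,000 ratio, the ~6000 cells/min modeled maximum, and the ~80% yield come from the text, though the yield was quoted for a different operating point.

```python
def run_minutes(targets_needed, target_ratio, cells_per_min, yield_fraction=1.0):
    """Minutes of continuous sorting needed to recover targets_needed cells.

    target_ratio: fraction of input cells that are targets (1e-5 for 1:100,000).
    yield_fraction: fraction of true targets that end up in the collection.
    """
    total_input_cells = targets_needed / (target_ratio * yield_fraction)
    return total_input_cells / cells_per_min


# Hypothetical goal: recover 100 target cells from a 1:100,000 mixture.
ideal = run_minutes(100, 1e-5, 6000)
with_yield = run_minutes(100, 1e-5, 6000, yield_fraction=0.8)
print(ideal / 60.0)       # roughly 28 hours at perfect yield
print(with_yield / 60.0)  # roughly 35 hours at 80% yield
```

Run times of this length help explain why fixed cells, which tolerate long runs, were used for the most extreme spike-in ratios.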