
Cell morphology-based machine learning models for human cell state classification
Y. Li, C. M. Nowak, et al.
This study by Yi Li, Chance M. Nowak, Uyen Pham, Khai Nguyen, and Leonidas Bleris introduces an automated, stain-free method that applies machine learning to flow cytometry scatter data to distinguish live from apoptotic cells. The best multilayer perceptron models classified live cells with precision and recall above 0.91 on a held-out test set, outperforming conventional manual gating.
Introduction
Fluorescence-based flow cytometry enables quantitative single-cell analysis and commonly employs forward scatter (FSC) and side scatter (SSC) to gate cell populations based on size and granularity. While Annexin V/PI staining provides accurate identification of apoptotic states, it is laborious, costly, and can interfere with other readouts. Manual FSC/SSC gating is subjective and can introduce bias. The research question is whether six morphology-derived flow cytometry features (FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W) are sufficient, when combined with machine learning, to accurately classify live versus apoptotic cells without staining. The authors hypothesize that each FSC/SSC measure contributes unique state-relevant information and that their combination can robustly indicate apoptosis, enabling automated, stain-free classification that improves upon manual gating.
Literature Review
Machine learning has been applied widely in biomedicine, including cancer prognosis, drug discovery, and analysis of biological networks. Prior cell state classification approaches often use microscopy data: time-lapse quantitative phase imaging with LSTM achieved 76% accuracy for cell death detection; SVMs trained on fluorescence microscopy features distinguished normal versus apoptotic CHO cells. Advanced cytometry/microscopy platforms (e.g., image-activated cell sorting, time-stretch quantitative phase imaging, stimulated Raman scattering cytometry) achieve high-throughput, high-accuracy phenotyping and sorting, often for cell type discrimination. However, these methods can be complex and not broadly accessible compared to standard flow cytometry. The literature indicates potential for morphology-based ML classification but highlights limitations of staining costs and manual gating subjectivity and the gap in using only FSC/SSC features for state classification.
Methodology
Data generation and labeling: HCT116 colorectal cancer cells were reverse-transfected with miR-34a-5p mimic (25 nM) to induce apoptosis. Cells were stained with Annexin V–Alexa Fluor 488 and propidium iodide (PI) and analyzed by flow cytometry (BD LSR Fortessa). Cells with negative Alexa Fluor 488 and PI fluorescence values were excluded. Live/apoptotic binary labeling used Annexin V (Alexa Fluor 488) cutoff: negative = live (label 0), positive = apoptotic (label 1). Multiclass labeling used Annexin V/PI quadrants: live (Annexin V−/PI−, label 0), early apoptotic (Annexin V+/PI−, label 1), late apoptotic/necrotic (Annexin V+/PI+, label 2).
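The labeling rules above can be sketched as two small functions. This is a minimal illustration, not the authors' code: the cutoff values and function names are hypothetical (the summary does not report the thresholds used), and the Annexin V−/PI+ quadrant, which the summary does not assign to a class, is returned as unlabeled here by assumption.

```python
from typing import Optional

# Hypothetical fluorescence cutoffs; the source does not report the actual values.
ANNEXIN_CUTOFF = 1000.0
PI_CUTOFF = 1000.0


def binary_label(annexin: float, annexin_cutoff: float = ANNEXIN_CUTOFF) -> int:
    """Return 0 (live) for Annexin V-negative events, 1 (apoptotic) otherwise."""
    return 1 if annexin >= annexin_cutoff else 0


def quadrant_label(
    annexin: float,
    pi: float,
    annexin_cutoff: float = ANNEXIN_CUTOFF,
    pi_cutoff: float = PI_CUTOFF,
) -> Optional[int]:
    """Return 0 (live), 1 (early apoptotic), 2 (late apoptotic/necrotic), or None."""
    annexin_pos = annexin >= annexin_cutoff
    pi_pos = pi >= pi_cutoff
    if not annexin_pos and not pi_pos:
        return 0  # Annexin V-/PI-
    if annexin_pos and not pi_pos:
        return 1  # Annexin V+/PI-
    if annexin_pos and pi_pos:
        return 2  # Annexin V+/PI+
    # Annexin V-/PI+ is not assigned a class in the summary; left unlabeled (assumption).
    return None


print(binary_label(50.0), quadrant_label(5000.0, 50.0))
```

The fluorescence channels are used only to produce labels; they are never passed to the classifier as features.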
Dataset composition (binary): Of 9990 cells, 5722 were live and 4268 apoptotic. To balance the classes, 4268 live cells were randomly sampled to match the 4268 apoptotic cells, giving a balanced set of 8536 cells. This set was split 80/20 into training (6828; 3411 live, 3417 apoptotic) and testing (1708; 857 live, 851 apoptotic).
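A minimal sketch of the balancing and splitting step, using only NumPy. The function name, the seed, and the choice of a plain (non-stratified) random split are assumptions for illustration; a non-stratified split would be consistent with the slightly unequal per-class counts reported above, but the summary does not say how the split was drawn.

```python
import numpy as np


def balance_and_split(labels, test_fraction=0.2, seed=0):
    """Undersample the larger class, shuffle, and return (train_idx, test_idx)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    live_idx = np.flatnonzero(labels == 0)
    apo_idx = np.flatnonzero(labels == 1)
    n_per_class = min(len(live_idx), len(apo_idx))
    # Randomly keep the same number of events from each class.
    live_keep = rng.choice(live_idx, size=n_per_class, replace=False)
    apo_keep = rng.choice(apo_idx, size=n_per_class, replace=False)
    balanced = rng.permutation(np.concatenate([live_keep, apo_keep]))
    # Round the test size up, so 20% of 8536 events becomes 1708.
    n_test = int(np.ceil(test_fraction * len(balanced)))
    return balanced[n_test:], balanced[:n_test]


# Placeholder labels with the class counts reported in the text.
labels = np.array([0] * 5722 + [1] * 4268)
train_idx, test_idx = balance_and_split(labels)
print(len(train_idx), len(test_idx))
```

With the reported class counts this yields 6828 training and 1708 test events, matching the totals in the text.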
Features: Six non-fluorescent flow cytometry scatter features were used: FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W (size and granularity proxies). These informed the models exclusively, without staining signals.
Preprocessing and visualization: Standardization (StandardScaler: mean 0, SD 1) was applied to training and testing sets. Distributions were inspected with box plots. PCA and t-SNE (two components) visualized standardized training data, showing partial separation with overlap.
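Standardization rescales each scatter feature to zero mean and unit standard deviation. The sketch below reproduces that arithmetic in NumPy on synthetic data. One assumption is labeled in the code: the mean and standard deviation are estimated on the training set only and then reused for the test set, which is common practice but is not stated in the summary. The feature scales are invented for illustration.

```python
import numpy as np


def fit_standardizer(X_train):
    """Estimate per-feature mean and (population) standard deviation."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # ddof=0, the convention StandardScaler uses
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant features
    return mean, std


def standardize(X, mean, std):
    """Apply z = (x - mean) / std feature by feature."""
    return (X - mean) / std


# Synthetic stand-in for FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W (invented scales).
rng = np.random.default_rng(0)
scales = np.array([1e5, 8e4, 1e2, 5e4, 4e4, 1e2])
X_train = rng.normal(loc=scales, scale=0.1 * scales, size=(200, 6))
X_test = rng.normal(loc=scales, scale=0.1 * scales, size=(50, 6))

# Assumption: statistics come from the training set and are reused on the test set.
mean, std = fit_standardizer(X_train)
Z_train = standardize(X_train, mean, std)
Z_test = standardize(X_test, mean, std)
print(Z_train.mean(axis=0).round(6), Z_train.std(axis=0).round(6))
```

Because area and height values can be orders of magnitude larger than width values, distance-based and gradient-based models are sensitive to this step.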
Modeling (binary): Five algorithms were screened: logistic regression, random forest (n_estimators 1–100), k-nearest neighbors (k=1–100), multilayer perceptron (MLP; 2 hidden layers, nodes per layer 1–30; solver adam; alpha 0.001; max_iter 1000; random_state 1), and SVM (linear, sigmoid, Gaussian kernels with grid over C and gamma). Candidates were first filtered by tenfold cross-validation on the training set, keeping those with mean accuracy > 0.90 and accuracy standard deviation < 0.10. Candidates passing cross-validation were then evaluated on the standardized test set with additional filters focused on live-cell prediction: live precision > 0.91 and live recall > 0.91. Ensemble models (hard voting, soft voting) combined the candidate MLPs.
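The cross-validation screen for MLP candidates might look like the following scikit-learn sketch. It uses the hyperparameters listed above, but everything else is an assumption: the data are synthetic (`make_classification`), only two layer configurations are tried to keep the run short, and the scaler is placed inside a pipeline so it is refit within each fold, which may differ from how the authors applied standardization.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic six-feature data standing in for the scatter measurements.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=2, random_state=0
)


def screen_mlps(X, y, layer_sizes, min_mean=0.90, max_std=0.10):
    """Run tenfold CV per configuration and flag those passing both filters."""
    results = []
    for sizes in layer_sizes:
        model = make_pipeline(
            StandardScaler(),
            MLPClassifier(
                hidden_layer_sizes=sizes,
                solver="adam",
                alpha=0.001,
                max_iter=1000,
                random_state=1,
            ),
        )
        scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
        passed = bool(scores.mean() > min_mean and scores.std() < max_std)
        results.append((sizes, float(scores.mean()), float(scores.std()), passed))
    return results


results = screen_mlps(X, y, [(13, 19), (16, 6)])
for sizes, mean_acc, std_acc, passed in results:
    print(sizes, round(mean_acc, 3), round(std_acc, 3), passed)
```

A full screen over 1–30 nodes in each of two layers could iterate over `itertools.product(range(1, 31), repeat=2)` instead of the two configurations shown.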
Modeling (multiclass): Starting dataset after exclusions: 9990 cells with 5722 live, 699 early apoptotic, 3569 late apoptotic. Train/test split 80/20: training 7992 (4607 live, 552 early, 2833 late), testing 1998 (1115 live, 147 early, 736 late). Standardization was applied. To address class imbalance in training, SMOTE oversampled minority classes to balance (4607 per class). Random forest (1–100 trees) and MLP (2 layers, 1–30 nodes each) were trained on the balanced training set. Test set evaluation used the same live-focused precision/recall filter. Ensemble (hard/soft voting) combined top models.
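SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The NumPy sketch below illustrates that idea for a single class; it is a simplified stand-in rather than the implementation the authors used, and the function name, `k=5`, and the random placeholder data are assumptions.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by interpolating toward nearest neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise squared Euclidean distances within the minority class.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Pick a base sample and one of its k neighbors for each synthetic row.
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    # Interpolate a random fraction of the way from base to neighbor.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Placeholder data: 552 early apoptotic training events with six features.
rng = np.random.default_rng(1)
X_early = rng.normal(size=(552, 6))
synthetic = smote_oversample(X_early, n_new=4607 - 552)
X_early_balanced = np.vstack([X_early, synthetic])
print(X_early_balanced.shape)
```

Oversampling is applied to the training set only, so the test set keeps its original class proportions.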
Comparators: Conventional manual gating using FSC-A vs SSC-A with three gate sizes (A, B, C) for live-cell selection. Unsupervised baselines (K-means, Gaussian mixture) on standardized and non-standardized data with k=2 clusters.
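To make the manual-gating comparator concrete, the sketch below applies a rectangular gate on FSC-A versus SSC-A and scores it against ground-truth live labels. The rectangular shape, the gate bounds, and the toy values are all assumptions; gates drawn in cytometry software are often polygons, and the summary does not give the coordinates of Gates A, B, and C.

```python
import numpy as np


def rectangular_gate(fsc_a, ssc_a, fsc_range, ssc_range):
    """Return a boolean mask of events falling inside the gate."""
    return (
        (fsc_a >= fsc_range[0])
        & (fsc_a <= fsc_range[1])
        & (ssc_a >= ssc_range[0])
        & (ssc_a <= ssc_range[1])
    )


def live_precision_recall(in_gate, is_live):
    """Precision and recall of the gate, treating 'live' as the positive class."""
    true_pos = np.sum(in_gate & is_live)
    precision = true_pos / max(np.sum(in_gate), 1)
    recall = true_pos / max(np.sum(is_live), 1)
    return float(precision), float(recall)


# Toy events (arbitrary units) with stain-derived ground truth.
fsc_a = np.array([100, 120, 130, 200, 90, 110])
ssc_a = np.array([50, 60, 55, 150, 40, 58])
is_live = np.array([True, True, True, False, True, False])

in_gate = rectangular_gate(fsc_a, ssc_a, fsc_range=(95, 135), ssc_range=(45, 65))
precision, recall = live_precision_recall(in_gate, is_live)
print(precision, recall)  # 0.75 0.75
```

Shrinking the gate typically raises precision while lowering recall, which is the trade-off a fixed two-dimensional gate cannot avoid when the populations overlap.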
Software/hardware: Python scikit-learn (classification, CV, scaling, PCA, t-SNE, clustering, voting), numpy, pandas, matplotlib. Hardware: Dell desktop (i7-10700, 32 GB RAM, Win10) and Dell laptop (i5-5300U, 9 GB RAM, Win7).
Metrics: Threshold-dependent (precision, recall, accuracy) and threshold-independent (ROC AUC, average precision). Live-focused precision and recall emphasized as primary selection criteria.
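The difference between threshold-dependent and threshold-independent metrics can be shown with a small example. Accuracy depends on where the decision threshold is placed, whereas ROC AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The functions and toy scores below are illustrative only.

```python
import numpy as np


def accuracy_at(y_true, scores, threshold):
    """Threshold-dependent: fraction of correct labels at a given cutoff."""
    predictions = (scores >= threshold).astype(int)
    return float(np.mean(predictions == y_true))


def roc_auc(y_true, scores):
    """Threshold-independent: pairwise ranking form of ROC AUC (ties count 0.5)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

print(accuracy_at(y_true, scores, 0.5))   # 0.75
print(accuracy_at(y_true, scores, 0.38))  # 0.5
print(roc_auc(y_true, scores))            # 0.75
```

The same scores give different accuracies at different cutoffs but a single ROC AUC, which is why both kinds of metric are reported.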
Key Findings
- Using only six FSC/SSC-derived features and standardized data, supervised ML achieved high performance for live vs apoptotic classification. Out of 1046 candidate models (93 RF, 79 k-NN, 862 MLP, 12 SVM), only MLP models met stringent test-set filters (live precision >0.91 and live recall >0.91). Three top MLPs were selected:
• MLP 13-19: live precision 0.913, live recall 0.931, live F1 0.922, accuracy 0.921, ROC AUC 0.970, average precision 0.972.
• MLP 13-21: live precision 0.917, live recall 0.928, live F1 0.922, accuracy 0.922, ROC AUC 0.970, average precision 0.972.
• MLP 16-6: live precision 0.912, live recall 0.933, live F1 0.923, accuracy 0.922, ROC AUC 0.970, average precision 0.973.
- Ensemble of the three (hard voting): live precision 0.917, live recall 0.932, accuracy 0.924 (ROC AUC N/A), offering only marginal changes relative to single best model (MLP 16-6).
- Standardization was essential: models trained on non-standardized data failed final precision/recall filters despite some passing CV.
- Manual FSC-A/SSC-A gating underperformed vs MLP 16-6: Gate A precision 85.8%, recall 73.6%; Gate B precision 89.1%, recall 54.5%; Gate C precision 93.9%, recall 25.5%, reflecting large loss of true live cells.
- Unsupervised clustering performed poorly. Non-standardized K-means: precision 52.7%, recall 41.3%, accuracy 52.2%; non-standardized Gaussian mixture: precision 57.4%, recall 65.1%, accuracy 58.5%. Standardized K-means: precision 50.3%, recall 62.8%, accuracy 50.4%; standardized Gaussian mixture: precision 57.5%, recall 65.1%, accuracy 58.5%.
- Multiclass (live, early, late apoptotic) with SMOTE-balanced training achieved strong performance for live and late apoptotic, but poor for early apoptotic:
• MLP 7-2 (accuracy 88.5%): live precision 93.2%, recall 91.1%; late precision 91.5%, recall 89.7%; early precision 49.7%, recall 63.3%.
• RF 76 (accuracy 88.3%): live precision 93.0%, recall 91.4%; late precision 91.4%, recall 89.1%; early precision 48.4%, recall 60.5%.
• Ensembles slightly increased early precision (hard: 50.9%, soft: 51.7%) but decreased early recall (hard: 59.2%, soft: 61.2%), with overall accuracy ~88.8–88.9%.
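The hard and soft voting schemes used for the ensembles differ in what they combine: hard voting takes the majority of predicted labels, while soft voting averages predicted class probabilities and then picks the most probable class. A minimal NumPy sketch with invented probabilities from three hypothetical models follows. Because hard voting outputs only labels and no continuous score, a ROC AUC cannot be computed for it, which is why that entry is listed as N/A above.

```python
import numpy as np


def hard_vote(predictions):
    """Majority label per sample; predictions has shape (n_models, n_samples)."""
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)  # ties resolve to the lowest class index


def soft_vote(probabilities):
    """Average probabilities; input has shape (n_models, n_samples, n_classes)."""
    return np.asarray(probabilities).mean(axis=0).argmax(axis=1)


# Invented class probabilities from three models for three cells.
probabilities = np.array([
    [[0.90, 0.10], [0.55, 0.45], [0.20, 0.80]],
    [[0.60, 0.40], [0.55, 0.45], [0.30, 0.70]],
    [[0.45, 0.55], [0.05, 0.95], [0.10, 0.90]],
])
predictions = probabilities.argmax(axis=2)

print(hard_vote(predictions))    # [0 0 1]
print(soft_vote(probabilities))  # [0 1 1]
```

The second cell shows how the two schemes can disagree: two models weakly favor class 0, but one model's confident vote for class 1 dominates the averaged probability.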
Discussion
The study demonstrates that scatter-derived morphology features (FSC/SSC A-H-W) contain sufficient information to reliably classify live versus apoptotic cells without staining when modeled with supervised learning, particularly MLPs on standardized inputs. This addresses the need for stain-free, automated gating, reducing cost and potential assay interference compared to Annexin V/PI, and mitigating human bias inherent in manual FSC/SSC gating. Standardization proved critical due to differing feature scales, and supervised approaches far outperformed unsupervised clustering, reinforcing the value of using labels when available. Ensemble methods yielded minimal gains, likely because the individual MLPs already achieved a ROC AUC of about 0.97. In a multiclass setting, models effectively separated live and late apoptotic cells but struggled with early apoptotic cells, consistent with the transitional, morphologically ambiguous nature of early apoptosis and limited training examples. The model compared favorably to image-based classifiers reported in the literature. Relative to conventional gating, MLP 16-6 achieved much higher live-cell recall (0.933 versus 25.5–73.6% for the three gates) and higher precision than Gates A and B; only the most restrictive gate (Gate C, 93.9%) exceeded its precision, at the cost of discarding most live cells. This suggests utility as a module in flow cytometry pipelines for improved live-cell selection.
Conclusion
The authors developed and validated a stain-free, flow cytometry-based MLP classifier using six FSC/SSC features that accurately distinguishes live from apoptotic cells, outperforming conventional manual gating and rivaling more complex imaging-based methods. The approach is simple, accessible, and compatible with standard cytometry workflows. Future work should: (1) validate across diverse cell types, treatments, and instruments to assess generalizability; (2) improve early apoptotic detection, potentially via additional biophysical features or larger, balanced datasets; (3) explore domain adaptation and calibration across datasets; (4) integrate the model into cytometer software for real-time gating; and (5) evaluate performance in downstream applications (e.g., proliferation, drug response assays).
Limitations
- Generalizability was assessed on a single cell line (HCT116) and apoptosis induction method (miR-34a), which may limit applicability across cell types and conditions.
- Model performance depended on feature standardization; non-standardized data led to failures in test precision/recall.
- Early apoptotic classification in the multiclass task was poor, likely due to class scarcity and the transitional morphology of early apoptosis.
- Manual gating comparison used FSC-A/SSC-A with three gates and may vary with operator and instrument settings; nonetheless, results highlight inherent overlap challenges.
- Only six scatter features were used; additional cytometry parameters or biophysical features might further improve accuracy, especially for early apoptosis.