Introduction
Flow cytometry, a powerful tool in biomedical research, typically uses forward scatter (FSC) and side scatter (SSC) to distinguish cells by size and granularity. Annexin V/propidium iodide (PI) staining classifies cell state (live vs. apoptotic) with high accuracy, but it is laborious and expensive. Manual gating on FSC/SSC is simpler but introduces human bias. Machine learning offers an alternative that uses FSC and SSC features to predict cell state. Previous studies have relied on microscopy images for this purpose; this study instead uses only FSC and SSC data to develop a more accessible and cost-effective method. The six FSC and SSC parameters (area, height, and width for each) are hypothesized to carry enough information to distinguish apoptotic from healthy cells, and they form the basis of the predictive model.
Literature Review
Several machine learning approaches have been used to differentiate cell types and states from microscopy images. For instance, a bidirectional long short-term memory (LSTM) network achieved 76% accuracy in detecting cell death from time-lapse data of cell mass distribution obtained by quantitative phase imaging (QPI). Another study combined morphological features from fluorescence microscopy images with a support vector machine (SVM) to distinguish normal from apoptotic CHO cells. However, these methods depend on microscopy, which is not feasible in every research setting. This study aims to improve on them by using readily available flow cytometry data to build a more accessible and robust model.
Methodology
Human colorectal cancer HCT116 cells were reverse transfected with miR-34a-5p mimic and stained with Annexin V-Alexa Fluor 488 and PI. Six flow cytometry parameters were collected for each cell: FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, and SSC-W. Live and apoptotic cells were labeled according to Annexin V staining. The data were split into training (80%) and testing (20%) sets and standardized. Five classification algorithms (logistic regression, random forest, k-nearest neighbors (k-NN), multilayer perceptron (MLP), and SVM) were trained and tested using tenfold cross-validation. Candidate models were retained if their mean accuracy exceeded 0.90 with a standard deviation of accuracy below 0.10, and were then filtered further by requiring precision and recall for live cell prediction above 0.91. The selected models were evaluated on precision, recall, F-value, accuracy, and area under the ROC curve (AUC). Two ensemble methods, hard voting and soft voting, were also explored. The selected model was then compared with conventional FSC-A vs. SSC-A gating to highlight the advantages of the proposed method. In addition to the binary classification (live/apoptotic), a three-class model (live, early apoptotic, late apoptotic) was built, using the synthetic minority over-sampling technique (SMOTE) to handle class imbalance.
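The two-stage model filter and the two voting schemes described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' code: the function names, the example inputs, the 0.5 threshold for soft voting, and the use of the population standard deviation are all assumptions. Only the three thresholds come from the text.

```python
from statistics import mean, pstdev

# Thresholds quoted in the methodology.
MIN_MEAN_ACC = 0.90   # mean cross-validation accuracy must exceed this
MAX_STD_ACC = 0.10    # standard deviation of accuracy must stay below this
MIN_LIVE_PR = 0.91    # live-cell precision and recall must both exceed this


def passes_selection(fold_accuracies, live_precision, live_recall):
    """Return True if a candidate model clears both filtering stages.

    Assumption: the spread across folds is measured with the population
    standard deviation; the source does not say which estimator was used.
    """
    # Stage 1: high and stable accuracy across the cross-validation folds.
    stage1 = (mean(fold_accuracies) > MIN_MEAN_ACC
              and pstdev(fold_accuracies) < MAX_STD_ACC)
    # Stage 2: precision and recall for the live class.
    stage2 = live_precision > MIN_LIVE_PR and live_recall > MIN_LIVE_PR
    return stage1 and stage2


def hard_vote(labels):
    """Majority vote over the class labels predicted by each model."""
    return max(sorted(set(labels)), key=labels.count)


def soft_vote(live_probabilities, threshold=0.5):
    """Average each model's predicted P(live) and threshold the mean.

    Assumption: a 0.5 decision threshold, chosen here for illustration.
    """
    return "live" if mean(live_probabilities) >= threshold else "apoptotic"


if __name__ == "__main__":
    # Hypothetical fold accuracies for one candidate (not data from the study).
    folds = [0.93, 0.94, 0.92, 0.93, 0.95, 0.93, 0.92, 0.94, 0.93, 0.93]
    print(passes_selection(folds, live_precision=0.92, live_recall=0.92))

    # Three hypothetical models scoring one cell: the two schemes can disagree.
    probs = [0.9, 0.4, 0.4]
    labels = ["live" if p >= 0.5 else "apoptotic" for p in probs]
    print(hard_vote(labels), soft_vote(probs))
```

In the last example the two schemes disagree: two of the three hypothetical models vote "apoptotic", but the single confident model pulls the averaged probability above 0.5. This is the practical difference between hard and soft voting.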
Key Findings
Out of 1046 candidate models, several MLP models performed best on standardized data, achieving over 0.91 live-cell precision and recall, 0.92 live-cell F-value, and 0.97 AUC. The three best-performing MLP models (MLP 13-19, MLP 13-21, and MLP 16-6) showed comparable ROC and precision-recall curves. Ensemble methods did not significantly improve performance. Compared with conventional FSC-A/SSC-A gating, the MLP model achieved substantially higher precision and recall for live cell identification: gates A and B in particular gave markedly lower precision and recall, while gate C reached a relatively high precision (93.9%) at the cost of a very low recall (25.5%). Unsupervised learning (K-means and Gaussian mixture clustering) performed poorly compared with the supervised models, which underlines the importance of cell state labels for accurate prediction. In the three-class setting (live, early apoptotic, late apoptotic), the best-performing MLP and random forest models predicted live and late apoptotic cells well but early apoptotic cells poorly. This is likely due to class imbalance and the transitional nature of early apoptotic cells.
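The gate C result shows why precision and recall have to be read together. The sketch below computes both, plus the F-value, from raw counts. The counts are hypothetical, chosen only to produce a high-precision, low-recall pattern similar to the one reported; they are not the study's data, and the function name is likewise an assumption.

```python
def live_cell_metrics(tp, fp, fn):
    """Precision, recall and F-value for the live class from raw counts.

    tp: live cells correctly called live (inside the gate)
    fp: apoptotic cells wrongly called live
    fn: live cells missed (outside the gate)
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


if __name__ == "__main__":
    # Hypothetical tight gate: almost everything inside it is live,
    # but three quarters of the live cells fall outside it.
    precision, recall, f_value = live_cell_metrics(tp=250, fp=16, fn=750)
    print(f"precision={precision:.3f} recall={recall:.3f} F={f_value:.3f}")
    # prints: precision=0.940 recall=0.250 F=0.395
```

Because the F-value is the harmonic mean of precision and recall, the low recall dominates it. A gate like this yields a clean but small live population, which can bias any downstream measurement that depends on the number of live cells.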
Discussion
The study demonstrated that cell morphology features derived from standard flow cytometry data can support accurate, automated cell state classification. The MLP-based model significantly outperformed conventional gating, offering a robust, reliable, stain-free, and cost-effective approach. The results highlight the potential of machine learning to improve the efficiency and accuracy of flow cytometry-based assays. The advantage of supervised over unsupervised learning further emphasizes the importance of labeled data for accurate cell state prediction. The weaker results of the three-class model highlight the difficulty of classifying early apoptotic cells, potentially because they are morphologically similar to both live and late apoptotic cells.
Conclusion
This study presents a novel MLP-based machine learning model for classifying live and apoptotic cells using only FSC and SSC data from flow cytometry. This model significantly outperforms traditional gating methods, offering a convenient, accurate, and cost-effective alternative for various flow cytometry applications. Future work could explore the use of additional features, different machine learning algorithms, and the application to diverse cell types and experimental conditions. Further investigation into improving the prediction of early apoptotic cells is warranted.
Limitations
The study focused on a single cell line (HCT116) treated with a specific apoptosis inducer (miR-34a-5p mimic). Whether the model generalizes to other cell types and apoptosis induction methods needs further investigation. The three-class model also predicted early apoptotic cells poorly, which marks an area for future improvement. Finally, the method is not entirely stain-free: Annexin V/PI staining is still needed to label the training data, although the trained model itself classifies cells without staining.