Introduction
Cellular senescence, a state of permanent cell cycle arrest, is implicated in aging and in numerous diseases, including cancer, type 2 diabetes, and osteoarthritis. Although senescence has beneficial roles in development and wound healing, senescent cells also contribute to age-related pathologies through the senescence-associated secretory phenotype (SASP), the secretion of inflammatory molecules that can promote tumorigenesis and other age-related diseases. Consequently, there is considerable interest in developing senolytics, therapeutic agents that selectively eliminate senescent cells. However, the discovery of new senolytics has been hampered by limited understanding of the molecular pathways involved and by the high cost of traditional drug screening. This study explores machine learning as a way to accelerate senolytic discovery: a model trained on existing published data is used to prioritise drug candidates, reducing reliance on costly and time-consuming experimental screening. Machine learning and other AI methods are increasingly used in drug discovery as a route to making drug development more efficient and effective.
Literature Review
Several senolytics have been identified, including Bcl-2 family inhibitors such as navitoclax and ABT-737, which target anti-apoptotic proteins upregulated in senescent cells. Others, such as cardiac glycosides (ouabain, digoxin) and BET inhibitors (ARV825, JQ1), were discovered through large-scale screening. However, many senolytics act only in specific cell types or are toxic to non-senescent cells, limiting their clinical applicability. Machine learning has been applied to geroprotector and anti-senescence compound discovery, but these efforts have often relied on large datasets or on specific molecular targets. This study aimed to overcome these limitations by training a machine learning model on a relatively small, yet diverse, dataset of known senolytics and non-senolytics to identify new candidates. Training compounds were selected to prioritise consistent experimental validation across diverse cell types and senescence-induction methods over sheer dataset size.
Methodology
A dataset of senolytics (positives) and non-senolytics (negatives) was assembled from published literature and a commercial patent. The dataset, comprising 58 positives and 2465 negatives, was curated for chemical diversity and to reflect the low probability that a randomly chosen compound has senolytic activity. Chemical structures were converted into numerical representations using 200 physicochemical descriptors computed with the RDKit package. The diversity of the positive compounds was assessed with k-means clustering, Tanimoto distance graphs, and community detection, to confirm that they spanned a representative chemical space and did not bias training toward specific chemical classes. Support Vector Machines (SVM), Random Forests (RF), and an XGBoost ensemble model were trained on these data to predict senolytic activity. Feature selection reduced the descriptor set to the 165 features most relevant to the prediction task. Model performance was assessed by five-fold cross-validation using precision, recall, and F1 score. The XGBoost model performed best, in particular with a lower false positive rate, and was selected for a computational screen of the L2100 TargetMol Anticancer and L3800 Selleck FDA-approved & Passed Phase chemical libraries (4340 compounds in total). The 21 compounds with the highest predicted probability of senolytic activity were selected for experimental validation.
Experimental validation used two cellular models of senescence: oncogene-induced senescence (OIS) in IMR90 ER:RAS cells and therapy-induced senescence (TIS) in A549 cells. Senolytic activity was assessed with cell survival assays based on Hoechst and crystal violet staining, with ouabain and navitoclax as positive controls. Caspase-3/7 activity assays were used to assess induction of apoptosis.
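The Tanimoto-based diversity check on the positive set can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: it assumes each compound is encoded as a binary fingerprint stored as a set of on-bit indices (the source does not say which fingerprint was used), and it substitutes connected components of a distance-thresholded graph for the community-detection step. The threshold, the helper names, and the toy fingerprints are all hypothetical.

```python
from itertools import combinations


def tanimoto_similarity(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_distance(a: set[int], b: set[int]) -> float:
    """Tanimoto distance = 1 - similarity; 0 means identical bit patterns."""
    return 1.0 - tanimoto_similarity(a, b)


def similarity_graph(fps: list[set[int]], max_distance: float) -> list[tuple[int, int]]:
    """Edges joining every pair of compounds whose Tanimoto distance is <= max_distance."""
    return [
        (i, j)
        for i, j in combinations(range(len(fps)), 2)
        if tanimoto_distance(fps[i], fps[j]) <= max_distance
    ]


def count_groups(n: int, edges: list[tuple[int, int]]) -> int:
    """Connected components via union-find: a crude stand-in for community detection."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)  # merge the two groups
    return len({find(x) for x in range(n)})


# Hypothetical fingerprints for four positives: two pairs of close analogues.
positives = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}]
edges = similarity_graph(positives, max_distance=0.5)
print(edges)                                 # [(0, 1), (2, 3)]
print(count_groups(len(positives), edges))   # 2
```

Many small groups relative to the number of positives would point to a chemically diverse set, whereas one dominant group would signal a bias toward a single chemical class.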
Intracellular potassium levels and NOXA mRNA expression were measured to explore the mechanism of action of the identified senolytics.
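The five-fold cross-validation with precision, recall, and F1 described in the methodology can be sketched in plain Python. The sketch assumes stratified folds, which the source does not state; stratification matters here because, with 58 positives among 2523 compounds, an unstratified split could leave a fold with very few senolytics. The `fit` callable is a hypothetical interface standing in for SVM, RF, or XGBoost training.

```python
import random
from typing import Callable, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds, dealing positives and negatives out round-robin."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in (1, 0):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive (senolytic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(features, labels, fit: Callable, k: int = 5, seed: int = 0):
    """Train on k-1 folds and score the held-out fold; fit(X, y) must return predict(x) -> 0 or 1."""
    scores = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predict = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        y_pred = [predict(features[i]) for i in test_idx]
        scores.append(precision_recall_f1([labels[i] for i in test_idx], y_pred))
    return scores


labels = [1] * 58 + [0] * 2465  # class balance of the training set
folds = stratified_folds(labels)
print([sum(labels[i] for i in f) for f in folds])  # [12, 12, 12, 11, 11]
```

Averaging the per-fold tuples returned by `cross_validate` gives one precision, recall, and F1 per model, which is how SVM, RF, and XGBoost could be compared on equal footing.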
Key Findings
Three compounds, ginkgetin, oleandrin, and periplocin, were experimentally validated as senolytics in both the OIS and TIS models. These compounds showed senolytic activity comparable to or exceeding that of known senolytics, with oleandrin being particularly potent, active in the low nanomolar range. The XGBoost model was highly selective, flagging only 21 candidates out of the 4340 screened compounds (about 0.5%). The identified compounds acted selectively, showing significantly lower toxicity toward non-senescent cells than toward senescent cells. Oleandrin, in particular, was more potent than ouabain, a known senolytic cardiac glycoside, in both oncogene-induced and replicative senescence. At concentrations where ouabain showed little or no senolytic effect, oleandrin also significantly increased caspase-3/7 activity and the expression of NOXA mRNA; NOXA is a pro-apoptotic protein involved in ouabain's senolytic mechanism. Importantly, oleandrin did not inhibit cell proliferation at the concentrations used. Further analysis suggested that structural features beyond the core steroid scaffold of oleandrin, periplocin, and ouabain, as well as the biflavone structure of ginkgetin, contribute to their enhanced senolytic efficacy.
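The screening step behind the selectivity figure (rank every library compound by predicted probability, keep the top 21) can be sketched as below. The compound IDs and probabilities are hypothetical, and the alphabetical tie-break is an illustrative choice; in the study the probabilities came from the trained XGBoost model.

```python
def select_top_k(probabilities: dict[str, float], k: int) -> list[str]:
    """Compound IDs with the k highest predicted probabilities; ties broken alphabetically by ID."""
    ranked = sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))
    return [compound_id for compound_id, _ in ranked[:k]]


def selection_rate(n_selected: int, n_screened: int) -> float:
    """Fraction of the screened library passed on to experimental validation."""
    return n_selected / n_screened


# Hypothetical model outputs for a toy library of five compounds.
toy_scores = {"cmpd_A": 0.91, "cmpd_B": 0.12, "cmpd_C": 0.67, "cmpd_D": 0.67, "cmpd_E": 0.05}
print(select_top_k(toy_scores, 2))        # ['cmpd_A', 'cmpd_C']
print(f"{selection_rate(21, 4340):.2%}")  # 0.48%
```

A fixed top-k cut, rather than a probability threshold, keeps the number of wet-lab experiments predictable regardless of how the scores are calibrated.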
Discussion
This study demonstrates the potential of machine learning to accelerate drug discovery, particularly in areas with few known targets and few validated compounds. Using a relatively small but diverse training dataset and a target-agnostic approach avoided the dependence of previous studies on large datasets or predefined targets. The high hit rate of the experimental validation (3 of the 21 compounds tested, about 14.3%) supports the predictive power of the XGBoost model and its ability to distinguish true senolytics from a large pool of negative compounds. The superior performance of oleandrin over existing senolytic cardiac glycosides highlights the approach's potential to identify novel compounds with improved efficacy. Identifying natural products as senolytics offers translational advantages, including existing safety and pharmacokinetic data. The low cost and high throughput of in silico screening are a considerable improvement over conventional methods, paving the way for more efficient and cost-effective drug discovery.
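The hit rate follows directly from the counts of selected and validated compounds. The short calculation below reproduces it with exact rational arithmetic, so the rounding to one decimal place is explicit; the variable names are illustrative.

```python
from fractions import Fraction

selected_for_testing = 21
validated_senolytics = 3  # ginkgetin, oleandrin, periplocin

hit_rate = Fraction(validated_senolytics, selected_for_testing)
print(hit_rate)                   # 1/7
print(f"{float(hit_rate):.1%}")   # 14.3%
```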
Conclusion
This research used machine learning to identify three novel senolytics (ginkgetin, oleandrin, and periplocin). The findings highlight the potential of machine learning approaches to address challenges in senolytic drug discovery, in particular their capacity to screen large chemical libraries efficiently. The superior potency of oleandrin compared with other known senolytics warrants further investigation of its therapeutic potential, especially since local administration could mitigate concerns about systemic toxicity. Future research should focus on in-depth mechanistic studies and preclinical evaluation of these compounds to assess their therapeutic potential in various age-related diseases. Exploring alternative machine learning models and expanding the training datasets could further improve the accuracy and efficiency of the in silico screening approach.
Limitations
The study's reliance on published data for model training, while cost-effective, means that the quality of its predictions depends on the completeness and accuracy of the literature. Bias in the training data, arising from the scarcity of known senolytics and the lack of negative data from extensive experimental screens, could have affected model performance and generalizability. In vivo studies are needed to evaluate the efficacy and safety of the identified compounds before any clinical application. Finally, the high selectivity of the XGBoost model, while efficient, may cause potentially useful compounds that fall below the stringent selection cut-off to be overlooked.