Introduction
Long-acting injectables (LAIs) offer significant advantages for treating chronic diseases because they can improve efficacy, safety, and patient compliance. Polymer-based LAIs are exceptionally versatile, but their performance is hard to predict because it depends on the interplay of many formulation parameters. Traditional LAI development relies on extensive in vitro experimentation, which is time-consuming and expensive. This research investigates whether machine learning (ML) can accelerate that process. The study hypothesizes that ML algorithms can accurately predict in vitro drug release from polymer-based LAIs and that these predictive models can guide the rational design of new formulations. If successful, this data-driven approach could reduce the time and cost of LAI development and speed the translation of promising therapeutic agents from bench to bedside. The small number of FDA-approved polymeric LAIs relative to conventional formulations underscores the need for approaches that expedite development. Mathematical models and molecular dynamics simulations have been explored, but the former are hindered by the difficulty of analyzing in vitro drug release profiles and the latter by high computational cost. Existing ML applications in this area are limited by small datasets and narrow application domains. This study aims to address these limitations by using a more comprehensive dataset and a wider range of ML algorithms.
Literature Review
The existing literature highlights several challenges in developing polymeric LAIs and translating them into clinical practice. Only a few biodegradable polymers are generally recognized as safe (GRAS) for parenteral administration, and poly(lactide-co-glycolide) (PLGA) is the one predominantly used. Compatibility between polymer and drug strongly influences LAI performance, including drug loading, release, and stability. Mathematical models for predicting drug release are hindered by the difficulty of analyzing in vitro release profiles and by the lack of a priori information. Molecular dynamics simulations offer insight into the relationship between drug release rates and formulation parameters, but they are computationally expensive and cannot replace experimental assays. Machine learning (ML) offers a potential solution, but previous applications have been limited by small datasets and by a reliance on neural networks, which can overfit when data are scarce. This study aims to address these limitations by using a larger, more comprehensive dataset and exploring a wider range of ML algorithms.
Methodology
The researchers constructed a dataset from previously published studies and external sources using the Web of Science search engine. The dataset included descriptors for small molecule drugs, polymer materials, and LAIs, along with experimental conditions and drug release profiles (378 measurements for 43 drug-polymer combinations). Seventeen physicochemical descriptors were selected as input features, covering drug properties, polymer properties, and experimental parameters. A nested cross-validation strategy was used for model training and evaluation, with an inner loop for hyperparameter tuning and an outer loop for model evaluation. Seven ML models were compared: Light Gradient Boosting Machine (LGBM), Random Forest (RF), XGBoost, Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Neural Network (NN). Model performance was assessed using Mean Absolute Error (MAE). After initial model selection, feature engineering was performed on the LGBM model: agglomerative hierarchical clustering was used to identify redundant input features, which were then removed to improve the model's performance and interpretability. SHAP (SHapley Additive exPlanations) analysis was used to quantify feature importance and interpret model predictions. Finally, a prospective study validated the model by designing and testing "fast" and "slow" release formulations based on the insights gained from the SHAP analysis. The statistical and data-analysis techniques used included Spearman's Rank Correlation, Ward's Linkage Distance, t-Distributed Stochastic Neighbor Embedding, and Principal Component Analysis. High-performance liquid chromatography (HPLC) was used experimentally for drug quantification and characterization.
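The nested cross-validation strategy described above can be sketched in a few lines. This is a minimal illustration, not the study's code: it uses scikit-learn's `RandomForestRegressor` as a stand-in for the seven models compared, synthetic data in place of the published dataset, and it assumes that folds are split by group (here, a hypothetical formulation ID) so that measurements from one release profile never appear in both training and test sets. The function name `nested_cv_mae` and the hyperparameter grid are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, GroupKFold


def nested_cv_mae(X, y, groups, param_grid, n_outer=5, n_inner=3):
    """Return the MAE measured on each outer test fold."""
    outer = GroupKFold(n_splits=n_outer)
    scores = []
    for train_idx, test_idx in outer.split(X, y, groups):
        # Inner loop: tune hyperparameters using only the outer training fold.
        search = GridSearchCV(
            RandomForestRegressor(random_state=0),
            param_grid,
            scoring="neg_mean_absolute_error",
            cv=GroupKFold(n_splits=n_inner),
        )
        search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
        # Outer loop: score the tuned model on groups it has never seen.
        pred = search.predict(X[test_idx])
        scores.append(mean_absolute_error(y[test_idx], pred))
    return np.array(scores)


# Synthetic stand-in data (NOT the study's dataset): 20 hypothetical
# formulations with 8 measurements each and 5 numeric descriptors.
rng = np.random.default_rng(0)
n_groups, per_group, n_features = 20, 8, 5
groups = np.repeat(np.arange(n_groups), per_group)
X = rng.normal(size=(n_groups * per_group, n_features))
# Target mimics a fractional release value bounded in [0, 1].
y = np.clip(0.5 + 0.2 * X[:, 0] + 0.05 * rng.normal(size=len(X)), 0.0, 1.0)

fold_mae = nested_cv_mae(X, y, groups, {"max_depth": [2, 4], "n_estimators": [20]})
print("MAE per outer fold:", np.round(fold_mae, 3))
```

Because the hyperparameters are chosen without ever seeing the outer test fold, the averaged outer-fold MAE is a less optimistic estimate than the score of a single tuned cross-validation run.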
Key Findings
The LGBM model consistently outperformed the other models in predicting fractional drug release, achieving an MAE < 0.6. Adding initial drug release measurements as inputs significantly improved model accuracy. Feature engineering reduced the number of input features from 17 to 15 without compromising prediction accuracy. SHAP analysis revealed that Time and T=1.0 (fractional drug release at 1 day) were the most influential features, with drug and polymer molecular weights also playing significant roles. In the prospective study, "fast" and "slow" release PLGA formulations were designed from the model's predictions and tested experimentally, and predicted and experimental release profiles agreed well overall. Discrepancies were observed for a slow-release formulation, however, which points to the need for more data on this type of formulation and to the potential value of integrating PLGA hydrolysis information into the model. Overall, the study demonstrates that the LGBM model can predict drug release and guide the design of novel LAI formulations. The study's dataset is available on Zenodo and ChemRxiv. The code supporting the study is on the Aspuru-Guzik Group's GitHub page and Zenodo.
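The feature-reduction step (clustering features to find redundant ones) can be illustrated with SciPy. This is a sketch under stated assumptions rather than the authors' implementation: it converts absolute Spearman rank correlations into distances, applies Ward linkage, and cuts the tree at an arbitrary threshold of 0.5. The feature names, the threshold, and the helper `redundant_feature_groups` are hypothetical, and the data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def redundant_feature_groups(X, names, threshold=0.5):
    """Group features whose Spearman correlations place them in one cluster."""
    corr = spearmanr(X).correlation        # (n_features, n_features) matrix
    corr = (corr + corr.T) / 2             # remove tiny numerical asymmetries
    np.fill_diagonal(corr, 1.0)
    distance = 1.0 - np.abs(corr)          # strongly correlated -> small distance
    # Ward linkage on the condensed (upper-triangle) distance matrix.
    tree = linkage(squareform(distance, checks=False), method="ward")
    labels = fcluster(tree, t=threshold, criterion="distance")
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    return list(groups.values())


# Synthetic example: the second column is a near-copy of the first,
# so the two should land in the same cluster. Names are hypothetical.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
X = np.column_stack([
    base,
    base + 0.01 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])
names = ["polymer_mw", "polymer_mw_proxy", "drug_loading", "time_days"]

groups = redundant_feature_groups(X, names)
print(groups)
```

Any cluster with more than one member is a candidate for pruning: one representative feature is kept and the rest are dropped, after which the model is retrained to confirm that accuracy has not degraded.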
Discussion
This study successfully demonstrates the potential of machine learning, specifically the LGBM model, to significantly accelerate the development of long-acting injectables. The high accuracy of the model in predicting in vitro drug release, coupled with the insightful feature importance analysis provided by SHAP, offers a powerful tool for rational drug formulation design. The ability to guide the selection of polymer molecular weight, drug molecular weight, and other key parameters a priori represents a significant step forward. The prospective study further validates the model's predictive capabilities and its ability to direct the development of formulations with desired release profiles. The limitations identified, particularly the need for more data on slow-release formulations and the inclusion of PLGA hydrolysis kinetics, provide valuable directions for future research. The study’s open-source dataset and code contribute to the broader advancement of ML applications in pharmaceutical sciences.
Conclusion
This research establishes a robust machine learning framework for accelerating the design of polymeric long-acting injectables. The LGBM model's high predictive accuracy and interpretability, demonstrated through SHAP analysis and a successful prospective study, highlight its potential to significantly reduce development time and costs. Future work should focus on expanding the dataset to encompass a wider range of polymers, drug molecules, and release mechanisms, as well as incorporating factors like PLGA hydrolysis. This would enhance model accuracy and generalizability, further solidifying the role of ML in data-driven drug formulation development.
Limitations
The study is limited by the size of its dataset, which, although larger than those used in previous studies, might not fully capture the complexity of all possible drug-polymer interactions. Because the model predicts in vitro release profiles only, in vivo performance still requires separate validation. The model's accuracy also depends on the accuracy of the input data, and including PLGA hydrolysis kinetics could improve its predictions for slow-release formulations.