Introduction
Androgen deprivation therapy (ADT) is a cornerstone treatment for advanced prostate cancer, but predicting its efficacy and prognosis using conventional clinical parameters (PSA, Gleason score, TNM stage) remains challenging. Existing models often yield C-indices below 0.7, highlighting the need for improved predictive tools. Genetic factors are suspected to influence ADT response, with variations in outcomes observed across ethnicities and within families. Genome-wide association studies (GWAS) have identified potential single nucleotide polymorphisms (SNPs) associated with prostate cancer prognosis, but their predictive power alone is limited. Machine learning (ML), with its ability to identify complex patterns in high-dimensional data, offers a promising approach to integrate genetic and clinical information for more accurate prediction. This study aimed to develop and validate ML models using clinical and genetic data to predict castration resistance in patients with advanced prostate cancer undergoing primary ADT. Data came from the KYUCOG-1401-A study, a prospective multi-institutional clinical trial in a Japanese population. Combining the clinical data with SNPs identified in a previous GWAS of the same study population was expected to improve the accuracy of castration resistance prediction and to aid in optimizing treatment strategy.
Literature Review
Numerous studies have explored the prognostic value of clinical parameters in advanced prostate cancer treated with ADT. However, these models have demonstrated limited predictive accuracy, typically resulting in C-indices below 0.7. This inadequacy emphasizes the need to incorporate additional factors, such as genetic information, to improve prognostication. Previous research has suggested the influence of genetic background on the efficacy and prognosis of ADT, with varying outcomes noted among different ethnic groups and within families. Genome-wide association studies (GWAS) have proven useful in identifying SNPs associated with various traits, including cancer-related outcomes. A prior study by the research team investigated the association between SNPs and prognosis in Japanese patients undergoing primary ADT, identifying specific SNPs linked to prognosis. Yet, the predictive ability of these SNPs alone remained unsatisfactory. The integration of ML techniques has gained traction in oncology, showcasing its potential for leveraging large, complex datasets to develop more sophisticated prediction models. ML algorithms excel at identifying intricate patterns and interactions within high-dimensional data, making them ideal for analyzing genomic and clinical data jointly.
Methodology
This study employed data from the KYUCOG-1401-A study (UMIN000022852), a sub-study of KYUCOG-1401, which included Japanese patients with de novo advanced prostate cancer (TanyN1M0 or TanyNanyM1) receiving primary ADT. Patients with data censored before 2 years were excluded, resulting in a total of 119 participants. The dataset included both clinical and genetic information. Clinical data comprised clinicopathological characteristics, treatment response data, and survival data (progression-free survival (PFS), cancer-specific survival (CSS), overall survival (OS)). Progression was defined as PSA progression or radiographic progression. Genetic data consisted of single nucleotide polymorphisms (SNPs) obtained through genotyping using a Japonica Array v2. The study specifically used 2 and 46 SNPs previously associated with PSA-PFS at 2 years (p < 1.0 × 10⁻⁵ and p < 1.0 × 10⁻⁴, respectively). Data were randomly split into discovery (n=82) and validation (n=37) cohorts. Three machine learning (ML) algorithms were employed: point-wise linear (PWL), logistic regression with elastic-net regularization (LR), and eXtreme Gradient Boosting (XGBoost). Three datasets were used: clinical data only; clinical data + 2 SNPs; and clinical data + 46 SNPs. The PWL algorithm, a deep learning-based approach, was selected for model creation based on its superior performance (AUC) in the discovery and validation cohorts. Feature importance was assessed using weight vectors from the PWL algorithm. Model performance was evaluated using the AUC for predicting castration resistance at 2 years and the C-index for survival analyses. Statistical analysis included the Chi-square test, Kaplan-Meier method, log-rank test, and Harrell's C-index. Allele frequencies for SNPs from the 1000 Genomes Project were used to explore the potential impact of ethnicity on model performance.
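The cohort split and the AUC evaluation described above can be illustrated with a minimal sketch. The summary does not say what software the authors used, so everything here is an assumption made for illustration: the function names `split_cohorts` and `roc_auc`, the random seed, and the toy labels and scores are all hypothetical. The AUC is computed in its rank form, as the probability that a randomly chosen patient who developed castration resistance receives a higher predicted score than a randomly chosen patient who did not.

```python
import random


def split_cohorts(ids, n_discovery, seed=0):
    """Randomly split patient ids into discovery and validation cohorts.

    The seed is arbitrary and only makes this sketch reproducible.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_discovery], shuffled[n_discovery:]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = castration resistance within 2 years, 0 = none.
    scores: model-predicted risk; ties between a positive and a
    negative count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 119 participants split into the cohort sizes reported in the study.
discovery, validation = split_cohorts(range(119), n_discovery=82)
print(len(discovery), len(validation))  # 82 37

# Toy data, invented for illustration only (not study data).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]
print(roc_auc(labels, scores))  # 7 of 9 pairs ranked correctly, about 0.778
```

The pairwise form is quadratic in cohort size, which is harmless at n=37 or n=82 and keeps the definition visible; a library routine would normally be used in practice.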
Key Findings
The PWL algorithm exhibited the highest AUC values across the three datasets in both the discovery and validation cohorts, with one exception: the clinical-only model in the discovery cohort. The AUCs for predicting castration resistance at 2 years were: 0.773 (clinical), 0.810 (clinical + 2 SNPs), and 0.988 (clinical + 46 SNPs) in the discovery cohort; and 0.786 (clinical), 0.878 (clinical + 2 SNPs), and 1.000 (clinical + 46 SNPs) in the validation cohort. Three predictive models were constructed using the PWL algorithm: a clinical model, a small SNPs model (clinical data + 2 SNPs), and a large SNPs model (clinical data + 46 SNPs). The clinical model identified 12 key clinical parameters associated with castration resistance, including known factors like Gleason score and PSA, as well as additional factors like hypertension and total cholesterol. The small SNPs model included 6 clinical parameters and 2 SNPs, while the large SNPs model included 4 clinical parameters and 19 SNPs. Survival analysis demonstrated that the large SNPs model provided the strongest stratification for PFS, CSS, and OS, with C-indices exceeding 0.7 for all three. The C-indices were: 0.617 (PFS), 0.678 (CSS), 0.636 (OS) for the clinical model; 0.727 (PFS), 0.670 (CSS), 0.621 (OS) for the small SNPs model; and 0.730 (PFS), 0.781 (CSS), 0.703 (OS) for the large SNPs model. In contrast, the J-CAPRA risk model showed lower C-indices (0.588, 0.602, and 0.528 for PFS, CSS, and OS, respectively). Analysis of allele frequencies from the 1000 Genomes Project showed that the estimated effect of the 19 SNPs differed significantly between East Asians and Europeans, suggesting that genetic background influences the response to ADT.
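For readers unfamiliar with the C-index values reported above, the following is a minimal sketch of Harrell's C-index for right-censored survival data. It is not the authors' code: the function name `harrell_c_index` and the toy follow-up times, event flags, and risk scores are hypothetical and chosen only to make the calculation easy to follow. A pair of patients is treated as comparable when the patient with the shorter follow-up time actually experienced the event; the pair is concordant when that patient also has the higher predicted risk.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times: follow-up time per patient.
    events: 1 if the event (e.g. progression) was observed, 0 if censored.
    risk_scores: higher score means higher predicted risk.
    Pairs with tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only (not study data).
times = [5, 10, 15, 20]        # months of follow-up
events = [1, 0, 1, 1]          # second patient is censored
risk = [0.9, 0.3, 0.1, 0.5]    # hypothetical model output

print(harrell_c_index(times, events, risk))  # 0.75
```

In this toy case there are four comparable pairs, and the model orders three of them correctly. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of roughly 0.53 to 0.78 should be read.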
Discussion
This study demonstrates that incorporating genetic information into ML models improves the prediction of castration resistance and prognosis in advanced prostate cancer. The large SNPs model, utilizing 19 SNPs and 4 clinical parameters, achieved C-indices exceeding 0.7 for PFS, CSS, and OS, surpassing both the clinical-only model and the established J-CAPRA risk model. This improvement highlights the substantial contribution of genetic factors in modulating the response to ADT. The observation that the inclusion of even a small number of SNPs enhanced predictive performance underscores the potential benefits of integrating genomic data into personalized treatment strategies. The ethnic differences observed in allele frequencies and their estimated effect on ADT response further emphasize the importance of considering population-specific factors when developing and applying these predictive models. The identification of specific SNPs and their potential association with genes involved in androgen synthesis pathways sheds light on the underlying biological mechanisms influencing ADT response. The study's findings suggest that incorporating these genetic and clinical markers into clinical practice could assist in making more informed treatment decisions, potentially guiding the choice of intensive versus de-escalated therapy based on a patient's predicted response to ADT.
Conclusion
The study developed and validated ML models that predict castration resistance and prognosis in patients with advanced prostate cancer undergoing primary ADT. The models incorporating SNPs outperformed models using clinical data alone. These findings have clinical implications for personalized medicine, as they could support more tailored and effective treatment strategies. Further research is warranted to validate these findings in larger, more diverse populations and to explore the integration of these models into clinical decision support systems. Future studies could also investigate the interaction between genetic factors, clinical characteristics, and other treatment modalities to further refine the predictive power of these models and facilitate improved patient outcomes.
Limitations
The study's relatively small sample size may limit the generalizability of the findings to other populations. The models were developed and validated in a Japanese population; therefore, further validation is needed in other ethnic groups to confirm their robustness and applicability. The study focused on patients receiving primary ADT alone, and the predictive power of the models might differ in patients treated with combination therapies. Additional research is needed to assess the clinical utility of these models in guiding treatment decisions and to determine their cost-effectiveness.