AI-based automation of enrollment criteria and endpoint assessment in clinical trials in liver diseases

J. S. Iyer, D. Juyal, et al.

AIM-MASH is an AI-based tool for histologic scoring in metabolic dysfunction-associated steatohepatitis (MASH) clinical trials. It produces reproducible predictions that agree closely with pathologist consensus scores, which reduces inter-rater variability and provides a more sensitive measure of patient response.

Introduction
Metabolic dysfunction-associated steatohepatitis (MASH), formerly nonalcoholic steatohepatitis (NASH), is a progressive liver disease that frequently leads to cirrhosis and hepatocellular carcinoma. It is a leading cause of liver transplantation in the US, with increasing incidence and associated medical and economic burdens. Histologic surrogate endpoints are used in MASH clinical trials for both enrollment and endpoint assessment, but variability in the manual assessment of these endpoints substantially affects trial outcomes. This variability stems from the limited sensitivity of the scoring systems and the inherent subjectivity of manual histologic assessment. The resulting inaccuracies can distort observed treatment responses, trial safety, identification of the study population, and participant inclusion or exclusion, potentially leading to clinical trial failure. The FDA and EMA have issued guidance on using histopathologic assessment of liver biopsies as clinical trial inclusion criteria and endpoints, and the MASH Clinical Research Network (CRN) recommends scoring macrovesicular steatosis, lobular inflammation, hepatocellular ballooning, and fibrosis. Even with efforts to harmonize scoring guidelines, however, inter-pathologist variability remains high, reducing the statistical power of trials to detect significant drug effects. Advances in artificial intelligence (AI) offer potential solutions for accurate, quantitative, and reproducible assessment of digitized pathology whole-slide images (WSIs). This study presents AIM-MASH, an AI-powered digital pathology tool that quantifies relevant histologic features, with the aim of improving clinical trial reliability.
Literature Review
The literature highlights the significant challenges posed by inter- and intra-observer variability in the histological assessment of NASH. Studies have shown suboptimal reliability of liver biopsy evaluation, impacting the results of randomized clinical trials. The FDA and EMA have issued guidelines for drug development and endpoints in NASH, emphasizing the importance of standardized histologic assessment. However, the lack of standardization and the high variability in interpretation, even among experts, remain major hurdles. Previous research has explored using machine learning approaches for quantitative measurement of liver histology and disease monitoring in NASH, showing promising results in improving accuracy and reproducibility. This forms the basis for the current study's exploration of AI-based tools to address the limitations of manual histological assessment in NASH clinical trials.
Methodology
AIM-MASH combines multiple convolutional neural networks (CNNs) and graph neural networks (GNNs) that generate a range of histologic readouts. The CNN-based tissue, artifact, and fibrosis models were trained on 103,579 pathologist-provided annotations from 8,747 H&E and 7,660 Masson's trichrome (MT) WSIs drawn from six completed phase 2b and phase 3 MASH clinical trials. These models segment relevant histologic features, producing pixel-level maps and slide-level feature quantifications. On H&E slides, the CNNs segment macrovesicular steatosis, hepatocellular ballooning, lobular inflammation, portal inflammation, microvesicular steatosis, interface hepatitis, and normal hepatocytes; on MT slides, they segment large intrahepatic septal and subcapsular regions, pathologic fibrosis, and bile ducts. A separate artifact model detects image and tissue artifacts so they can be excluded from analysis.
The GNN-based models take the CNN outputs as input and predict, for each histologic feature, both the MASH CRN ordinal grade or stage and a continuous score. To correct for pathologist bias, the GNNs were specified as mixed-effects models: they learn pathologist-specific biases during training, and only the unbiased estimates are used for prediction.
Model performance was assessed with a mixed leave-one-out (MLOO) approach that compares model predictions with pathologist consensus scores. Clinical utility was assessed on WSIs from completed phase 2b trials by evaluating enrollment criteria and endpoints, and a retrospective analysis of the ATLAS trial evaluated drug efficacy using AIM-MASH. A continuous scoring system was developed to detect subordinal changes within ordinal bins, and correlations between continuous scores and pathologist scores, non-invasive biomarkers, and clinical outcomes were analyzed. Kaplan-Meier and Cox proportional hazards regression analyses were used to assess the prognostic utility of continuous scoring for predicting disease progression.
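To make the Kaplan-Meier step concrete, here is a minimal product-limit estimator in plain Python that compares progression-free survival between patients above and below a continuous-score cutoff. It is a sketch, not the study's analysis code. The toy data, the 3.5 cutoff, and the names `kaplan_meier` and `split_by_score` are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times:  follow-up time per patient (e.g. months until progression or censoring)
    events: 1 if progression was observed at that time, 0 if the patient was censored
    Returns (time, survival probability) pairs at each time with at least one event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)  # everyone observed at time t leaves the risk set after t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Multiply by the conditional probability of remaining progression-free past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


def split_by_score(scores, times, events, threshold):
    """Hypothetical helper: split patients into high/low groups by a continuous score."""
    high = [(t, e) for s, t, e in zip(scores, times, events) if s >= threshold]
    low = [(t, e) for s, t, e in zip(scores, times, events) if s < threshold]
    return high, low


if __name__ == "__main__":
    # Made-up data: continuous fibrosis score, months of follow-up, progression flag.
    scores = [3.1, 3.8, 3.2, 3.9, 3.4, 3.7]
    times = [24, 10, 30, 8, 18, 12]
    events = [0, 1, 0, 1, 1, 1]
    high, low = split_by_score(scores, times, events, threshold=3.5)
    for label, group in (("score >= 3.5", high), ("score < 3.5", low)):
        ts, es = zip(*group)
        print(label, kaplan_meier(list(ts), list(es)))
```

A real analysis would more likely use an established survival-analysis package. It would also fit the Cox model on the continuous score directly rather than dichotomizing it at an arbitrary cutoff.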
Key Findings
AIM-MASH demonstrated perfect repeatability (κ = 1) in initial testing. Model-versus-consensus agreement was comparable to inter-pathologist agreement for all four histologic features (steatosis κ = 0.74; ballooning κ = 0.70; lobular inflammation κ = 0.67; fibrosis κ = 0.62). Agreement between AIM-MASH and consensus exceeded both individual pathologist-versus-consensus agreement and mean pairwise agreement between pathologists.
For clinical trial enrollment criteria, AIM-MASH-versus-consensus agreement on MAS ≥ 4 versus < 4 (MAS being the activity score, the sum of the steatosis, ballooning, and lobular inflammation grades) was 0.82 (95% CI 0.79-0.85), comparable to the average pathologist versus consensus (0.81, 95% CI 0.78-0.83). For fibrosis stages F1-F3 versus F4, model-versus-consensus agreement was 0.97 (95% CI 0.95-0.98), similar to the average pathologist (0.96, 95% CI 0.95-0.97). For clinical trial endpoints, AIM-MASH's agreement with consensus was likewise comparable to that of the average pathologist.
In a retrospective analysis of the ATLAS trial, AIM-MASH detected a greater proportion of responders in the treatment arm than the central reader assessment for all three endpoints. The AIM-MASH-based continuous MASH CRN fibrosis score was more sensitive to treatment-induced changes than conventional continuous collagen proportionate area (CPA). It was also a stronger predictor of progression to cirrhosis and liver-related complications than AI-based ordinal MASH CRN grades and stages. Continuous scores correlated significantly with mean pathologist scores and with non-invasive biomarkers, and the continuous fibrosis score stratified patients with stage 3 or 4 fibrosis into slow and rapid progressors.
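The binary enrollment checks (MAS ≥ 4 versus < 4; F4 versus F1-F3) reduce to comparing two readers' yes/no calls per slide. The sketch below computes raw percent agreement and unweighted Cohen's kappa for such calls, plus a percentile-bootstrap 95% CI over slides. It is illustrative only. The summary does not say which agreement statistic or CI method underlies the reported 0.82 and 0.97 values, and the function names and example grades are hypothetical.

```python
import random
from collections import Counter


def activity_score(steatosis, ballooning, lobular_inflammation):
    """MAS: sum of steatosis (0-3), ballooning (0-2) and lobular inflammation (0-3) grades."""
    return steatosis + ballooning + lobular_inflammation


def mas_at_least_4(grades):
    return activity_score(*grades) >= 4


def is_f4(stage):
    # Cirrhosis (F4) versus F1-F3.
    return stage == 4


def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two raters' labels."""
    n = len(a)
    observed = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used one identical label: kappa is undefined
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci(a, b, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(a, b), resampling slides with replacement."""
    rng = random.Random(seed)
    n = len(a)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        v = stat([a[i] for i in idx], [b[i] for i in idx])
        if v == v:  # drop NaN resamples
            values.append(v)
    if not values:
        raise ValueError("statistic undefined in every resample")
    values.sort()
    lo = values[int(alpha / 2 * len(values))]
    hi = values[int((1 - alpha / 2) * len(values)) - 1]
    return lo, hi


if __name__ == "__main__":
    # Made-up (steatosis, ballooning, lobular inflammation) grades for six slides.
    model = [(2, 1, 1), (1, 1, 1), (3, 2, 2), (1, 0, 1), (2, 2, 1), (2, 1, 0)]
    consensus = [(2, 1, 1), (1, 1, 2), (3, 2, 2), (1, 0, 1), (2, 1, 1), (1, 1, 0)]
    m = [mas_at_least_4(g) for g in model]
    c = [mas_at_least_4(g) for g in consensus]
    print(percent_agreement(m, c), cohen_kappa(m, c), bootstrap_ci(m, c, cohen_kappa))
```

For the ordinal features themselves (for example, fibrosis stages 0-4), a weighted kappa that penalizes larger disagreements more is often preferred. Whether the study used weighting is not stated here.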
Discussion
AIM-MASH provides a robust and reproducible method for assessing MASH histology in clinical trials, reducing the inter-observer variability inherent in manual scoring. Its high concordance with expert pathologist consensus supports its accuracy and reliability. Its ability to detect treatment response with greater sensitivity than manual methods has significant implications for clinical trial design and for the development of effective MASH therapeutics. The continuous scoring system offers a more granular assessment of disease progression and regression, and therefore a more sensitive measure of treatment effects. Its significant correlation with non-invasive biomarkers further supports the clinical utility of AIM-MASH. Integrating AIM-MASH into clinical trial workflows could improve patient selection, endpoint assessment, and the overall success rate of MASH clinical trials. This could in turn speed the development and approval of effective therapies for patients with MASH.
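One simple way to turn an ordinal classifier's output into a continuous score is to take the expected stage under the predicted class probabilities. This makes changes within an ordinal bin visible. The summary does not say how AIM-MASH derives its continuous scores, so the expected-value construction, the function names, and the probability vectors below are assumptions for illustration only.

```python
def expected_stage(probs):
    """Continuous score: sum over stages k of k * P(stage = k).

    probs[k] is a (possibly unnormalized) probability for ordinal stage k,
    e.g. MASH CRN fibrosis stages 0-4.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(k * p for k, p in enumerate(probs)) / total


def ordinal_stage(probs):
    """Conventional ordinal call: the most probable stage."""
    return max(range(len(probs)), key=lambda k: probs[k])


if __name__ == "__main__":
    # Hypothetical baseline and follow-up predictions for one patient (stages F0-F4).
    baseline = [0.0, 0.0, 0.1, 0.8, 0.1]
    follow_up = [0.0, 0.0, 0.3, 0.65, 0.05]
    # The ordinal stage stays at F3 while the continuous score drops from about 3.0 to about 2.75.
    for name, p in (("baseline", baseline), ("follow-up", follow_up)):
        print(name, ordinal_stage(p), round(expected_stage(p), 2))
```

The expected stage treats adjacent stages as equally spaced, which is itself an assumption about the ordinal scale. A learned continuous score need not behave this way.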
Conclusion
AIM-MASH represents a significant advance in the assessment of MASH histology, providing a robust, reproducible, and sensitive method for evaluating disease severity and treatment response. Its performance relative to manual scoring holds promise for improving the design and interpretation of clinical trials and for accelerating the development of effective MASH therapies. Future research should validate AIM-MASH in prospective clinical trials and explore the clinical implications of the continuous scoring system.
Limitations
The study used retrospective data from completed clinical trials, so prospective validation in diverse patient populations is needed to confirm the generalizability of the findings. The continuous scoring system, while promising, requires further research to define clinically meaningful thresholds and to better map it to the underlying biological processes of MASH progression and regression. Reliance on existing clinical trial datasets may also limit the diversity of patient populations and disease severities represented in the study.