Identification of early liver toxicity gene biomarkers using comparative supervised machine learning

B. P. Smith, L. S. Auvil, et al.

This study by Brandi Patrice Smith, Loretta Sue Auvil, Michael Welge, Colleen Bannon Bushell, Rohit Bhargava, Navin Elango, Kamin Johnson, and Zeynep Madak-Erdogan identifies early-exposure gene signatures for liver toxicity using supervised machine learning. It reports ten gene biomarkers that predicted liver necrosis with high accuracy and that could speed up toxicity testing for agrochemicals and pharmaceuticals.

Introduction

The study addresses the need for early, reliable biomarkers to predict liver toxicity from chemical exposures, a key bottleneck in agrochemical and pharmaceutical development. Despite extensive animal testing, human toxicity can still emerge late, underscoring the need for predictive toxicogenomics. Traditional differential expression analyses yield thousands of genes, which are impractical for testing and often perform poorly due to systematic noise, high dimensionality (p >> n), and overfitting. The authors aim to construct and validate a supervised machine learning framework to identify a small, robust set of early-exposure hepatic gene biomarkers that predict liver necrosis, an endpoint associated with eventual carcinogenicity. They use rat TG-GATEs data for feature selection and model training and independently validate performance using the MAQC-II NIEHS dataset.

Literature Review

The paper situates its work within toxicogenomics and machine learning literature, noting prior successes and persistent limitations. Supervised models have identified discriminative gene signatures across platforms, yet predictive performance often suffers from experimental noise, large signatures, and poor validation. Prior efforts in predictive toxicology and biomarker discovery are referenced, as are methodological advances for big data, feature selection, and cross-platform reproducibility (e.g., MAQC). The authors argue for integrative pipelines combining feature selection (filter, wrapper, embedded methods) with robust validation to overcome instability and overfitting in high-dimensional microarray data.
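
The three feature-selection families named above differ in where the selection happens relative to the model. The following is a minimal sketch on synthetic data, not the authors' code: the data, the sizes, the helper names (`filter_select`, `wrapper_select`, `embedded_select`), and the specific settings are illustrative assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes); not study data.
rng = np.random.default_rng(0)
n_samples, n_genes, k = 60, 200, 10
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat([0, 1], n_samples // 2)
X[y == 1, :5] += 2.0  # only the first 5 synthetic "genes" carry signal


def filter_select(X, y, k):
    """Filter: score each gene on its own, independent of any classifier."""
    pvals = np.array([
        mannwhitneyu(X[y == 0, j], X[y == 1, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ])
    return np.argsort(pvals)[:k]


def wrapper_select(X, y, k):
    """Wrapper: repeatedly refit a model and drop the weakest genes (RFE)."""
    rfe = RFE(SVC(kernel="linear", C=1), n_features_to_select=k, step=0.1)
    rfe.fit(X, y)
    return np.flatnonzero(rfe.support_)


def embedded_select(X, y, k):
    """Embedded: selection is a by-product of fitting a penalised model."""
    model = Lasso(alpha=0.01, max_iter=20_000).fit(X, y)
    return np.argsort(-np.abs(model.coef_))[:k]


selected = {
    "filter": filter_select(X, y, k),
    "wrapper": wrapper_select(X, y, k),
    "embedded": embedded_select(X, y, k),
}
for name, idx in selected.items():
    print(name, sorted(idx.tolist()))
```

Comparing the index sets returned by each method is one simple way to see which genes are picked consistently, which is the kind of overlap the paper uses to argue for a stable signature.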

Methodology

Data sources: (1) TG-GATEs rat in vivo liver microarray dataset (Affymetrix Rat 230 2.0; 31,099 probes/genes), male F344 rats exposed to 42 compounds at control/low/middle/high doses across timepoints: single-dose (3, 6, 9, 24 h) and repeat-dose (4, 8, 15, 29 days). (2) Independent validation: MAQC-II NIEHS rat liver dataset (Affymetrix Rat 230 2.0; 418 rats; eight hepatotoxicants) for necrosis prediction.

Normalization and initial reduction: Raw CEL files were normalized with RMA (Bioconductor affy), then centered and scaled. Differential expression analysis used limma with design matrices contrasting high vs control dose; empirical Bayes statistics with p < 0.05 were used to derive an initial DE gene list. Hierarchical clustering (Cluster3/Treeview), PCA (StrandNGS), and GSEA were used to evaluate temporal patterns and functional enrichments.

Dose/time selection: Ethinyl estradiol (EE) was used to select an early informative timepoint. Based on clinical pathology and gene expression kinetics, 24 h high-dose exposure showed distinct transcriptional programs (cluster C6; chromatin/DNA binding), and was chosen for downstream feature selection to capture early signals.

Feature selection: Three categories were applied to 24 h data for the 42 necrosis-inducing compounds: filter (Mann–Whitney, t-test, distance correlation [DCor]), wrapper (Boruta; recursive feature elimination with RF and with linear SVM), and embedded (Random Forest, Elastic Net, Lasso, RidgeCV, linear SVM). Parameters included, e.g., Boruta (perc=100, max_iter=100, n_estimators=15,000, max_depth=6), Lasso/ElasticNet (alpha 0.001–0.01, l1_ratio=0.5, max_iter=20,000), RF (up to 10,000 trees for selection; 1,000 trees, max_depth=4 for prediction), SVC (linear kernel, C=1). Models were evaluated as the number of features increased from 1 to 100; performance generally declined after 20–25 features, so top-10 features were selected per method.

Classification models: Logistic Regression, Random Forest, linear SVM, Lasso, and ElasticNet classifiers were trained.

Cross-validation and validation: Nested/outer cross-validation with compounds grouped in the same fold to avoid leakage; performance assessed on unseen folds. Hyperparameters tuned via GridSearchCV (e.g., SVC C, Lasso/ElasticNet alpha) with MAQC-II as a surrogate for pre-tuning while preserving independent validation. Primary metrics: AUC (ROC), F1 score, sensitivity/specificity, and Matthews correlation coefficient (MCC); ROC curves were generated. Independent validation on MAQC-II NIEHS measured generalizability.

Visualization in Tableau. Software: scikit-learn, scipy, BorutaPy; R/Bioconductor (affy, limma); Cluster3, Treeview; GSEA; StrandNGS; GraphPad Prism.
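
To make the grouped cross-validation idea concrete, here is a minimal sketch, assuming synthetic data and a hypothetical helper `top_k_mann_whitney`. It is not the authors' pipeline. It keeps all samples from one compound in the same fold, selects the top 10 genes inside each training fold only, and fits a Random Forest with max_depth=4 as in the summary; the 200 trees (instead of 1,000) and all data sizes are assumptions chosen to keep the example fast.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

# Synthetic data: 12 hypothetical compounds, 6 animals each, 300 "genes".
rng = np.random.default_rng(42)
n_compounds, per_compound, n_genes = 12, 6, 300
groups = np.repeat(np.arange(n_compounds), per_compound)  # compound id per sample
y = (groups % 2).astype(int)       # each compound is wholly positive or negative
X = rng.normal(size=(groups.size, n_genes))
X[y == 1, :8] += 1.5               # 8 informative synthetic genes


def top_k_mann_whitney(X, y, k=10):
    """Rank genes by Mann-Whitney U p-value and return the k best indices."""
    pvals = [
        mannwhitneyu(X[y == 0, j], X[y == 1, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ]
    return np.argsort(pvals)[:k]


# Out-of-fold predictions: every sample is scored by a model that never saw
# its compound, and never saw it during feature selection either.
oof_proba = np.zeros(groups.size)
for train, test in GroupKFold(n_splits=4).split(X, y, groups):
    genes = top_k_mann_whitney(X[train], y[train])     # selection on train only
    clf = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=0)
    clf.fit(X[train][:, genes], y[train])
    oof_proba[test] = clf.predict_proba(X[test][:, genes])[:, 1]

# Pooled AUC over all held-out predictions (a test fold may hold one class only).
oof_auc = roc_auc_score(y, oof_proba)
print(f"out-of-fold ROC AUC: {oof_auc:.2f}")
```

The key design choice is that feature selection sits inside the loop. Selecting genes on the full dataset before splitting would let information from held-out compounds leak into the signature and inflate the estimate.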

Key Findings

- A compact 10-gene signature derived from supervised feature selection accurately predicts liver necrosis from early (24 h) exposure gene expression in rats and generalizes to an independent dataset (MAQC-II).
- The 10-gene panel (from best-performing methods) includes: Scly, Dcd, RGD1309534, Slc23a1, Bhmt2, Tkfc, Srebf1, Ablim3, Cyp39a1, and Car3 (gene identities across methods overlapped; core contributors across Mann–Whitney, DCor, Boruta included Scly, Slc23a1, Dcd, Tkfc, RGD1309534). Many are involved in metabolism/detoxification (Car3, Crat, Cyp39a1, Dcd, Lbp, Scly, Slc23a1, Tkfc) or transcriptional regulation (Ablim3), with several implicated in liver carcinogenesis (Crat, Car3, Slc23a1).
- Performance (MAQC-II independent validation) for top combinations:
  • Mann–Whitney + Random Forest: F1 ≈ 0.91, ROC AUC ≈ 0.91, sensitivity ≈ 0.85, specificity ≈ 0.97, MCC ≈ 0.83.
  • Mann–Whitney + SVM: F1 ≈ 0.89, ROC AUC ≈ 0.90, sensitivity ≈ 0.88, specificity ≈ 0.91, MCC ≈ 0.79.
  • Boruta + Random Forest: F1 ≈ 0.89, ROC AUC ≈ 0.90, sensitivity ≈ 0.79, specificity ≈ 1.00, MCC ≈ 0.81.
  • DCor + Random Forest: F1 ≈ 0.89, ROC AUC ≈ 0.90, sensitivity ≈ 0.82, specificity ≈ 0.97, MCC ≈ 0.80.
- Model performance generally declined beyond 20–25 features; selecting the top 10 features balanced accuracy and parsimony, reducing overfitting risk.
- EE case study supported choosing 24 h as an early, informative timepoint based on distinct transcriptional programs and early clinical pathology changes.
- The pipeline provides high specificity/selectivity predictors for necrosis that are suitable for practical screening.
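
For readers less familiar with the metrics reported above, the sketch below shows how sensitivity, specificity, F1, and MCC follow from a confusion matrix. The labels are made-up toy values chosen for easy arithmetic; they are not the study's predictions and do not reproduce its numbers.

```python
import math

from sklearn.metrics import confusion_matrix, f1_score, matthews_corrcoef

# Toy labels (1 = necrosis, 0 = no necrosis); purely illustrative.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred).ravel())

sensitivity = tp / (tp + fn)   # share of true necrosis cases detected
specificity = tn / (tn + fp)   # share of non-necrosis cases correctly cleared
f1 = f1_score(y_true, y_pred)  # harmonic mean of precision and sensitivity
mcc = matthews_corrcoef(y_true, y_pred)

# MCC written out from the four counts, to show what the library computes.
mcc_manual = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"F1={f1:.2f} MCC={mcc:.2f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative when classes are imbalanced, which is one reason it is commonly reported alongside AUC and F1.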

Discussion

The findings demonstrate that a small, early-exposure hepatic gene signature can robustly predict necrosis, addressing the need for early biomarkers to streamline toxicity assessment. By integrating filter, wrapper, and embedded feature selection with rigorous cross-validation and independent validation, the study mitigates p >> n challenges and overfitting common in microarray-based predictive modeling. Performance metrics (AUC, F1, MCC) indicate strong discriminative ability across multiple classifier combinations, with consistent top contributing genes across selection methods, enhancing biological plausibility. The selection of the 24 h timepoint is supported by EE-induced transcriptional signatures and early changes in clinical parameters, aligning the biomarker window with biologically meaningful early events. The approach and gene panel are positioned to accelerate toxicity screening and may be adaptable to additional apical endpoints beyond necrosis.

Conclusion

The study delivers an integrative machine learning pipeline and a concise 10-gene signature that predict rat liver necrosis from 24 h exposure gene expression with high accuracy and generalizability to an independent dataset. The genes predominantly reflect metabolic/detoxification and regulatory processes, with links to liver carcinogenesis. Contributions include: (1) a validated, low-dimensional biomarker set; (2) a comparative framework combining multiple feature selection and classification methods; and (3) independent validation via MAQC-II. Future work could extend this approach to other liver injury phenotypes and species, assess transferability across platforms (e.g., RNA-seq), and evaluate translational applicability for human risk assessment and regulatory testing.

Limitations

- The study focuses on male rat in vivo datasets and the Affymetrix Rat 230 2.0 microarray platform; generalization to other species, sexes, and technologies requires further validation.
- Biomarker discovery and modeling were centered on a single early timepoint (24 h) and one apical endpoint (necrosis); applicability to other timepoints and endpoints remains to be tested.
- Although normalization (RMA) and independent validation were used to mitigate batch and study effects, residual technical or study-specific biases cannot be fully excluded.
- The compounds analyzed, while diverse, are limited to those available in TG-GATEs and MAQC-II; broader chemical space coverage would strengthen generalizability.
