
Veterinary Science
Machine learning and metagenomics reveal shared antimicrobial resistance profiles across multiple chicken farms and abattoirs in China
M. Baker, X. Zhang, et al.
This study, conducted by a team of researchers including Michelle Baker and Xibin Zhang, investigates antimicrobial resistance in large-scale chicken farms and abattoirs in China. Utilizing machine learning and metagenomics, they unveil crucial connections between mobile antibiotic resistance genes and environmental factors, paving the way for significant improvements in livestock health.
Introduction
Antimicrobial use in poultry production in China is five times higher than the international average. Antibiotic use, even at low levels, alters and expands the gut resistome in livestock, and the microbial community can shape antimicrobial resistance (AMR) phenotypes. External events such as changes in diet, temperature and stress may result in the colonization of new resident species or AMR transfer between species. Temperature, humidity, bacterial species abundance and the presence of antibiotic resistance genes (ARGs) can all influence bacterial infection in broilers. Links between environmental conditions and AMR are particularly relevant for China and low- and middle-income countries (LMICs), where maintaining stable environmental conditions in industrial-scale farming may be more challenging than in high-income countries. AMR surveillance in non-healthcare domains has not been widely adopted, but is key to understanding how food production systems contribute to the selection and dissemination of antibiotic-resistant bacteria (ARB) and ARGs. Machine learning (ML) and big data mining offer tools to advance precision poultry farming. Culture-based approaches that combine whole genome sequencing (WGS) of individual pathogens, antibiotic susceptibility testing and ML techniques are effective predictors of genomic characteristics linked to AMR, both for Escherichia coli isolates and for other bacteria. However, surveillance approaches focusing solely on WGS of individual pathogens may not capture the diversity of the microbial communities and resistomes within livestock production, so ARG data may be missed. A recent proof-of-concept study observed that several ARGs present in the chicken faecal resistome correlated with the resistance/susceptibility profiles of E. coli isolates cultured from the same samples.
In this study, the authors developed a reference method for metagenomic-based surveillance targeting Chinese livestock farming, considering typical laboratory resource constraints in China and LMICs. E. coli was used as an indicator species for AMR within the gut microbiome context, and the impacts of surrounding farm environments, barn temperature and humidity, and antimicrobial administration protocols were explored.
Literature Review
The paper situates its work within evidence that antimicrobial use reshapes livestock gut resistomes and that microbial community composition influences AMR phenotypes. Prior ML and culture/WGS studies have predicted AMR determinants in E. coli and other pathogens, but isolate-centric surveillance may miss broader resistome diversity in production systems. Environmental factors such as temperature and humidity have been linked to broiler infections and microbiome variation, with particular relevance in LMIC settings. Previous studies identified correlations between faecal resistome ARGs and E. coli phenotypic resistance, supporting the integrative metagenomic–phenotypic approach adopted here. The study also references frameworks for classifying clinically important ARGs and prior observations of ARG co-localization and mobility via MGEs in food animal contexts.
Methodology
Study design and sampling: Ten large-scale commercial poultry farms across three Chinese provinces (Shandong, Henan, Liaoning) feeding into four abattoirs were surveyed over one production cycle (two cycles at one Shandong farm for a pilot). Farms used either net or cage housing. Biological samples were collected at standardized time points: t1 (week 3) and t2 (week 6) from barns (pooled chicken faeces, pooled feathers, barn floor litter) and surrounding outdoor soil; and t3 (slaughter day, 1–5 days post-week 6) from abattoirs (carcasses, processing line surfaces, wastewater). In total, 461 metagenomes were analysed across birds, carcasses, and environmental sources.
Environmental monitoring: Temperature and humidity within barns were recorded at 5-minute intervals via automated sensors where available (HN1–HN3, SD2–SD4), or manually (SD1, LN2, LN3) using SMART SENSOR AS837 devices; LN1 had no sensor data. Measurements were averaged across three in-barn locations and aggregated over the 7 days preceding t1 and t2.
DNA extraction and sequencing: DNA from faeces, barn floor, and soil was extracted with a magnetic bead kit; carcass DNA used a CTAB-based method. Libraries (1 µg input, 350 bp fragmentation) were prepared using NEBNext Ultra for Illumina, size-checked (Agilent 2100) and quantified (qPCR), then sequenced on Illumina NovaSeq 6000 (150 bp paired-end).
Bioinformatics: Raw reads were quality-filtered (Readfq). Host (chicken) reads were removed (Bowtie2, SAMtools; ref genome GCF_000002315.6). Assemblies were generated per sample (MEGAHIT default) and co-assemblies per source type with parameters --continue --kmin-1pass --min-contig-len 1000. Contigs >2,000 bp were mapped (BWA-MEM, SAMtools) to produce BAMs; coverage was estimated (MetaBAT2). Taxonomic profiling used MetaPhlAn 3 with Bowtie2.
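The environmental monitoring step above averages readings across three in-barn locations over the 7 days before each sampling point. The following is a minimal sketch of that aggregation, not the authors' code. The function name, the tuple layout, the location labels, the dates and the choice to average per location first and then across locations are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def weekly_barn_average(readings, sampling_time, window_days=7):
    """Average sensor readings over the window preceding a sampling time.

    readings: iterable of (timestamp, location, value) tuples for one barn.
    Values are averaged per location first, then across locations, so a
    location with more logged readings does not dominate (an assumption).
    """
    start = sampling_time - timedelta(days=window_days)
    by_location = {}
    for timestamp, location, value in readings:
        if start <= timestamp < sampling_time:
            by_location.setdefault(location, []).append(value)
    if not by_location:
        return None  # e.g. a farm with no sensor data in the window
    return mean(mean(values) for values in by_location.values())


# Illustrative temperature readings (invented values, not study data).
t1 = datetime(2020, 1, 22)
readings = [
    (datetime(2020, 1, 20, 8, 0), "front", 24.0),
    (datetime(2020, 1, 20, 8, 5), "front", 26.0),
    (datetime(2020, 1, 20, 8, 0), "middle", 27.0),
    (datetime(2020, 1, 20, 8, 0), "back", 23.0),
    (datetime(2020, 1, 10, 8, 0), "back", 40.0),  # outside the 7-day window
]
print(weekly_barn_average(readings, t1))
```

A farm without readings in the window returns `None`, which mirrors how a farm lacking sensor data would drop out of the temperature and humidity analyses.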
Community differences were analysed via NMDS (vegan, Bray–Curtis) and PERMANOVA; relative abundance comparisons used violin/scatter plots with Wilcoxon rank-sum tests (Holm correction).
Resistome and MGE analysis: Assembled contigs were searched against CARD (BLASTn, identity ≥95%, coverage ≥95%) to enumerate ARGs. Source attribution of ARGs used correlation of ARG/contig normalized coverage with species abundance (Spearman, P<0.05 and Pearson r≥0.6). Potentially mobile ARGs were defined by co-location of ARGs and MGEs (ISfinder database) within ≤5 kb on contigs (>500 bp), annotated with Prokka; singletons were excluded. Clinically important ARGs followed Zhang et al. Risk I criteria (human-associated, potentially mobile, present in ESKAPE pathogens). Structures of selected MGE–ARG patterns were visualized (EasyFig).
Phylogenetics: For ISAba125–blaNDM-1 contigs, Bayesian phylogeny was reconstructed using BEAST v1.10.4 under the best model (uncorrelated log-normal relaxed molecular clock with Bayesian skyline), with GTR+Γ substitution model (selected by IQ-TREE2). Three independent chains (100 million steps each) achieved ESS>200; convergence assessed in Tracer; maximum clade credibility tree visualized in iTOL.
E. coli isolation and AST: E. coli was cultured from faecal samples corresponding to the metagenomes; 191/223 were positive, and 170 were retained for ML analyses (excluding LN1 lacking sensor data). AST against 26 antibiotics by broth microdilution was interpreted per CLSI standards.
Machine learning pipeline: Implemented in Python (SciPy, scikit-learn), the pipeline had three phases. Phase I: feature pre-selection per antibiotic included min–max normalization, class balancing with SMOTE, removal of zero-variance features, chi-squared filtering (P≤0.01) to identify ARG counts and species relative abundances associated with resistance/susceptibility; network visualization used NetworkX with Kamada–Kawai layout.
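The co-location rule used in the resistome and MGE analysis (an ARG and an MGE within 5 kb of each other on the same contig) can be illustrated with a short sketch. This is not the authors' implementation. The record layout, the feature names and the coordinates are hypothetical, and measuring distance as the gap between the nearest ends of the two features is an assumption. The contig-length filter and the exclusion of singletons are omitted.

```python
from collections import defaultdict


def potentially_mobile_pairs(features, max_gap=5000):
    """Return (contig, ARG, MGE) triples lying within max_gap bp of each other.

    features: iterable of dicts with keys contig, kind ("ARG" or "MGE"),
    name, start and end (1-based, inclusive coordinates on the contig).
    """
    by_contig = defaultdict(lambda: {"ARG": [], "MGE": []})
    for feature in features:
        by_contig[feature["contig"]][feature["kind"]].append(feature)

    pairs = []
    for contig, groups in by_contig.items():
        for arg in groups["ARG"]:
            for mge in groups["MGE"]:
                # Gap between the two intervals; 0 when they overlap.
                gap = max(arg["start"] - mge["end"], mge["start"] - arg["end"], 0)
                if gap <= max_gap:
                    pairs.append((contig, arg["name"], mge["name"]))
    return pairs


# Hypothetical annotations: names and coordinates are invented.
features = [
    {"contig": "contig_1", "kind": "ARG", "name": "ARG_1", "start": 1000, "end": 1800},
    {"contig": "contig_1", "kind": "MGE", "name": "MGE_1", "start": 2500, "end": 3600},
    {"contig": "contig_2", "kind": "ARG", "name": "ARG_2", "start": 100, "end": 750},
    {"contig": "contig_2", "kind": "MGE", "name": "MGE_2", "start": 9000, "end": 9800},
]
print(potentially_mobile_pairs(features))
```

On `contig_1` the two features are 700 bp apart and are reported as a pair; on `contig_2` the gap is 8,250 bp, so that ARG would not be counted as potentially mobile.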
Phase II: multiple classifiers (logistic regression, linear/RBF SVM, Extra Trees, Random Forest, AdaBoost, XGBoost) were trained per antibiotic with nested cross-validation (outer 5-fold, inner 3-fold), repeated 30 times. Performance metrics included ROC-AUC, accuracy, sensitivity, specificity, precision; the Extra Trees classifier ranked best by Nemenyi test and was used to extract strongest predictors via Gini importance. Antibiotics lacking sufficient minority class samples (9 agents) were excluded from ML modelling.
Phase III: linear regressions assessed associations between top predictors (from models with AUC>0.9) and barn temperature or humidity averages (7 days pre t1/t2). Significant correlations required non-zero slopes (P<0.05, Wald test) with R² as fit metric. ARG–species co-occurrence (origin inference) used read depth correlations to build undirected graphs of species and ARGs linked to temperature/humidity.
Antibiotic use bias analysis: Farm-level use/non-use of antibiotic classes (tetracyclines, lincosamides, aminoglycosides, etc.) was compared against ARG relative abundances (class-wise ratios) and microbial species abundances using Wilcoxon rank-sum tests to identify associations with treatment practices.
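To make the Phase II design concrete, here is a minimal sketch of nested cross-validation (outer 5-fold, inner 3-fold) with an Extra Trees classifier in scikit-learn, followed by a ranking of features by Gini importance. It is an illustration under stated assumptions rather than the study's pipeline: the data are synthetic, the hyperparameter grid is invented, only one repetition is run instead of 30, and the SMOTE balancing and chi-squared pre-selection of Phase I are left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for one antibiotic: rows are samples, columns are
# features (ARG counts, species abundances), y is resistant vs susceptible.
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=5, random_state=0
)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: hyperparameter search (the grid is an illustrative assumption).
search = GridSearchCV(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, None]},
    scoring="roc_auc",
    cv=inner_cv,
)

# Outer loop: scores the tuned model on folds never seen during tuning.
outer_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
print("ROC-AUC per outer fold:", np.round(outer_auc, 3))

# Refit on all data, then rank features by impurity-based (Gini) importance.
search.fit(X, y)
importances = search.best_estimator_.feature_importances_
top = np.argsort(importances)[::-1][:5]
print("Top feature indices:", top.tolist())
```

Keeping the hyperparameter search inside the outer folds means the reported ROC-AUC is not inflated by tuning on the same samples used for evaluation, which is the reason for nesting the two loops.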
Key Findings
Shared mobile ARGs across birds and environments: Across all sources, 661 distinct MGE–ARG combinations (covering 195 unique ARGs) were identified; 38% of ARGs appeared in one MGE–ARG combination and 62% in multiple (2–22). Over half (56%) of potentially mobile ARGs were present in more than one source. Chicken faeces had the highest and most variable counts of potentially mobile ARGs per sample; feathers and barn floor carried comparably high numbers, while outdoor soil, carcasses, processing line and wastewater carried fewer (significantly lower than faeces/feathers, Dunn’s test adjusted P<0.01). In total, 145 potentially mobile ARG patterns were shared between bird and environmental samples from the same farm; 46 of these contained clinically relevant ARGs. Clinically notable detections included blaNDM-5 in faeces, feathers and barn floor, and qnrS1 in faeces, feathers, barn floor and wastewater.
Correlations between gut microbiome/resistome and E. coli AMR: From 170 E. coli isolates (tested against 26 antibiotics; all resistant to at least one, 169 to ≥3), 17 antibiotics had sufficient data for ML. Extra Trees classifiers achieved the best performance; 10 models (amikacin, aztreonam, cefoxitin, chloramphenicol, cefotaxime, kanamycin, nalidixic acid, streptomycin, sulfafurazole, trimethoprim–sulfamethoxazole) exceeded AUC>0.90. A core of 419 gut features (186 microbial species and 233 ARGs) strongly predicted E. coli resistance/susceptibility for these 10 antibiotics. Among the 233 ARGs, 24% were β-lactam, 18% aminoglycoside, and 18% MLSB-class; 46 ARGs were assigned to E. coli contigs, while 16 were assigned to non–E. coli species. Network analysis highlighted 66 ARGs (15 clinically relevant, including blaNDM-5, blaCTX-M-15, dfrA15, dfrA5) predicting resistance to more than three antibiotics; aphA6, vat(A), and vgb(A) predicted eight resistances each.
Twenty-eight gut species predicted resistance to five antibiotics (aztreonam, chloramphenicol, cefotaxime, kanamycin, nalidixic acid), including Arcobacter, Acinetobacter and Sphingobacterium. SHAP analyses showed that across models, 41% of top features had presence positively associated with resistance prediction, while 59% had absence positively associated with resistance (notably nalidixic acid and streptomycin models).
Environmental drivers: Of the top predictors, 130 ARGs and 48 species correlated with humidity; 39 ARGs and 20 species correlated with temperature. Humidity correlations exhibited stronger average R². ARG class composition among humidity-correlated ARGs: 22% MLSB, 18% β-lactam, 17% aminoglycoside, 11% tetracycline. Among temperature-correlated ARGs: 23% β-lactam, 18% MLSB, 15% aminoglycoside, 13% glycopeptide. Nineteen ARGs correlated with both temperature and humidity, including clinically relevant qnrA1, qnrS2, blaNDM-1, and catA8. Species correlated with both included Helicobacter pullorum, Alcaligenes faecalis, Bacillus cereus group, and Bacteroides stercoris; Mycoplasma yeatsii correlated with temperature only. ARG–species co-occurrence graphs revealed subgraphs associated with humidity (e.g., Klebsiella pneumoniae linked to kpnE, kpnF, kpnG, acrA; and A. faecalis linked to vga(C) and blaOXA-58) and temperature.
Mobility context of key ARGs: Ten ARGs that were predictors of resistance and correlated with temperature/humidity were co-located near MGEs: optrA, mph(F), erm(X) (MLSB); blaNDM-1, blaOXA-58 (β-lactam); catA8, catB2 (phenicol); aadA1 (aminoglycoside); qnrS2, qnrA1 (fluoroquinolone). Some showed single-MGE associations (catB2–ISPa25, mph(F)–IS15, qnrA1–IS15), while others were associated with 2–9 different MGEs. The blaNDM-1–IS15 pattern was detected in multiple samples; blaNDM-1 was also frequently adjacent to ISAba125 and ble (known plasmid-borne pattern in Enterobacteriaceae in Asia).
Bayesian phylogeny indicated recent within-farm diversification (MRCA <2 years) and much older divergence between farms (>20 years), suggesting widespread circulation rather than recent inter-farm transmission.
Antibiotic use associations: Use of tetracyclines, lincosamides, or aminoglycosides on farms was associated with altered counts of 21 ARGs and significant differences in 20 microbial species. Notably, 20 of the 21 ARGs associated with antibiotic use also correlated with humidity; erm(X) also correlated with temperature. Several species associated with antibiotic use also correlated with humidity (e.g., Wohlfahrtiimonas chitiniclastica, Klebsiella pneumoniae, Proteus hauseri, Proteus mirabilis) or temperature (e.g., Alistipes sp. An66, Lactobacillus aviarius, Enterococcus cecorum, Enorma massiliensis); the Bacillus cereus group correlated with both.
Discussion
The study demonstrates that metagenomic surveillance coupled with ML can capture complex AMR dynamics across livestock production environments that are not resolved by isolate-centric WGS alone. A core set of gut species and ARGs robustly predict E. coli resistance to multiple antibiotics, including clinically relevant determinants without previously known associations for some antibiotics. The identification of non–E. coli taxa (Arcobacter, Acinetobacter, Sphingobacterium) as strong predictors underscores the value of broader microbiome monitoring beyond E. coli for surveillance. Environmental conditions, particularly humidity and temperature within barns lacking effective climate control, are correlated with key AMR-associated species and ARGs, highlighting actionable environmental levers for AMR mitigation in LMIC contexts. The co-localization of ARGs with diverse MGEs, and phylogenetic evidence of long-standing circulation of blaNDM-1 mobile patterns, suggest that mobility and environmental pressures shape farm-specific and broader dissemination patterns. Associations between farm antibiotic use and resistome/microbiome shifts, often intertwined with humidity effects, point to potential co-selection and ARG co-localization dynamics. Together, these findings provide an integrated view linking environment, microbiome, mobility, and phenotypic resistance, informing design of more comprehensive AMR surveillance and control strategies in poultry production.
Conclusion
By integrating large-scale metagenomics, culture-based E. coli AST, machine learning, and environmental monitoring across multiple farms and connected abattoirs in China, the study identifies shared, clinically relevant mobile ARGs between birds and environments and delineates a core gut microbiome–resistome signature predictive of E. coli resistance. It reveals strong correlations between barn humidity/temperature and AMR-linked species and genes, and documents context-specific mobility patterns of key ARGs such as blaNDM-1. The approach offers a scalable framework for AMR surveillance in resource-constrained settings and supports development of environmental and antibiotic stewardship interventions. Future work should expand to additional indicator species (e.g., Enterococcus), incorporate human exposure nodes to map transmission pathways, use long-read sequencing to resolve plasmid structures, and standardize metagenomic methodologies to enable broader adoption and real-time prediction of AMR emergence.
Limitations
The analysis focused on E. coli as the indicator species and did not include human samples, limiting direct inference on human exposure and broader host range. Several antibiotics lacked sufficient phenotypic data for robust ML modelling. Plasmid carriage could not be confirmed due to reliance on short-read metagenomic sequencing. Variability across heterogeneous sources, seasons, and geographies remains, and generalizability beyond the studied settings is uncertain. Environmental sensor data were unavailable for one farm (LN1), reducing coverage for temperature/humidity analyses.