Ecological insights into soil health according to the genomic traits and environment-wide associations of bacteria in agricultural soils

Environmental Studies and Forestry

R. C. Wilhelm, J. P. Amsili, et al.

This study, conducted by Roland C. Wilhelm and colleagues, identifies bacterial bioindicators of soil health in farmland across North America. It correlates bacterial taxa with standard soil health ratings and interprets them through inferred genomic traits and environment-wide associations. In doing so, it links soil microbiome composition to tillage, organic matter, and other properties relevant to sustainable soil management.

Introduction
Managing soil health promotes the long-term fertility and ecological integrity of agricultural lands. Soil health encompasses a range of soil properties that contribute value to agroecosystems, including nutrient and water cycling, biodiversity, plant pathogen suppression, and pollution mitigation. Soil health is monitored using biological, physical, and chemical indicators that correspond with these functions. Ideally, indicators should be directly linked to soil function, be interpretable, and respond dynamically to management practices. The soil microbiome has considerable potential to serve in this capacity. Microbial communities are highly sensitive to management practices, including those that shape the properties determining soil health in agricultural systems. The broad ecological and functional diversity of soil bacteria provides rich information about soil conditions, and this information was recently used to predict soil health status. However, our ability to interpret the responses of bacterial ‘bioindicators’ is limited by our sparse understanding of the ecology and function of most soil bacteria. Bridging this gap between soil microbial ecology and soil health will improve the use of microbiome data in soil health monitoring. Ecological insight into soil microbiome structure and function can be gained by leveraging the large amounts of DNA sequencing data available in public repositories. One form of ecological inference draws on genomic data: microbial traits can be estimated from representative genomes of close relatives of the taxa observed in phylogenetic gene marker surveys. Genomic traits such as genome size, codon usage bias, and ribosomal RNA operon (rrn) copy number are shaped by evolutionary tradeoffs between growth, survival, and reproduction. Trends in these traits across soil microbiomes can therefore yield ecological information.
Genomic traits form the basis of several life-history frameworks. These frameworks group bacteria by ecological strategy (e.g., ‘generalist’ vs. ‘specialist’), by adaptive tradeoffs between growth rate, yield, and stress tolerance, or by metabolic dependency (e.g., ‘prototrophic’ vs. ‘auxotrophic’). They have been used to interpret microbiome trends associated with agricultural management practices, such as tillage intensity and nutrient management. While promising, the genomic inference of ecological traits has notable limitations. Many of the most active and abundant microorganisms in agricultural soils lack representative genomes from which traits might be predicted. Ecological information can still be derived for these uncultivated organisms by profiling their phylogenetic gene markers across public amplicon sequencing projects. An ‘environment-wide association survey’ (EWAS) follows the principle of reverse ecology, in which ecological information is inferred from changes in the abundance and distribution of genes across sites. Here, the gene is the 16S rRNA marker and the sites span a range of environmental conditions. Traditional approaches assign traits using curated databases, which tend to exclude uncultured or poorly characterized taxa. This is problematic because unclassified taxa are often indicative of soil properties relevant to soil health management. In contrast, EWAS requires no prior knowledge of an organism: information can be obtained for any taxon whose phylogenetic gene marker is present in sequencing databases. EWAS is limited by poor metadata quality in many sequencing projects and by a lack of standardized workflows. These drawbacks are partially offset by the large volume of projects and by efforts to systematize data publishing. This study identified and characterized bacterial bioindicators of the soil properties used in soil health assessment. It drew on a large amplicon sequencing survey of farmland across North America.
Objectives: (i) use 16S rRNA gene sequencing to identify bioindicators correlating with twelve biological, physical, and chemical soil properties used in soil health assessment, emphasizing specific indicator taxa; (ii) evaluate trends in bioindicators using inferred genomic traits and a 16S rRNA gene-based EWAS to understand the ecological basis for their associations with soil health. For genomic traits, the study tested whether community-weighted genomic traits corresponded to variation in soil health ratings. For EWAS, it explored associations of key bioindicators using a database of agricultural microbiomes (89 studies) with metadata grouped by management practice, disturbance, and plant associations.
Methodology
Primary dataset: 778 soil samples were collected from farmland across the USA (191 unique locations) under the Comprehensive Assessment of Soil Health (CASH) framework. The samples differed in management and in soil health ratings. The CASH ratings comprised biological indicators (soil organic matter, respiration, ACE protein, active carbon), chemical indicators (pH, phosphorus, potassium, minor elements), and physical indicators (aggregate stability, available water capacity, surface and sub-surface hardness). Soil texture was also measured. Ratings were normalized with scoring functions that account for texture, and the total health score was the unweighted mean of all twelve ratings. Tillage was coded for most soils (n=599) as ‘till’ vs ‘no till’. Surface and sub-surface hardness ratings were inverted so that a higher rating indicated greater compaction (subsets of n=309 and n=292, respectively).
DNA extraction and sequencing: Total DNA was extracted with the DNeasy PowerSoil kit and quantified with PicoGreen. Bacterial community composition was assessed by Illumina MiSeq 2×250 sequencing of the 16S rRNA gene V4 region (primers 515f/806r). Demultiplexing, filtering and trimming (5 bp from each end), and chimera removal were performed in QIIME2 v2020.2. ASVs were inferred with DADA2, and taxonomy was assigned using SILVA nr_v132. Raw data are archived at NCBI (BioProject PRJEB35975).
Bioindicator identification: OTUs (here, ASVs) were retained if they occurred in at least 10 samples and had at least 0.01% of the average read depth. Counts were normalized per thousand reads. Spearman rank correlations between OTU relative abundance and each of the twelve ratings and the total score were computed with Hmisc::rcorr in R. P-values were adjusted with the Benjamini–Hochberg false discovery rate (FDR) procedure, and weak correlations (|r|<0.3 or Padj>0.05) were removed. Indicator species analysis for tillage (till vs no-till) was performed with indicspecies::multipatt. Scripts are provided in the Supplementary Data.
Genomic trait inference: Genomic traits were downloaded from IMG-ER (March 15, 2020) for isolate (n≈68,600), single-cell (n≈3,400), and metagenome-assembled (MAG; n≈8,800) genomes.
Traits included genome size, coding density (coding bp / genome size), rrn operon copy number, CRISPR arrays, and biosynthetic gene clusters (BGCs). Gene counts were normalized by genome size. OTUs were assigned trait values iteratively based on their lowest available taxonomic classification, with traits averaged at higher ranks when needed. Most OTUs (94%; 20,148/21,463) were assigned a value, with 58% assigned at the lowest classified rank. Community-weighted mean trait values were computed using relative abundances as weights. rrn copy number was also estimated with rrnDB v5.6 for validation.
EWAS: AgroEcoDB, a database of 89 16S rRNA V4 studies totaling 14,780 libraries from SRA, was compiled by filtering 729 BioProjects. Three criteria were applied: overlap with the V4 region, agricultural-type manipulations, and at least 15 samples with well-curated metadata. Included metagenome taxonomic IDs were soil, compost, decomposition, fertilizer, manure, rhizosphere, and wood decay metagenomes. Sequences were processed identically to the primary dataset, and OTUs were matched by exact ASV sequence and length. Indicator species analyses across study factors produced EWAS indicators (Padj<0.05). Factors were grouped by management (e.g., fertilizer type, rotation), disturbance (e.g., tillage, drought), plant association (bulk vs rhizosphere), biome (grassland vs cropland), and others (depth, decomposition). Indicator values were signed by their presumed association with soil health and averaged per OTU. Community-weighted averages were then computed for categories and sub-categories.
Statistics: Analyses used R v4.0.3 with the reshape2, ggplot2, plyr, and phyloseq packages. PERMANOVA on Bray–Curtis dissimilarities was run with vegan (999 permutations), and R² values were averaged over 50 permutations of factor order. The relative importance of community-weighted traits vs EWAS scores for explaining community composition and total health score was assessed with relaimpo. Genus-level co-occurrence networks were built from shared bioindicator status across ratings, with edges weighted by the number of shared OTUs. Positive and negative indicator networks were visualized in Gephi (Yifan Hu force-directed layout).
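The bioindicator screen described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' R scripts. The prevalence/abundance filter is read as "mean count per sample of at least 0.01% of mean read depth". Benjamini–Hochberg adjustment is applied across OTUs for a single rating. P-values use a normal approximation to the usual t statistic for a rank correlation; Hmisc::rcorr reports asymptotic P-values, and with hundreds of samples the two are close. All function names are hypothetical.

```python
import math

def filter_and_normalize(counts, min_samples=10, min_frac=1e-4):
    """Keep OTUs found in >= min_samples samples whose mean count is at least
    min_frac (0.01%) of the mean read depth; return counts per thousand reads."""
    n = len(next(iter(counts.values())))
    depths = [sum(c[j] for c in counts.values()) for j in range(n)]
    mean_depth = sum(depths) / n
    kept = {}
    for otu, c in counts.items():
        prevalence = sum(1 for x in c if x > 0)
        if prevalence >= min_samples and sum(c) / n >= min_frac * mean_depth:
            kept[otu] = [1000 * x / d for x, d in zip(c, depths)]
    return kept

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def approx_p(r, n):
    """Two-sided p for t = r*sqrt((n-2)/(1-r^2)), using a normal approximation to t."""
    if abs(r) >= 1:
        return 0.0
    t = abs(r) * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(t / math.sqrt(2))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, cumulative minimum from the top)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def screen(abund, rating, r_min=0.3, alpha=0.05):
    """Return {otu: rho} for OTUs with |rho| >= r_min and BH-adjusted p <= alpha."""
    otus, rhos, pvals = [], [], []
    for otu, x in abund.items():
        r = spearman(x, rating)
        if math.isnan(r):
            continue
        otus.append(otu)
        rhos.append(r)
        pvals.append(approx_p(r, len(rating)))
    padj = bh_adjust(pvals)
    return {o: r for o, r, q in zip(otus, rhos, padj) if abs(r) >= r_min and q <= alpha}
```

In the study this screen would be repeated for each of the twelve ratings and the total score. Whether FDR adjustment was done per rating or across all tests is not stated in the summary.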
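Indicator species analysis scores how strongly a taxon is tied to one group of sites, such as tilled soils. The sketch below computes a group-equalized indicator value, sqrt(A·B). A is the taxon's mean abundance in the target group divided by the sum of its group means. B is the fraction of target-group sites where the taxon occurs. A simple permutation p-value is included. The statistic is modelled on the default `IndVal.g` measure of indicspecies::multipatt, but the sketch is a simplification and an assumption: it tests one fixed group and omits multipatt's handling of site-group combinations. The function names are hypothetical.

```python
import math
import random

def indval_g(abund, groups, target):
    """Group-equalized indicator value sqrt(A_g * B) of one taxon for one site group.

    abund  -- abundance of the taxon in each site
    groups -- group label of each site (same order as abund)
    target -- the group being tested, e.g. "till"
    """
    means = {}
    for g in set(groups):
        values = [a for a, lab in zip(abund, groups) if lab == g]
        means[g] = sum(values) / len(values)
    total = sum(means.values())
    # A_g: specificity, using group means so unequal group sizes do not dominate
    specificity = means[target] / total if total > 0 else 0.0
    in_target = [a for a, lab in zip(abund, groups) if lab == target]
    # B: fidelity, the fraction of target-group sites where the taxon is present
    fidelity = sum(1 for a in in_target if a > 0) / len(in_target)
    return math.sqrt(specificity * fidelity)

def perm_pvalue(abund, groups, target, n_perm=999, seed=1):
    """Share of shuffled group labelings scoring at least the observed value (+1 correction)."""
    rng = random.Random(seed)
    observed = indval_g(abund, groups, target)
    shuffled = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if indval_g(abund, shuffled, target) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

For example, `indval_g([5, 6, 7, 4, 0, 0, 1, 0], ["till"] * 4 + ["no_till"] * 4, "till")` scores a taxon that is abundant in every tilled site and nearly absent elsewhere close to 1.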
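The lowest-available-rank trait assignment and the community-weighted mean can be sketched as follows. Several assumptions apply. Taxa are keyed by their full lineage path. A higher-rank value is the mean over all reference genomes beneath that taxon. OTUs without any assignable value are dropped, and the remaining weights are renormalized. The rank list and function names are hypothetical; IMG-ER export formats are not modelled.

```python
from statistics import mean

RANKS = ["phylum", "class", "order", "family", "genus"]

def lineage_key(lineage, depth):
    """Full path down to RANKS[depth], so identically named groups in different
    parents stay separate; None if any rank on the path is unclassified."""
    path = tuple(lineage.get(rank) for rank in RANKS[: depth + 1])
    return path if all(path) else None

def rank_means(genomes):
    """genomes: list of (lineage dict, trait value) from reference genomes.
    Returns the mean trait of all genomes under each taxon at every rank."""
    pooled = {}
    for lineage, value in genomes:
        for depth in range(len(RANKS)):
            key = lineage_key(lineage, depth)
            if key is not None:
                pooled.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in pooled.items()}

def assign_trait(otu_lineage, table):
    """Start at the lowest rank and move up until a taxon with reference data is found."""
    for depth in reversed(range(len(RANKS))):
        key = lineage_key(otu_lineage, depth)
        if key is not None and key in table:
            return table[key], RANKS[depth]
    return None, None

def community_weighted_mean(rel_abund, traits):
    """Abundance-weighted mean over OTUs that received a trait value (weights renormalized)."""
    pairs = [(p, traits[otu]) for otu, p in rel_abund.items() if traits.get(otu) is not None]
    total = sum(p for p, _ in pairs)
    return sum(p * t for p, t in pairs) / total if total > 0 else float("nan")
```

Returning the rank alongside the value makes it easy to tally how many OTUs were resolved at their lowest classified rank, the quantity behind the 58% figure above.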
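The signed EWAS score can be illustrated in the same style. The summary states only that indicator values were signed by their presumed association with soil health, so the sign table below is a hypothetical convention. The factor-level labels and function names are likewise illustrative. Each OTU's signed values are averaged within a category and then weighted by relative abundance.

```python
from collections import defaultdict

# Hypothetical sign convention: +1 if a factor level is presumed to favour soil
# health, -1 if presumed to degrade it. These labels are illustrative only.
PRESUMED_SIGN = {"no_till": 1, "tilled": -1, "organic_amendment": 1, "bare_fallow": -1}

def signed_scores(hits):
    """hits: (otu, category, factor_level, indicator_value) tuples that passed Padj < 0.05.
    Returns {(otu, category): mean signed indicator value}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for otu, category, level, value in hits:
        sums[(otu, category)] += PRESUMED_SIGN[level] * value
        counts[(otu, category)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def community_weighted_ewas(rel_abund, scores, category):
    """Relative-abundance-weighted mean signed score for one category,
    over the OTUs that have a score in that category."""
    pairs = [(p, scores[(otu, category)]) for otu, p in rel_abund.items()
             if (otu, category) in scores]
    total = sum(p for p, _ in pairs)
    return sum(p * s for p, s in pairs) / total if total > 0 else float("nan")
```

A negative community-weighted score for "disturbance" would indicate a community dominated by taxa that indicate disturbed conditions across AgroEcoDB studies.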
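relaimpo offers several metrics; a common one is LMG. LMG averages each predictor's increase in R² over all orders in which predictors can enter a linear model. The summary does not say which relaimpo metric was used, so LMG is an assumption here. The sketch below uses the equivalent subset-weighted form, which needs 2^p model fits rather than p! orderings, and it requires NumPy. Sequential sums of squares depend on entry order in the same way. This is presumably why the PERMANOVA R² values were averaged over factor orders.

```python
from itertools import combinations
from math import factorial

import numpy as np

def r_squared(X, y, cols):
    """R^2 of an ordinary least-squares fit of y on columns `cols` of X plus an intercept."""
    if len(cols) == 0:
        return 0.0
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def lmg(X, y):
    """LMG shares: each predictor's R^2 gain averaged over all entry orders,
    computed via the equivalent weighting over subsets of the other predictors."""
    p = X.shape[1]
    r2 = {(): 0.0}
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            r2[subset] = r_squared(X, y, subset)
    shares = np.zeros(p)
    for j in range(p):
        others = [c for c in range(p) if c != j]
        for k in range(p):
            # fraction of orderings in which exactly these k predictors precede j
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for subset in combinations(others, k):
                with_j = tuple(sorted(subset + (j,)))
                shares[j] += weight * (r2[with_j] - r2[subset])
    return shares
```

The shares sum to the full-model R². Under this metric, the finding that genome size "explained the most variation in total health score" corresponds to it receiving the largest share.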
Key Findings
- Relationships among soil health ratings: Biological ratings were interrelated and positively correlated with total health and aggregate stability. Total health was negatively correlated with surface and sub-surface hardness and with sand content. DNA yield was positively correlated with total health (r=0.51, p<0.001) but was influenced by clay content.
- Bioindicators: Of 21,463 OTUs, 1,874 (8.7%) correlated with one or more soil health ratings (mean 1.5 ratings per OTU; maximum 5). These OTUs spanned 348 genera, 62% of which were candidate or unclassified taxa. Candidate and unclassified genera were enriched approximately 1.9-fold among bioindicators (215/348) relative to the overall dataset (430/943). Most bioindicators (96%) correlated in a consistent direction across ratings. Positive correlations predominated for biological ratings, whereas negative correlations predominated for physical and chemical ratings.
- Key taxa: Positive indicators of high biological ratings included Candidatus Udaeobacter (Verrucomicrobia), Illumatobacteraceae (Actinobacteria), and unclassified members of Chloroflexi (order KD4-96), Alphaproteobacteria (Xanthobacteraceae), and Actinobacteria (MB-A2-108). Negative indicators (low physical, chemical, and total health ratings) included Sphingomonas, unclassified Chloroflexi (JG30-KF-CM45), Ca. Nitrososphaeraceae (Archaea), and Acidobacteria RB41.
- Tillage indicators: Many abundant taxa were enriched by tillage: 292 OTUs were indicators of tilled soils vs 18 of no-till soils. Tillage-favored groups included Sphingomonadaceae, Rhizobiaceae, Caulobacteraceae, Pyrinomonadaceae, Chthoniobacter, and Terrabacter. Indicators of untilled soils overlapped with the high-biological-health taxa and also included Gaiella and unclassified Solirubrobacterales.
- Genomic traits: Community-weighted genome size, CRISPR array frequency, and BGC number were negatively correlated with total health score. Coding density was positively correlated with total health and with OM quality (active C, ACE protein), and negatively correlated with DNA yield. CRISPR frequency was strongly negatively correlated with available water capacity and OM, and positively correlated with sand content (r=0.44, p<0.001). rrn copy number was not correlated with total health (r≈0.003). It was, however, higher in tilled than untilled soils (p<0.001) and correlated with surface and sub-surface hardness. Community-weighted genome size and BGC number were also higher in tilled soils. Genome size correlated mainly with biological ratings, whereas rrn copy number correlated with physical and chemical ratings.
- EWAS coverage and impact: Most soil health OTUs were present in AgroEcoDB (17,818/21,463), and these accounted for 96.9% of sequences. Of these shared OTUs, 8,760 were significant EWAS indicators for one or more study factors. Community-weighted EWAS scores explained more variation in bacterial community composition than genomic traits did. However, community-weighted genome size explained the most variation in total health score of all predictors.
- Active carbon and genome size: Among bioindicators for active carbon, taxa with larger genomes (e.g., Chthoniobacter 7.8 Mb; Geodermatophilaceae 4.8 Mb; Sphingomonas 4.2 Mb) were relatively more abundant in low active C soils and were associated with tilled soils. Taxa with smaller genomes (e.g., Gaiella 1.5 Mb; KD4-96 2.3 Mb; Ca. Udaeobacter 2.7 Mb) were more abundant in high active C soils and were negatively associated with tillage. EWAS linked these taxa to bulk soil rather than rhizosphere and associated larger genomes with greater disturbance (e.g., tillage, altered watering regimes).
- Notable nuances: Chthoniobacter had a large genome and was associated with tillage, yet it also indicated high biological ratings. This suggests intra-genus ecological differentiation or niche complexity. Nitrososphaeraceae were strong indicators of poor health and were linked to fertilizer use in EWAS. Overall, community-weighted genome size was the best predictor of total soil health rating, and tillage tended to select for communities with larger genomes and higher rrn copy numbers.
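The percentages and the enrichment figure can be checked against the reported counts. This is plain arithmetic on the numbers above, taking 21,463 as the OTU total. The ratio of the two proportions (215/348 vs 430/943) is about 1.35, whereas the corresponding odds ratio is about 1.93. The summary does not state which measure underlies the "approximately 1.9-fold" figure.

```python
# Counts as reported in the findings above: (part, whole)
reported = {
    "OTUs correlated with >=1 rating": (1874, 21463),
    "OTUs assigned a genomic trait": (20148, 21463),
    "OTUs shared with AgroEcoDB": (17818, 21463),
    "candidate/unclassified genera among bioindicator genera": (215, 348),
    "candidate/unclassified genera in the whole dataset": (430, 943),
}
for label, (k, n) in reported.items():
    print(f"{label}: {k}/{n} = {100 * k / n:.1f}%")

p_bio = 215 / 348
p_all = 430 / 943
ratio_of_proportions = p_bio / p_all
# odds = classified-as-candidate / not, in each set of genera
odds_ratio = (215 / (348 - 215)) / (430 / (943 - 430))
print(f"ratio of proportions = {ratio_of_proportions:.2f}, odds ratio = {odds_ratio:.2f}")
```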
Discussion
The study addressed whether specific bacterial taxa and their inferred genomic traits, as well as environment-wide associations, can elucidate ecological drivers behind microbiome–soil health relationships. It found that bioindicators positively associated with biological soil properties (e.g., organic matter quantity/quality) tended to have smaller genomes and higher coding density, while communities in soils with lower health scores, especially under physical/chemical stress and tillage disturbance, were enriched in taxa with larger genomes and higher rrn copy number. These patterns support trait-based life history interpretations: higher rrn copy numbers are associated with fast growth potential favored by disturbance, and larger genomes with metabolic versatility that may confer advantage under degraded or fluctuating conditions. Conversely, smaller, denser genomes (e.g., Ca. Udaeobacter, KD4-96, Gaiella) aligned with higher OM and active carbon, consistent with efficient, specialized strategies in less disturbed, resource-rich soils. EWAS analyses contextualized these indicators across many agroecosystems, linking key taxa to management practices, plant association, and disturbance regimes. Community-weighted EWAS metrics accounted for more community compositional variation than genomic traits, reflecting the power of aggregated environmental associations across studies. However, when predicting an integrative soil health metric, community-weighted genome size explained the most variance, suggesting a strong, conserved genomic signature of soil health status. The findings advance interpretability of microbiome-based soil health indicators by tying taxa to ecological strategies and management impacts. 
They also highlight complexities: relative abundance changes reflect relative fitness, not absolute abundance; some taxa (e.g., Chthoniobacter) may exhibit diverse ecological niches; and trait–environment relationships (e.g., coding density, CRISPR arrays) can diverge from simple expectations due to texture-driven dynamics, predator–prey cycles, or phylogenetic constraints. These insights underscore the tight coupling between microbiome composition, disturbance history (notably tillage), and soil function proxies, and suggest that promoting conditions favoring small-genome, high-coding-density taxa may align with improved soil health outcomes.
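The point that relative abundance tracks relative fitness rather than absolute abundance can be shown with hypothetical numbers. Suppose one taxon grows after a disturbance while another stays constant in absolute terms. The unchanged taxon's share still falls, here from 0.100 to about 0.077. The taxon labels and counts are invented for illustration.

```python
def relative(abundances):
    """Convert absolute abundances to proportions of the community total."""
    total = sum(abundances.values())
    return {taxon: count / total for taxon, count in abundances.items()}

# Hypothetical absolute abundances (arbitrary units) before and after a disturbance
before = {"fast_grower": 100, "unchanged_taxon": 100, "rest_of_community": 800}
after = {"fast_grower": 400, "unchanged_taxon": 100, "rest_of_community": 800}

rel_before = relative(before)  # unchanged_taxon: 100 / 1000
rel_after = relative(after)    # unchanged_taxon: 100 / 1300
for taxon in before:
    print(f"{taxon}: {rel_before[taxon]:.3f} -> {rel_after[taxon]:.3f}")
```

A relative-abundance survey alone cannot distinguish this case from a true decline of the unchanged taxon. This is why absolute quantification or shotgun metagenomics would be needed to confirm such trends.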
Conclusion
Genomic trait inference and EWAS together provided ecological insight into bacterial bioindicators relevant to soil health assessment across North American agricultural soils. Community-weighted genome size emerged as the strongest predictor of total soil health rating and was linked to tillage, active carbon, and other biological ratings. Tillage favored microbiomes with larger genome size and higher rrn copy number, whereas higher soil health ratings were associated with smaller genomes and higher coding density. EWAS connected prominent bioindicators to disturbance and management gradients, enhancing interpretability beyond taxonomy alone, including for unclassified taxa. Future work should validate these inferences with shotgun metagenomics, resolve within-genus ecological heterogeneity, and test whether indicator taxa are merely responsive to or also drivers of soil function. A priority is elucidating the mechanistic links among genome size, carbon use efficiency, carbon cycling, and soil health, including potential feedbacks whereby low-health soils select for communities that promote carbon loss.
Limitations
- Genomic trait inference relied on reference genomes. Many abundant soil taxa lack representative genomes, so traits had to be assigned at higher taxonomic ranks, which limits phylogenetic resolution.
- EWAS depends on the quality and standardization of metadata and sequencing workflows across studies; despite filtering, heterogeneity may bias associations.
- Relative abundance metrics do not capture absolute abundances; observed increases reflect relative fitness and may derive from differential survival as well as growth.
- Trait proxies have constraints: CRISPR array frequency does not capture spacer content or total defense investment, and the ecological interpretation of coding density can be confounded by multiple evolutionary processes.
- Contradictory patterns within genera (e.g., Chthoniobacter) suggest niche differentiation not resolved by 16S markers.
- Findings are correlative; causal roles of bioindicator taxa in driving soil functions remain to be established, and the authors note the need for shotgun metagenomic confirmation.