The chosen few—variations in common and rare soil bacteria across biomes
S. Bickel and D. Or
Soil bacterial communities exhibit highly skewed relative abundance distributions in which most species occur at low relative abundance. Rare species are thought to be ecologically important by contributing to functional diversity and specific ecosystem functions, yet their ecological roles and biogeography are poorly understood. While only a small fraction of bacterial species appear broadly prevalent across soils worldwide, the processes governing the vast majority of rare species remain unclear and are often overlooked, particularly in global analyses that emphasize common taxa. Common species tend to be broadly fit across environments and thus are poor indicators of environmental change, whereas rare species may be more sensitive to environmental drivers. There is a need for a universal, objective classification of common versus rare soil bacteria based on both prevalence and relative abundance to better attribute their contributions to ecosystem functioning. This study aims to (i) develop such a global classification metric, (ii) identify how richness and abundance of common and rare taxa vary with environmental conditions—especially climatic water content (CWC), net primary productivity (NPP), and mean annual temperature (MAT)—and (iii) use a mechanistic individual-based model to quantify how climatic factors, particularly soil wetness and associated carbon flux constraints, shape the proportions of rare soil bacteria.
The study builds on evidence that rare species contribute to microbial functional diversity and specific ecosystem functions and may be more sensitive to environmental conditions than common species. Prior global studies have often underrepresented rare taxa and focused on common species’ abundance patterns. Operational classifications of rarity have relied on prevalence or local relative abundance, but standardized global methods are lacking. Statistical models suggest environmental variables explain more bacterial diversity when low-abundance taxa are weighted, implying distinct ecological strategies among rare species. The authors note that common taxa’s broad fitness makes them poor indicators of environmental change and that community composition estimates can be biased by bacterial physiological states (e.g., dormancy), underscoring the need for improved classification approaches.
Data: The authors analyzed a previously compiled global dataset of soil bacterial communities (16S rRNA V4 amplicons) from natural soils across major biomes (n = 844 samples), aggregated to 318 georeferenced sites at 0.1° resolution for environmental analyses. Most biomes were represented by 21 to 46 samples, while tropical grasslands (n = 113), tropical forests (n = 272), and temperate forests (n = 260) were overrepresented.
Sequencing processing: Reads from three studies were trimmed to 90 bp, dereplicated, and denoised; singletons were removed per sample prior to denoising, yielding 256,620 unique amplicon sequence variants (ASVs), 71% of which were observed fewer than ten times across samples. Taxonomy was assigned via a multinomial Naive Bayes classifier trained on Greengenes 13_8 (99% OTUs, 515F–806R). Sequences with <70% confidence at the Bacteria kingdom level or classified as Archaea were discarded; global singletons (observed only once across all samples) were removed.
Abundance standardization and metrics: ASV count tables were rarefied to N = 7,544 reads per sample, averaging 15 independent rarefactions for robustness. Prevalence of each species was calculated as the fraction of samples with nonzero rarefied counts. Local relative abundance p_ik was the proportion of species i in sample k after rarefaction. Global relative abundance g_i for each species was the across-sample average of local relative abundances, defining a global relative abundance distribution (RAD) distinct from each sample’s RAD.
Classification of common vs rare: An automatic threshold selection minimizing cross-entropy (Li’s method; scikit-image function threshold_li) was applied to the global RAD to obtain a data-driven threshold t. Species with g_i ≤ t were classified as rare; those with g_i > t as common. For each sample k, the cumulative relative abundance of rare (RA_rare,k) and common (RA_common,k) species was computed by summing p_ik over species classified as rare or common, respectively.
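The abundance metrics and the common/rare split described above can be illustrated with a small sketch. It is not the authors' pipeline: the count table is a hypothetical toy example, the way the repeated rarefactions are averaged is an assumption, and `li_threshold` is a simplified pure-Python version of the iterative minimum cross-entropy idea rather than the scikit-image `threshold_li` function named in the text.

```python
import math
import random

def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's count vector."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    drawn = [0] * len(counts)
    for i in rng.sample(pool, depth):
        drawn[i] += 1
    return drawn

def li_threshold(values, tol=1e-12, max_iter=1000):
    """Simplified iterative minimum cross-entropy threshold (positive values only)."""
    t = sum(values) / len(values)              # start at the arithmetic mean
    for _ in range(max_iter):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:                # all values identical: nothing to split
            return t
        m_low, m_high = sum(low) / len(low), sum(high) / len(high)
        # Update: logarithmic mean of the two class means.
        t_new = (m_high - m_low) / (math.log(m_high) - math.log(m_low))
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

# Hypothetical toy table: rows are samples k, columns are species i.
counts = [
    [500, 300, 5, 3, 2, 0],
    [400, 350, 0, 4, 1, 6],
    [600, 200, 8, 0, 2, 1],
]
depth, n_draws = 200, 15                       # the study used 7,544 reads and 15 draws
rng = random.Random(0)

# Local relative abundance p[k][i], averaged over repeated rarefactions (assumption).
p = []
for row in counts:
    total = [0] * len(row)
    for _ in range(n_draws):
        for i, c in enumerate(rarefy(row, depth, rng)):
            total[i] += c
    p.append([x / (depth * n_draws) for x in total])

n_samples, n_species = len(p), len(p[0])
# Global relative abundance g[i]: across-sample mean of p[k][i].
g = [sum(p[k][i] for k in range(n_samples)) / n_samples for i in range(n_species)]
# Prevalence: fraction of samples in which the species was drawn at least once.
prevalence = [sum(p[k][i] > 0 for k in range(n_samples)) / n_samples
              for i in range(n_species)]

threshold = li_threshold([v for v in g if v > 0])
is_common = [v > threshold for v in g]

# Cumulative relative abundance of common and rare species per sample.
ra_common = [sum(p[k][i] for i in range(n_species) if is_common[i])
             for k in range(n_samples)]
ra_rare = [sum(p[k][i] for i in range(n_species) if not is_common[i])
           for k in range(n_samples)]
```

In this toy table the two dominant species end up above the threshold and the remaining four below it, so `ra_rare` stays small in every sample. On real data the threshold would be computed from the full global RAD.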
The same classification approach was also applied to an independent soil community RNA time series (not detailed here).
Environmental data: For each topsoil sample (≤10 cm), environmental variables were assigned via nearest-neighbor interpolation at their native resolutions: NPP (MODIS, 2000–2015), MAT (WorldClim), and climatic water content (CWC) as a proxy for soil hydration and aqueous-phase connectivity. CWC was derived from MSWEP daily precipitation (1979–2016, 0.1°) using the average number of consecutive dry days τ; rainfall frequency was τ⁻¹. CWC estimates were compared against ERA5-Land mean soil moisture (0–7 cm, monthly, 1981–2019, 0.1°), showing reasonable agreement (n = 318; slope = 0.996; intercept = 0.066; R² = 0.54). Potential carrying capacity (maximal cell density) was estimated from soil carbon input (linked to NPP) divided by a temperature-dependent maintenance rate (10⁻⁴ gC g⁻¹ h⁻¹).
Mechanistic model (SIM): A spatially explicit individual-based model simulated microbial growth on a 2D hydrated soil surface representing a 1 mm² area and 11 µm thickness. Three diffusible carbon sources were supplied as pulses on average every 4 hours; consumption rates were bounded by cellular capacity, and all kinetic parameters shared a common temperature dependence. The model initialized 3,360 species with distinct kinetic parameter combinations (one cell per species placed randomly), assuming no initial dispersal limitation. Prescribed generation times under ideal conditions spanned 0.6 hours to 288 days. Simulations ran for 8 days with 1-minute time steps, with a maximum cell density of ~10¹⁷ cells m⁻³ soil. Species interactions emerged from spatial positioning and resource fields; no explicit interspecific interaction terms were imposed beyond resource competition and movement on hydrated surfaces. End-of-simulation species counts provided RADs. Simulations were repeated across soil moisture conditions to vary nutrient diffusion and cell mobility.
To assess the effect of inactivity/dormancy, analyses were repeated after removing cells that did not divide during the simulation.
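The rainfall-frequency definition used for the environmental data (average number of consecutive dry days τ, with frequency τ⁻¹) can be sketched as follows. The 1 mm dry-day cutoff, the function name, and the short precipitation series are all hypothetical, since the summary does not state how a dry day was defined, and the further step from τ to CWC is not reproduced here.

```python
def mean_dry_spell(daily_precip_mm, wet_threshold_mm=1.0):
    """Mean length, in days, of runs of consecutive dry days.

    A day counts as dry when precipitation is below `wet_threshold_mm`
    (an assumed cutoff, not taken from the study).
    """
    spells = []
    run = 0
    for precip in daily_precip_mm:
        if precip < wet_threshold_mm:
            run += 1                 # extend the current dry spell
        elif run:
            spells.append(run)       # a wet day closes the spell
            run = 0
    if run:
        spells.append(run)           # count a spell still open at the end
    return sum(spells) / len(spells) if spells else 0.0

# Hypothetical 12-day record (mm per day); dry spells have lengths 2, 3, 2, 2.
precip = [0, 0, 5.2, 0, 0, 0, 12.0, 0.3, 0, 8.1, 0, 0]
tau = mean_dry_spell(precip)
rain_frequency = 1.0 / tau if tau > 0 else float("inf")
print(tau, rain_frequency)
```

Spells cut off at the start or end of the record are counted at their observed length, which slightly underestimates τ for short series; over the multi-decade record named in the text this edge effect would be small.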
- A global, data-driven threshold based on minimum cross-entropy identified only 0.4% of bacterial species as common across soils; 99.6% were classified as rare.
- The common/rare classification threshold for global relative abundance was consistent: 0.019 ± 0.002% (bootstrap mean ± SD). Using fewer samples (e.g., 25% of the dataset) yielded slightly higher thresholds but similar average proportions. Biome-resampled RADs (n = 21, 50 resamples) gave comparable thresholds (0.024 ± 0.003%), indicating limited bias from biome representation.
- Rare species constituted 42% of the cumulative global relative abundance, despite comprising 99.6% of taxa.
- Prevalence strongly differed: median (± IQR) prevalence was 0.3 ± 0.2 for common species versus 0.001 ± 0.003 for rare species (common species ~300× more prevalent).
- The ratio of rare-to-common species richness declined with increasing rainfall frequency (exponential R² = 0.19; Pearson r = −0.41; n = 318), linking community composition to climatic soil water conditions.
- Dry ecosystems hosted diverse, highly variable communities with many rare and endemic species and more even RADs; wetter ecosystems were dominated by a few common species, and their RADs more closely matched the global-average RAD.
- SIM predictions mirrored observations: under wet conditions, a few common species dominated; under dry conditions with fragmented aqueous habitats and restricted diffusion, communities became more even, favoring rarity. Removing inactive (non-dividing) cells in the SIM sharply reduced the modeled proportion of rare species under very dry conditions, indicating that inactivity can inflate observed rarity in dry soils.
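The two statistics reported above for the rare-to-common richness ratio versus rainfall frequency (an exponential-fit R² and a Pearson r) can be computed along these lines. This is a sketch under assumptions: the five data points are invented for illustration and are not the study's values, and fitting the exponential by linear regression on the log-transformed ratio is one common choice, not necessarily the one the authors used.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on ln(y); returns (a, b)."""
    ln_y = [math.log(v) for v in y]
    mx, my = sum(x) / len(x), sum(ln_y) / len(ln_y)
    b = (sum((u - mx) * (v - my) for u, v in zip(x, ln_y))
         / sum((u - mx) ** 2 for u in x))
    a = math.exp(my - b * mx)
    return a, b

def r_squared(y, y_hat):
    """Coefficient of determination on the original (untransformed) scale."""
    mean_y = sum(y) / len(y)
    ss_res = sum((v - w) ** 2 for v, w in zip(y, y_hat))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: rainfall frequency (1/day) and rare-to-common richness ratio.
frequency = [0.1, 0.2, 0.3, 0.4, 0.5]
ratio = [300.0, 260.0, 180.0, 150.0, 100.0]

a, b = fit_exponential(frequency, ratio)
predicted = [a * math.exp(b * f) for f in frequency]
r2 = r_squared(ratio, predicted)
r = pearson_r(frequency, ratio)
```

A negative `b` and a negative `r` both indicate that the ratio declines as rainfall becomes more frequent. The invented points are far less scattered than the 318 sites in the study, so the fit here is much tighter than the reported R² = 0.19.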
The study’s global classification reveals that a very small fraction of bacterial taxa are common and broadly prevalent, while most taxa are rare and locally endemic. Linking these classes to environmental gradients shows that soil wetness (CWC and rainfall frequency) is a central driver of rarity patterns across biomes. In wetter, well-connected aqueous habitats with higher carbon fluxes, fast-growing, competitively superior common species can realize their advantages and dominate communities. In contrast, drier soils fragment the aqueous phase and restrict nutrient diffusion, suppressing the growth advantages of common species and producing more even communities with higher richness of rare taxa. The mechanistic SIM reproduces these shifts without presupposing species identities, strengthening causal inference that moisture-mediated resource connectivity shapes community structure. The finding that removing inactive cells reduces modeled rarity under very dry conditions highlights the role of physiological state in shaping observed RADs and suggests that high rarity in dry soils partly reflects dormancy or maintenance states. Together, these results address the research questions by providing a universal rarity/commonness metric and demonstrating how climatic factors—particularly soil wetness—govern the balance between common and rare bacteria and, by extension, potential ecosystem functioning and indicator value of taxa.
Main contributions: (i) A universal, data-driven global classification of common versus rare soil bacteria based on cross-entropy thresholding of the global RAD; (ii) empirical demonstration that only ~0.4% of taxa are globally common while rare taxa dominate richness yet contribute less to total abundance; (iii) identification of soil wetness as a key driver of rarity patterns across biomes, with rare taxa more prevalent in drier, resource-limited conditions; and (iv) mechanistic support from a spatially explicit individual-based model showing how moisture and carbon flux constraints shift communities toward evenness and rarity.
Future directions: Extend analyses with activity-resolved approaches (e.g., RNA-based, stable isotope probing) to disentangle active versus dormant contributions to rarity; incorporate finer-scale, in situ soil moisture and carbon flux measurements to refine CWC proxies; test the classification across additional biomes and land uses; integrate trait-based data to link rarity/commonness to life-history strategies; and examine temporal dynamics (seasonality and disturbance) to assess stability and transitions between rare and common states.
- Classification relies on short 16S rRNA V4 ASVs (trimmed to 90 bp) and rarefied counts, which may limit taxonomic resolution and detection of very low-abundance taxa.
- The threshold depends on the global RAD and sampling depth; with fewer samples, thresholds increased slightly and fewer rare taxa were captured.
- Environmental moisture metrics are proxy-based; although CWC and ERA5-Land soil moisture agreed reasonably (R² = 0.54), remote sensing and reanalysis data introduce uncertainties.
- Physiological state (dormancy/inactivity) can bias observed RADs; while explored in the SIM, observational data may still overestimate rarity in dry soils.
- Biome representation was uneven, though resampling suggested minimal bias in threshold estimation.