Resilience of genetic diversity in forest trees over the Quaternary

Environmental Studies and Forestry

P. Milesi, C. Kastally, et al.

This study finds that seven European forest tree species maintained genetic diversity throughout multiple glacial cycles, indicating substantial evolutionary potential despite major environmental change. It examines how the effective population sizes of forest trees responded to repeated climatic shifts over the Quaternary.
Introduction

Northern temperate and boreal European tree species have persisted for millions of years through multiple glacial cycles, showing large historical range shifts and fluctuations in census sizes. Despite this, most tree species today harbor high genetic diversity and can respond rapidly to recent environmental changes, raising questions about how past demographic changes affected effective population size (N_e), the primary determinant of genetic diversity and selection efficacy. The study asks whether N_e fluctuated strongly or remained relatively stable across repeated glacial cycles, and whether changes in N_e were driven predominantly by climatic events (predicting synchronous changes across species) or by species’ intrinsic biological characteristics (predicting species-specific or grouped patterns). Prior phylogeographic work relied heavily on organellar markers and a focus on the Last Glacial Maximum, limiting inference about deeper time scales and genome-wide nuclear diversity. Here, the authors implement a standardized, cross-species genomic sampling and analysis framework across seven widespread European tree species to reconstruct demographic histories across multiple glacial cycles and test for synchronous changes in N_e among species.

Literature Review

Earlier demographic inferences for European trees emphasized organellar markers, which have smaller effective population sizes and limited time depth, and were often interpreted in the context of the Last Glacial Maximum (LGM). Organellar markers behave as single loci and are typically maternally inherited and dispersed by seed, which limits their relevance for nuclear genetic diversity, where most variation resides. Genome resequencing and coalescent methods extended demographic inference to millions of years, but most studies examined single species or focused on population divergence and gene flow. Results on whether population history tracks glacial oscillations were mixed, and heterogeneous sampling designs and genomic targets made general conclusions difficult. Comparative studies in other taxa (e.g., Juglans, African rainforest trees) indicate idiosyncratic, asynchronous N_e trajectories among species, suggesting that intrinsic traits modulate demographic responses to climatic fluctuations.

Methodology

Study design and sampling: Seven wind-pollinated European tree species spanning boreal to Mediterranean regions were analyzed: Picea abies, Pinus pinaster, Pinus sylvestris (conifers), and Betula pendula, Fagus sylvatica, Populus nigra, Quercus petraea (angiosperms). A total of 3,407 adult trees from 164 populations (19–26 locations per species; ~25 individuals per population) were sampled across natural ranges under the EU H2020 GenTree project.

Targeted sequencing: Approximately 3 Mbp per species were captured using ~10,000 species-specific probes targeting largely orthologous nuclear genes across species. Targets included orthologs of 2,639 Arabidopsis thaliana genes associated with functions of interest (GO/KEGG term-based), species-specific candidates, and randomly selected genes. Orthogroups were identified via reciprocal best hits and OrthoFinder; best orthogroups covering at least six species were prioritized. Roche SeqCap EZ HyperPlus libraries were sequenced (HiSeq 2500 or NovaSeq 6000, paired-end).

Read processing and variant calling: Reads were adapter/quality trimmed (ERNE, Cutadapt), mapped to species or close-relative reference genomes with BWA-MEM; organellar reads and duplicates were removed. SNP calling used GATK v4 HaplotypeCaller (GVCF), GenomicsDBImport, and GenotypeGVCFs. SNP-level filters followed GATK recommendations (e.g., QD<0.25, QUAL<20, SOR>3.0, MQ<30, MQRankSum<-12.5, ReadPosRankSum<-8.0). Putative paralog-derived SNPs were removed via HDplot (heterozygote excess and read ratio deviations) and by excluding regions enriched for paralogous SNPs. Genotype-level filters set DP<8 or GQ<20 to missing; SNPs with >50% missing were excluded.

Annotation: Sites were annotated into 4-fold, 2–3-fold, 0-fold categories and genomic context (intergenic, intron, UTRs, etc.) using NewAnnotateRef.py and ANNOVAR.

Population structure and diversity: Analyses used putatively neutral SNPs (4-fold, intron, intergenic), LD-pruned, excluding singletons. ADMIXTURE (K=1–12) and PCA (EIGENSOFT) characterized structure. Pairwise F_ST (StAMPP) and AMOVA assessed differentiation; isolation-by-distance (IBD) regressed F_ST/(1−F_ST) on log geographic distance.

Site-frequency spectra (SFS) and down-sampling: Folded SFS were built from SNP set v5.3.2, removing SNPs with >50% missing in any population; SFS were down-sampled to half the initial sample size to standardize across loci with missingness. Analyses were run at multiple hierarchical levels: species-wide pooled, one-sample-per-population (to emphasize the collecting phase), and per population.

Demographic inference: N_e trajectories were inferred with Stairway Plot 2 (model-flexible composite likelihood) using folded SFS of 4-fold, intergenic, intronic sites. Settings: 67% sites for training, 200 resamplings; breakpoints at n/4, n/2, n/3, n−2. Scaling used mutation rates: angiosperms 7.77×10^-9 per site per generation (generation times: Q. petraea, F. sylvatica 60 yrs; P. nigra, B. pendula 15 yrs); conifers 2.7×10^-8 per site per generation (25-yr generation time). To validate, fastsimcoal2 fitted one-population models (SNM, 2-epoch, 3-epoch) to folded SFS, with model selection by AIC, and parameter CIs via parametric bootstraps. Additional two-population divergence models (with/without migration and with/without pre-divergence size change) were tested using joint SFS of representative southern vs northern non-admixed population pairs.

Synchronicity analysis: Kendall correlations among species’ Stairway Plot 2 N_e time series assessed cross-species synchronicity. A randomization test identified time windows with more concurrent decreases in N_e than expected, analyzed within species groups showing highest correlations.

Simulations and robustness: Power analyses via fastsimcoal2 simulations tested Stairway Plot 2’s ability to recover cyclic demography under parameters relevant to trees (long generation time, large N_e, small targeted genome). Additional checks compared inferences from mixed vs separate population SFS to assess effects of population structure.
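
The down-sampling of folded SFS described in the Methodology can be illustrated with a hypergeometric projection, a common way to bring spectra to a smaller, uniform sample size. The summary does not state which procedure the authors used, so the sketch below is an assumption, and the function names and toy counts are hypothetical.

```python
from math import comb


def fold_sfs(spectrum):
    """Fold a spectrum indexed by allele count 0..n into minor-allele classes."""
    n = len(spectrum) - 1
    folded = [0.0] * (n // 2 + 1)
    for i, count in enumerate(spectrum):
        folded[min(i, n - i)] += count
    return folded


def project_folded_sfs(folded, n, m):
    """Project a folded SFS from n chromosomes down to m chromosomes.

    Each site with minor-allele count i out of n is redistributed over the
    counts j it could show in a random subsample of m chromosomes, using
    hypergeometric probabilities. The result is folded again.
    """
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    if len(folded) != n // 2 + 1:
        raise ValueError("folded SFS length must be n // 2 + 1")
    # Treat minor-allele counts as allele counts; upper classes are empty.
    full = list(folded) + [0.0] * (n - n // 2)
    projected = [0.0] * (m + 1)
    total_ways = comb(n, m)
    for i, count in enumerate(full):
        if count == 0:
            continue
        for j in range(max(0, m - (n - i)), min(i, m) + 1):
            projected[j] += count * comb(i, j) * comb(n - i, m - j) / total_ways
    return fold_sfs(projected)


# Hypothetical folded SFS for n = 4 chromosomes (classes 0, 1, 2),
# down-sampled to half the sample size, m = 2.
toy_folded = [0, 4, 2]
half_size = project_folded_sfs(toy_folded, n=4, m=2)
```

Class 0 of the projected spectrum holds sites that appear monomorphic in the subsample; these would normally be dropped or counted as invariant before demographic inference.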

Key Findings
  • Genetic diversity and structure: Nucleotide diversity at 4-fold sites (π_4) ranged from 0.0027 to 0.0072 per bp across species. Genetic differentiation (F_ST) was generally low except in Pinus pinaster (F_ST≈0.13) and Populus nigra (F_ST≈0.16). Isolation-by-distance was significant in most species. Diversity often increased toward higher latitudes in boreal species and decreased northwards in several temperate species; overall patterns did not uniformly follow a south–north gradient.
  • Divergence timing: Best-supported models included migration between clusters. Estimated divergence times between major population clusters in all species largely predated the LGM, spanning ~0.6 Mya to ~17 Mya, indicating that main genetic groups formed over multiple glacial cycles and persisted despite gene flow.
  • Effective population sizes: Historical and current N_e estimates were on the order of tens to hundreds of thousands, substantially below census sizes (N) but consistent with lower N_e/N ratios in species with very large N.
  • SFS and growth signal: All species except Fagus sylvatica showed an SFS excess of rare variants, indicating ancient population growth (onsets ranging from ~0.6 Mya in Pinus sylvestris to ~15 Mya in Quercus petraea). Few populations showed decreasing N_e, typically in range-edge or isolated contexts.
  • Magnitude of change: The largest inferred increase in N_e was in P. sylvestris (~5,000 to ~500,000); F. sylvatica showed a modest two-fold rise (~100,000 to ~200,000).
  • N_e resilience to glacial cycles: Despite massive range contractions during glacial advances, species-wide N_e generally increased or remained stable, suggesting metapopulation connectivity buffered genetic diversity against climatic oscillations.
  • Cross-species synchronicity: Three groups emerged based on N_e trajectories: (1) boreal species Picea abies, Pinus sylvestris, Betula pendula plus riparian Populus nigra; (2) temperate broadleaves Fagus sylvatica and Quercus petraea; (3) Mediterranean Pinus pinaster alone. N_e changes did not align consistently with glacial–interglacial timing, though brief synchronous decreases occurred. Grouping reflected shared ecology/life history rather than phylogeny.
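
The cross-species synchronicity result rests on Kendall correlations between N_e time series. The following is a minimal sketch of a Kendall tau-b calculation on toy trajectories. It assumes the series have already been interpolated onto a shared time grid, which the summary does not describe; the species labels and N_e values are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b, with tie correction) of two series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        return float("nan")  # undefined when one series is constant
    return (concordant - discordant) / denom


# Hypothetical N_e values on a shared time grid (oldest to most recent).
toy_ne = {
    "boreal_a": [5e3, 2e4, 8e4, 2e5, 5e5],
    "boreal_b": [1e4, 3e4, 9e4, 1.5e5, 4e5],
    "temperate": [2e5, 1.8e5, 1.5e5, 1.2e5, 1e5],
}

tau_within = kendall_tau_b(toy_ne["boreal_a"], toy_ne["boreal_b"])
tau_between = kendall_tau_b(toy_ne["boreal_a"], toy_ne["temperate"])
```

Because the measure is rank-based, it compares the direction of change rather than its magnitude, so two species whose N_e rose by very different factors can still be perfectly correlated.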
Discussion

The study addressed whether Quaternary climatic cycles drove synchronous fluctuations in effective population size across dominant European trees or whether intrinsic traits governed demographic histories. Findings show that, across seven widespread species, N_e generally increased or remained stable over long timescales, with main divergence events predating the LGM by hundreds of thousands to millions of years. This supports a view of large, interconnected metapopulations where gene flow and long generation times buffer neutral genetic diversity against cyclical range contractions. The lack of universal synchronicity and the emergence of species groups whose N_e trajectories correlate with ecological and biogeographical traits underscore the role of life histories, dispersal, and range configurations in shaping long-term genetic diversity. The results reconcile fossil evidence of large census fluctuations with the persistence of high nuclear genetic diversity and rapid adaptive responses observed in trees, indicating substantial evolutionary potential maintained through repeated climatic oscillations. At the same time, the focus on widely distributed survivors of past extinctions highlights that modern European tree floras are a filtered subset favoring traits like prolific dispersal and competitive ability, which likely contributed to genetic resilience.

Conclusion

Across seven ecologically diverse European forest trees, effective population sizes predominantly increased or remained stable through multiple glacial cycles, and major divergence events predate the LGM. Genetic diversity has been strikingly resilient over millions of years, likely due to large, connected metapopulations, high outcrossing, and efficient gene flow. N_e trajectories cluster by shared ecological and biogeographical properties rather than phylogeny or strictly by climatic cycles, indicating that intrinsic species characteristics modulate demographic responses to environmental change. These insights help explain how tree species preserved evolutionary potential to respond to contemporary climate challenges. Future work should extend comparative demographic analyses to a broader spectrum of species (including less common and more range-restricted taxa), incorporate closely related species complexes to quantify introgression’s role, refine mutation rate and generation time estimates for more precise temporal scaling, and integrate genomic data with paleoecological records to resolve short-term vs long-term dynamics.

Limitations
  • Taxon sampling bias: The seven species are widely distributed, abundant, and ecologically dominant survivors of past extinctions; findings may not generalize to rarer or range-restricted taxa.
  • Admixture handling: Individuals with high admixture were excluded from demographic inference. Hybridization can inflate apparent N_e and often reflects older events, so its contribution may be underrepresented here.
  • Temporal scaling uncertainty: Inferred times and absolute N_e depend on assumed mutation rates and generation times; improved estimates could shift the timeline without altering relative trajectories.
  • SFS-based inference constraints: Long generation times and large N_e reduce power to detect recent cyclical fluctuations; population structure and sampling can confound SFS, though multiple sampling schemes and model checks were applied.
  • Genomic target size: Targeted capture (~3 Mbp) focuses on a subset of the genome (including candidate genes); while largely orthologous and complemented by random genes, this may limit resolution relative to whole-genome data.
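
The temporal scaling limitation can be made concrete with the standard diploid coalescent relations N_e = θ / (4μ) and years = (time in expected mutations per site / μ) × generation time. The sketch below uses the mutation rates and generation times quoted in the Methodology; the function names, the input units, and the toy epoch values are assumptions made for illustration and do not come from the study.

```python
# Mutation rates (per site per generation) and generation times (years)
# as quoted in the Methodology; the dictionary layout is illustrative.
SCALING = {
    "Quercus petraea": (7.77e-9, 60),
    "Fagus sylvatica": (7.77e-9, 60),
    "Populus nigra": (7.77e-9, 15),
    "Betula pendula": (7.77e-9, 15),
    "Picea abies": (2.7e-8, 25),
    "Pinus pinaster": (2.7e-8, 25),
    "Pinus sylvestris": (2.7e-8, 25),
}


def scale_epoch(theta_per_site, time_in_mutations, mu, generation_years):
    """Convert one epoch to (N_e, age in years) for a diploid population."""
    ne = theta_per_site / (4.0 * mu)
    years = time_in_mutations / mu * generation_years
    return ne, years


def rescale_trajectory(epochs, mu, generation_years):
    """Apply scale_epoch to a list of (theta, time) pairs."""
    return [scale_epoch(theta, t, mu, generation_years) for theta, t in epochs]


# Hypothetical epochs: (theta per site, time in expected mutations per site).
toy_epochs = [(0.002, 0.0005), (0.004, 0.0001)]

mu, gen = SCALING["Fagus sylvatica"]
base = rescale_trajectory(toy_epochs, mu, gen)
doubled = rescale_trajectory(toy_epochs, 2 * mu, gen)

# Relative change in N_e between the two epochs under each mutation rate.
ratio_base = base[1][0] / base[0][0]
ratio_doubled = doubled[1][0] / doubled[0][0]
```

Doubling the assumed mutation rate halves every absolute N_e and every date, while the ratio of N_e between epochs stays the same, which is why improved rate estimates would shift the timeline without changing the relative trajectories.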