
Characterization of Y chromosome diversity in Newfoundland and Labrador: evidence for a structured founding population
H. Zurel, C. Bhérer, et al.
This study characterizes Y-chromosome diversity in Newfoundland and Labrador, where 71.4% of Y chromosomes belong to haplogroup R1b, consistent with descent from English and Irish settlers. The team, including Heather Zurel and Claude Bhérer, shows how historical migration patterns and geographic isolation shaped a structured founder population.
Introduction
Newfoundland and Labrador (NL) traces its modern population to roughly 25,000 European immigrants who arrived in the 18th–19th centuries and established isolated coastal outports with limited inter-community contact until the mid-20th century. The primary ancestral sources were Irish communities (notably around County Waterford) and southern England (Cornwall, Devon, and other southern fishing ports). Historical records indicate social stratification by religion (Irish Catholics and English Protestants often attended different schools and rarely intermarried), which further isolated communities. Additional European influences include Portuguese, French, and Highland Scottish fishermen and settlers; the Norse presence around 1000 A.D. did not lead to permanent settlement. Indigenous peoples were present before, during, and after European settlement. Since the 1900s, migration has been limited, and genetic diversity largely reflects the original European settlers.
Y-chromosome studies enable reconstruction of male migration patterns and have produced standardized SNP-defined haplogroup trees (ISOGG). European Y chromosomes largely belong to haplogroups E, G, I, J, N, and R, with R predominating. Prior studies, often limited to STRs or low-resolution SNP panels, describe major haplogroup composition but lack fine-scale resolution for NL.
Motivated by prior evidence of founder effects in NL and numerous rare monogenic disorders, the authors aimed to: (1) determine the composition and frequency of paternal haplogroups; (2) elucidate Y-chromosome population structure; (3) compare NL with European ancestral source populations; and (4) identify founder effects via haplogroup expansion and regional clustering.
Literature Review
Previous work has characterized NL as an isolated founder population with genetic structuring and an elevated frequency of rare disorders. Prior Y-chromosome literature established European haplogroup distributions (E, G, I, J, N, R; R predominant) and standardized phylogenies (ISOGG). Earlier European studies, often relying on STRs and lower-resolution SNP panels, provided broad haplogroup frequencies but limited subclade resolution. Autosomal analyses (e.g., Zhai et al.; Gilbert et al.) demonstrated fine-scale structure in NL associated with settlement patterns and religious denominations. Comparative resources used in this study include the Irish DNA Atlas and People of the British Isles (PoBI), as well as gnomAD allele frequencies, to contextualize NL haplogroups against potential English and Irish source populations and other Europeans.
Methodology
Cohort and genotyping: From the first 2,500 participants in the Newfoundland and Labrador Genome Project (NLGP), saliva DNA (Oragene OG-600) was genotyped on the Illumina Global Diversity Array (GDA). Variant calling and QC used Illumina IAAP CLI and GTCtoVCF. Of 2.1M array variants, 5,761 Y-specific SNPs were selected. QC retained 1,110 male samples with <200 missing Y calls (call rate >96.5%), termed NLGP1,110.
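The sample-level QC threshold can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: it assumes a per-sample count of missing Y-SNP calls is already available, and the sample IDs and counts are hypothetical.

```python
# Sketch of the per-sample QC rule: keep males with fewer than 200 missing
# calls among the 5,761 Y-specific SNPs. Sample data below is hypothetical.
TOTAL_Y_SNPS = 5761
MAX_MISSING = 200  # exclusive upper bound on missing calls


def call_rate(n_missing, n_snps=TOTAL_Y_SNPS):
    """Fraction of Y-SNPs with a genotype call."""
    return 1 - n_missing / n_snps


def passes_qc(n_missing):
    """True when the sample has fewer than MAX_MISSING missing calls."""
    return n_missing < MAX_MISSING


# Hypothetical missing-call counts per sample
missing_calls = {"S1": 12, "S2": 199, "S3": 200, "S4": 950}
kept = [sample for sample, n in missing_calls.items() if passes_qc(n)]

# Call rate at the threshold, as a percentage rounded to one decimal
threshold_call_rate_pct = round(100 * call_rate(MAX_MISSING), 1)
```

At the threshold of 200 missing calls the call rate is 1 − 200/5,761, which rounds to the 96.5% quoted above.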
Phylogenetic reconstruction: Y haplogroups were assigned using yHaplo and a manual maximum parsimony approach (the latter yielded higher resolution by incorporating SNPs with missing data, singletons, non-ISOGG SNPs, and resolving inconsistencies). Of 5,761 SNPs, 2,114 were phylogenetically informative. Using ISOGG nomenclature, 160 distinct terminal haplogroups were identified; 17 major internal branches and 7 terminal haplogroups were supported by ≥20 informative SNPs.
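To illustrate how a sample can be placed on a haplogroup tree even when some defining SNPs are missing, here is a minimal sketch. It is neither yHaplo's algorithm nor the authors' manual parsimony procedure; the tree, branch names, and SNP names are all hypothetical placeholders, and each branch is assumed to have a single defining SNP.

```python
# Toy tree (hypothetical): child -> parent, one defining SNP per branch.
PARENT = {"A": "root", "A1": "A", "A1a": "A1", "B": "root"}
DEFINING_SNP = {"A": "snp_A", "A1": "snp_A1", "A1a": "snp_A1a", "B": "snp_B"}


def children(node):
    """Direct child branches of a node."""
    return [child for child, parent in PARENT.items() if parent == node]


def has_derived_below(node, calls):
    """True if any branch strictly below `node` has a derived call."""
    return any(
        calls.get(DEFINING_SNP[child]) == 1 or has_derived_below(child, calls)
        for child in children(node)
    )


def assign_haplogroup(calls):
    """Walk from the root to the deepest supported branch.

    `calls` maps SNP name -> 1 (derived), 0 (ancestral) or None (missing).
    A branch with a missing call is still entered when a deeper branch on
    the same lineage carries a derived allele.
    """
    node = "root"
    while True:
        next_node = None
        for child in children(node):
            state = calls.get(DEFINING_SNP[child])
            if state == 1 or (state is None and has_derived_below(child, calls)):
                next_node = child
                break
        if next_node is None:
            return node
        node = next_node


# Missing call at snp_A1 is bridged by the derived call at snp_A1a
example = assign_haplogroup({"snp_A": 1, "snp_A1": None, "snp_A1a": 1})
```

Treating a missing call as compatible with a lineage, rather than as a stop signal, is one simple way to keep resolution when array data are incomplete.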
Ancestry and inclusion criteria: Continental ancestry was assessed via autosomal PCA merged with 1000 Genomes (1KGP3) using PLINK 2.0. Participants reporting recent immigration or paternal ancestors (to great-grandfathers) not from NL were excluded. Few participants reported Indigenous ancestry (4 Indigenous only; 24 mixed European-Indigenous, 2.6%); due to small numbers and limited reference panels, Indigenous Y-DNA contributions were not analyzed. Comparisons to source populations used: (1) overlapping SNPs with PoBI and Irish DNA Atlas (812 overlap; 296 polymorphic) to infer major haplogroup frequencies in 856 Y chromosomes; (2) gnomAD Y-chromosome allele frequencies across seven European populations (Basque, FIN, French, GBR, IBS, Italian, TSI) to identify population-informative rare variants among the 2,114 informative NL SNPs (60 variants present in only one or two of these populations).
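The rare-variant screen described in (2) amounts to counting, for each SNP, how many of the seven reference populations carry the variant. A minimal sketch follows, assuming allele frequencies have already been extracted into a dictionary per variant; the variant IDs and frequencies are hypothetical, and "present" is taken to mean a frequency above zero.

```python
# The seven gnomAD European population labels named in the text
POPULATIONS = ["Basque", "FIN", "French", "GBR", "IBS", "Italian", "TSI"]


def populations_with_variant(freqs):
    """Populations in which the variant has a non-zero allele frequency."""
    return [pop for pop in POPULATIONS if freqs.get(pop, 0.0) > 0.0]


def is_population_informative(freqs, max_pops=2):
    """True when the variant is seen in one or two populations only."""
    n_present = len(populations_with_variant(freqs))
    return 1 <= n_present <= max_pops


# Hypothetical per-population allele frequencies for four variants
variants = {
    "var1": {"Basque": 0.02, "IBS": 0.01},      # two populations
    "var2": {pop: 0.10 for pop in POPULATIONS},  # widespread
    "var3": {"FIN": 0.05},                       # one population
    "var4": {},                                  # absent everywhere
}
informative = [v for v, f in variants.items() if is_population_informative(f)]
```

With real data, the same filter applied to the 2,114 informative NL SNPs would yield the set of population-informative variants (60 in the study).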
Population structure analysis: Autosomal kinship (KING in PLINK2) was used to remove first-degree relatives (kinship 0.177–0.354). Geographic mapping used the birthplace of each participant’s most distant paternal ancestor. NL was partitioned into 5 regions and 15 subregions (Labrador excluded from clustering due to n=4). The main analysis dataset (NL831) included 831 individuals and 133 terminal haplogroups after excluding first-degree relatives, recent immigrants, and those missing geographic data. Religion was self-reported and grouped as Catholic, Protestant, No Religion, Other.
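The relatedness filter can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the text does not say which member of a related pair was dropped, so the sketch greedily drops the second member of each pair, and the sample IDs and kinship values are hypothetical.

```python
# KING kinship range treated as first-degree in the text
FIRST_DEGREE_MIN = 0.177
FIRST_DEGREE_MAX = 0.354


def is_first_degree(kinship):
    """True when a kinship coefficient falls in the first-degree range."""
    return FIRST_DEGREE_MIN <= kinship <= FIRST_DEGREE_MAX


def prune_first_degree(sample_ids, kinship_pairs):
    """Drop one member of every first-degree pair (greedy, order-dependent).

    `kinship_pairs` is an iterable of (id_a, id_b, kinship) tuples.
    Assumption: the second member of a pair is the one removed.
    """
    removed = set()
    for id_a, id_b, kinship in kinship_pairs:
        if is_first_degree(kinship) and id_a not in removed and id_b not in removed:
            removed.add(id_b)
    return [s for s in sample_ids if s not in removed]


# Hypothetical samples and pairwise kinship estimates
samples = ["P1", "P2", "P3", "P4"]
pairs = [("P1", "P2", 0.25), ("P2", "P3", 0.26), ("P3", "P4", 0.09)]
unrelated = prune_first_degree(samples, pairs)
```

Here P2 is related to both P1 and P3, so removing P2 alone resolves both pairs; P3 and P4 (kinship 0.09) fall below the first-degree range and are both kept.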
Statistics: Haplotype diversity (H) was computed from terminal haplogroup frequencies for regions and subregions; FST (and linearized FST per Slatkin) and MDS were computed in R (v4.1.0). PCA of haplogroup frequencies by subregion used PCAtools. AMOVA assessed variance components among and within populations and subregions (ade4; 1,000 Monte Carlo simulations). Pairwise composition differences used Fisher’s exact test with simulated p-values (1,000 simulations) and Benjamini–Hochberg correction. Visualization used ggplot2.
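Three of the statistics named above have compact closed forms. The sketch below is written in Python for illustration (the study used R) and makes two assumptions not stated in the summary: haplotype diversity uses Nei's unbiased estimator, H = n/(n−1) · (1 − Σ pᵢ²), and the Benjamini–Hochberg adjustment follows the standard step-up procedure. Slatkin's linearization is FST/(1 − FST). All input values are hypothetical.

```python
def haplotype_diversity(counts):
    """Nei's unbiased diversity from haplogroup counts (assumed estimator)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)


def linearized_fst(fst):
    """Slatkin's linearized FST: FST / (1 - FST)."""
    if not 0 <= fst < 1:
        raise ValueError("FST must be in [0, 1)")
    return fst / (1 - fst)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs
h_even = haplotype_diversity([5, 5])    # two equally frequent haplogroups
h_fixed = haplotype_diversity([10])     # a single haplogroup
fst_lin = linearized_fst(0.2)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A subregion with a single haplogroup has H = 0, and H approaches 1 as haplogroups become numerous and evenly distributed, which is why it is a convenient per-subregion summary of terminal haplogroup frequencies.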
Key Findings
- Y-chromosome composition: Among 1,110 NL males, 2,114 informative SNPs defined 160 terminal haplogroups. Haplogroup R comprised 74.2% of Y chromosomes, with R1b at 71.4%. R1b-S116 encompassed 46 distinct haplogroups and 43.2% of NL Y chromosomes. R1b-M222 occurred in 3.1%. Other major haplogroups included I2a, I1a, E1b, R1a, G2a, J2b, and J2a (in decreasing order of frequency). Seven haplogroups (E1a, H1a, J1a, T1a, O1a, O1b, Q2a) appeared as singletons, mainly in individuals born outside NL.
- Frequent haplogroups: 31 terminal haplogroups were present in ≥10 individuals; seven haplogroups had ≥30 individuals. The most frequent was R1b-DF13 (n=112), followed by R1b-L151 (n=65; 5.9%), R1b-Z8 (n=35; 3.2%), R1b-M222 (n=34; 3.1%), R1b-CTS4466 (n=33; 3.0%), R1b-Z255 (n=32; 2.9%), and R1b-S264 (n=30; 2.7%).
- Regional structure (NL831): R1b-S116 exceeded 42% in all regions except the Northwest (28.9%). R1b-M222 frequency was highest in the Southeast. Major haplogroup frequencies by region (percent): Southeast (R1b-S116 47.5; R1b-M222 8.5), Northeast (R1b-S116 43.8; R1b-M222 3.4), St. John’s (R1b-S116 47.0; R1b-M222 2.2), Northwest (R1b-S116 28.9; R1b-M222 2.5), Southwest (R1b-S116 42.6; R1b-M222 0.0). I2-M438 was elevated in parts of the Northeast and Northwest (e.g., 11.8% in Conception Bay North vs 2.3% in Trinity Bay East). Some adjacent subregions showed significant composition differences.
- Population differentiation: MDS of pairwise linearized FST indicated that 99.4% of total variance aligned with an East–West axis. AMOVA attributed 99.3% of variation to within-population (within subregion) haplogroup distribution (p=0.001). Fisher’s exact tests showed significant differences between Eastern (e.g., Avalon) and Northwestern subregions (p=0.02 to 0.001).
- Religion and haplogroups: Southeast was predominantly Catholic (>70%), whereas northern regions were predominantly Protestant (~70%). R1b-M222 and R1b-Z255 were mainly observed in Catholics, especially in the Southeast/Avalon. I2-M438 and I1a-M253 elevations in the Northwest were primarily in Protestant communities. Notably, Burin East (proximal to Avalon and similar in religious composition) lacked R1b-M222.
- Comparison to source populations: In PoBI and Irish DNA Atlas, 73.3% of Y chromosomes were R1b (R1b-M343). R1b-M222 comprised 23.9% in Ireland vs 1% in England; in NL it ranged 0–8.5% by region (2.6% overall), consistent with Irish contributions, particularly from Catholic-settled areas. Haplogroups R1b-U198, R1b-L46, R1b-Z8—observed at higher frequency in NL—are almost exclusive to England in reference datasets, marking English paternal lines, whereas R1b-M222 and R1b-Z255 point to Irish origins.
- Rare variants and expansion: Of 2,114 informative NL Y-SNPs, 60 appeared in only one or two gnomAD European populations (Basque, FIN, French, GBR, IBS, Italian, TSI), aiding inference of non-British/Irish European inputs (e.g., Basque/Iberian/Italian/French). Several haplogroups showed apparent expansion in NL relative to source datasets: R1b-L46 (14 NL vs 2 in English PoBI; none in Irish data) and R1b-Z8 (52 NL vs 4 in English data across PoBI and gnomAD).
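The percentages quoted for the most frequent haplogroups can be reproduced from the counts, assuming the denominator is the 1,110 genotyped males. The short check below uses only counts stated in the list above; the 10.1% figure for R1b-DF13 is derived here and does not appear in the source.

```python
# Counts of the seven haplogroups with >= 30 carriers, as listed above
N_MALES = 1110
counts = {
    "R1b-DF13": 112,
    "R1b-L151": 65,
    "R1b-Z8": 35,
    "R1b-M222": 34,
    "R1b-CTS4466": 33,
    "R1b-Z255": 32,
    "R1b-S264": 30,
}

# Percentage of all genotyped males, rounded to one decimal place
percent = {hg: round(100 * n / N_MALES, 1) for hg, n in counts.items()}

for hg, pct in percent.items():
    print(f"{hg}: n={counts[hg]} ({pct}%)")
```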
Discussion
The study provides a high-resolution profile of NL patrilineal diversity, demonstrating that most Y chromosomes trace to European, predominantly English and Irish, settlers. The dominance of R1b (especially R1b-S116 subclades) and the distribution of R1b-M222 and R1b-Z255 in Catholic-dominated southeastern regions align with historical settlement of Irish Catholics. Conversely, haplogroups such as R1b-L151 and R1b-Z12, and elevated I1/I2 clades in northern and western regions, correspond to English Protestant settlements, reflecting long-standing social and geographic segregation.
Population structure analyses (FST/MDS, PCA, AMOVA) reveal a pronounced East–West differentiation consistent with coastal settlement patterns, limited in-migration, and inter-community isolation. The identification of rare, population-informative variants and the presence of haplogroups more common in Basque/Iberian, French, and Italian contexts suggest additional European contributions via fishing and early settlement activities. Evidence of local expansion (e.g., R1b-L46, R1b-Z8) supports founder effects, with certain lineages proliferating within isolated communities over ~300 years.
The findings corroborate prior autosomal studies showing fine-scale structure and religiously associated clusters, reinforcing the use of self-reported religion as a surrogate for paternal lineage origins in NL. Collectively, the results address the research aims by detailing haplogroup composition and frequencies, delineating regional and religious substructure, situating NL within its English/Irish source context, and documenting founder-driven expansions.
Conclusion
This work delivers the most detailed characterization to date of Y-chromosome diversity in Newfoundland and Labrador, assigning 1,110 men to 160 terminal haplogroups using 2,114 informative SNPs. The paternal gene pool is dominated by R1b subclades, with clear geographic (East–West) and religious (Catholic–Protestant) clustering that mirrors historical English and Irish settlement and long-standing social isolation. Additional signals from Basque/Iberian, French, Portuguese, and Italian sources are evident, and multiple lineages show signatures of founder-driven expansion.
Future research should incorporate broader European and North American reference populations (e.g., French Canadian and Acadian cohorts) for finer source attribution, leverage complete Y-chromosome sequencing to increase resolution, and integrate historical demographic and sociological data to further interpret religious and regional clustering. Dedicated, community-engaged studies are needed to robustly characterize Indigenous Y-chromosome contributions.
Limitations
- Underrepresentation of Indigenous ancestry: Very few participants reported Indigenous ancestry (or mixed European–Indigenous), and suitable Eastern North American Indigenous Y reference panels are lacking; Indigenous Y-DNA contributions were not analyzed.
- Sampling and exclusions: Recent immigrants and individuals without NL paternal ancestry (to great-grandfathers) were excluded; Labrador was excluded from clustering analyses (n=4). Some subregions (Notre Dame Bay West, Northern Peninsula, West Coast) had small sample sizes (<25), limiting interpretability.
- Array-based resolution: Although high-density for Y-SNPs (5,761), array genotyping may miss variants captured by full Y sequencing; some phylogenetic positions rely on SNPs with missing data or non-ISOGG designations (resolved via manual parsimony).
- Reference data constraints: Overlap with PoBI/Irish DNA Atlas was limited to 296 polymorphic SNPs for comparison of major haplogroups; gnomAD Y-chromosome samples with detailed population labels are limited, constraining inference of rare, population-specific variants.
- Proxy measures: Self-reported religion is used as a surrogate for ancestral origin, but religious affiliation may change over time; ancestry inference may also be affected by migration to St. John’s and recent internal movements.