Characterization of Y chromosome diversity in Newfoundland and Labrador: evidence for a structured founding population

Biology

H. Zurel, C. Bhérer, et al.

Dive into the intricate genetic mosaic of Newfoundland and Labrador, where 71.4% of the sampled Y chromosomes belong to haplogroup R1b, consistent with roots tracing back to English and Irish settlers. This study, conducted by a team including Heather Zurel and Claude Bhérer, shows how historical migration patterns and geographical isolation shaped a unique founder population.

Introduction
Newfoundland and Labrador (NL), a Canadian province, has a population largely descended from European migrants of the 18th and 19th centuries. Approximately 25,000 immigrants, primarily from Ireland (County Waterford and surrounding areas) and England (Cornwall, Devon, and southern fishing ports), established largely isolated coastal communities. These communities grew through large families, with limited intermingling until the 1950s, when paved roads improved connectivity. The current population stands at 520,000 and is shifting from rural areas to urban centers. English Protestants and Irish Catholics largely remained separated, attending different schools and rarely intermarrying. Other European groups, including Portuguese, French, and Highland Scottish, also contributed to NL's genetic makeup. Although the Norse were present for over a century around 1000 AD, they did not establish permanent settlements. Indigenous populations were present before, during, and after European settlement. Since the 1900s, immigration has been limited, so the province's genetic diversity largely reflects the initial European settlers. Y chromosome analysis helps illuminate male migration patterns and population origins. Previous studies using STRs or low-resolution SNPs provided information on major haplogroups; this study analyzes Y chromosome variation in NL at higher resolution. The aim is to address gaps in knowledge about haplogroup composition, the frequency of Y chromosome variation, ancestral origins across NL, and evidence of founder effects based on haplogroup expansion and regional clustering. Previous studies have shown NL to be a founder population based on genetic structure and the prevalence of rare monogenic disorders. However, comprehensive data on Y chromosome variation were lacking, motivating this in-depth study of the NL population's patrilineal ancestry.
Literature Review
Studies of Y chromosome variation have proven instrumental in understanding male migration patterns and the origins of modern human populations. These studies have led to the development of a standardized phylogenetic tree of SNP-defined Y chromosome haplogroups, maintained by the International Society of Genetic Genealogy (ISOGG). The major haplogroups in European Y chromosomes are E, G, I, J, N, and R, with R being the most prevalent. However, many previous studies have been limited by their reliance on short tandem repeats (STRs) and/or low-resolution single nucleotide polymorphism (SNP) panels. Despite these limitations, these studies provide valuable insights into the composition and frequency of major Y chromosome haplogroups within European populations. Studies on the genetic structure of NL, along with the high prevalence of rare monogenic disorders, support its description as a founder population. However, prior knowledge regarding the precise haplogroup composition, frequency of Y chromosome variations, and detailed ancestral origins within NL is limited.
Methodology
This study analyzed data from the Newfoundland and Labrador Genome Project (NLGP), focusing on the initial 2,500 participants. Participants provided self-reported data, including religion and ancestral birthplaces, along with saliva samples for DNA extraction. DNA was genotyped using the Illumina Global Diversity Array (GDA). Variant calling and quality control (QC) analysis were performed using Illumina's Array Analysis Platform (IAAP) CLI and the GTCtoVCF pipeline. From the 2.1M variants on the GDA, 5,761 SNPs on the male-specific region of the Y chromosome were selected. QC analysis resulted in a final cohort of 1,110 participants (NLGP1110) with a call rate exceeding 96.5%. Phylogenetic reconstruction employed two methods: the yHaplo software package and a manual maximum parsimony approach. The manual approach provided higher resolution, enabling the inclusion of SNPs with missing data, singleton SNPs, and SNPs lacking ISOGG designations, as well as the resolution of phylogenetically inconsistent SNPs. A total of 2,114 phylogenetically informative SNPs were identified. Selected participants were descendants of early European settlers; recent immigrants and those with unclear paternal ancestry were excluded. Individuals with Indigenous ancestry were also largely excluded because of the limited number of participants and the lack of an appropriate reference panel. To assess continental ancestry, autosomal data were merged with data from the 1000 Genomes Project, and PCA was performed using PLINK 2.0. Comparison with European source populations involved analyzing Y chromosome data from the Irish DNA Atlas and the People of the British Isles (PoBI), using overlapping SNPs for haplogroup frequency inference. The gnomAD allele frequency database was also queried, focusing on rare variants to identify potential population-specific ancestry. Population structure analysis included kinship coefficient estimation (using KING and PLINK) to remove first-degree relatives.
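The call-rate criterion in the QC step above can be illustrated with a small sketch. It is not the study's pipeline (which used IAAP and GTCtoVCF); it only shows what "a call rate exceeding 96.5%" means for a single sample. The function names, the use of `None` for a missing genotype call, and the toy samples are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Threshold taken from the text: samples were kept with a call rate above 96.5%.
CALL_RATE_THRESHOLD = 0.965

def call_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of SNPs with a non-missing call (None marks a missing call here)."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)

def filter_by_call_rate(
    samples: Dict[str, List[Optional[str]]],
    threshold: float = CALL_RATE_THRESHOLD,
) -> Dict[str, List[Optional[str]]]:
    """Keep only samples whose call rate strictly exceeds the threshold."""
    return {sid: calls for sid, calls in samples.items() if call_rate(calls) > threshold}

# Hypothetical toy data: three samples typed at 100 Y-SNPs each.
samples = {
    "S1": ["A"] * 99 + [None],      # 99 of 100 called
    "S2": ["G"] * 96 + [None] * 4,  # 96 of 100 called, below the threshold
    "S3": ["T"] * 100,              # fully called
}
kept = filter_by_call_rate(samples)
print(sorted(kept))
```

In this toy example, S2 is dropped because 96% falls below the 96.5% cutoff, while S1 and S3 are retained.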
Geographical distribution of haplogroups was mapped using the birthplaces of the most distant paternal ancestors. NL was divided into 5 major regions and 15 subregions for analysis. Haplotype diversity (H) was calculated for each subregion and for the overall cohort (NL831, which excludes Labrador, first-degree relatives, recent immigrants, and individuals with missing ancestral information). Pairwise FST values were computed using R and subjected to multidimensional scaling (MDS) analysis. PCA was performed to assess stratification of paternal lineages across subregions, while AMOVA evaluated variance partitioning. Fisher's exact test with Benjamini-Hochberg correction was used to compare haplogroup composition between regions and subregions.
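The summary does not state which formula was used for haplotype diversity (H). A common choice is Nei's unbiased estimator, H = n/(n-1) × (1 − Σ pᵢ²), where n is the sample size and pᵢ is the frequency of the i-th haplogroup. The sketch below assumes that estimator; the function name and the example haplogroup list are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Sequence

def haplotype_diversity(haplogroups: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    Assumption: this estimator is used for illustration only; the source
    does not say which formula the authors applied.
    """
    n = len(haplogroups)
    if n < 2:
        raise ValueError("at least two individuals are needed")
    counts = Counter(haplogroups)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical subregion with five individuals and three terminal haplogroups.
subregion = ["R1b-DF13", "R1b-DF13", "R1b-DF13", "I1a-M253", "R1b-M222"]
h = haplotype_diversity(subregion)
print(round(h, 3))
```

Under this estimator, H is 0 when every individual carries the same haplogroup and 1 when every individual carries a different one, so lower values in a subregion are consistent with fewer founding lineages.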
Key Findings
Analysis of the 1,110 NL Y chromosomes revealed 160 distinct terminal haplogroups. The R haplogroup predominated (74.2%), mainly within R1b (71.4%). R1b-S116 (43.2%) was particularly abundant, including its subclade R1b-M222 (3.1%). Other haplogroups included I2a, I1a, E1b, R1a, G2a, J2b, and J2a. Seven rarer haplogroups were each observed in only one individual and likely reflect recent immigration. Thirty-one terminal haplogroups were present in at least 10 individuals, and seven were observed in 30 or more. The most prevalent was R1b-DF13 (112 individuals). Analysis of the 831 individuals in the NL831 cohort revealed regional differences in R1b haplogroups, with R1b-S116 exceeding 42% in most regions except the Northwest (28.9%). R1b-M222 frequency was highest in the Southeast. Pairwise FST analysis and MDS indicated an East-West axis of variation, with most of the variation found within rather than between subregions (99.3%; p = 0.001). Significant differences in haplogroup composition were seen between the Avalon (East) and Northwest regions. Coastal communities showed distinct haplogroup frequency and religious affiliation patterns. The St. John's metropolitan area showed a mixture of haplogroups from other regions. The Northeast region displayed variations in haplogroups, potentially reflecting varied European origins. Religious affiliation showed a regional distribution, with the Southeast primarily Catholic and the Northwest predominantly Protestant. Some haplogroups were associated with religious affiliation: for example, I2-M438 and I1a-M253 with Protestant communities, and R1b-M222 with Catholic communities. Comparison with the British and Irish reference populations showed that the NL Y chromosome profile closely resembles theirs. R1b-U198, R1b-L46, and R1b-Z8 were found to be primarily English in origin, while R1b-M222 and R1b-Z255 were predominantly Irish.
Analysis of rare variants in gnomAD provided some evidence of potential contributions from Basque, Portuguese, French and Italian populations, although further study is needed to confirm this. Several haplogroups exhibited signs of expansion within NL, suggesting a founder effect.
Discussion
The high-resolution Y-DNA analysis reveals that NL's paternal lineages are predominantly European, primarily within the R1b haplogroup, consistent with historical records and previous autosomal studies. The presence of other haplogroups (I2a, I1a, E1b, R1a, J) aligns with the genetic makeup of other Western European populations, with some suggesting specific regional origins (e.g., R1a's Scandinavian association). The clustering of certain haplogroups within specific regions of NL is linked to historical settlement patterns, notably the concentration of Irish Catholic lineages in the Southeast and English Protestant lineages in the Northwest. This regional clustering also correlates with self-reported religious affiliations. The findings support the hypothesis of a founder effect, where distinct European ancestral communities established settlements in specific regions of NL, expanding over time with limited subsequent migration. The observation of haplogroup expansion and regional clustering reinforces the designation of NL as a founder population. The presence of less frequent haplogroups suggests minor contributions from other European populations (Basque, Portuguese, French, and Italian). This aligns with historical accounts of these groups engaging in fishing activities off the coast of NL. The results align with prior autosomal DNA analyses demonstrating distinct population clusters reflecting the initial Irish and British settlers.
Conclusion
This study provides the most comprehensive analysis to date of the paternal lineages in NL. The findings support the hypothesis that the NL population is primarily composed of descendants of a relatively small number of founders from England and Ireland, who settled in geographically and religiously distinct communities that remained largely isolated for centuries. The observed expansion of specific haplogroups within those communities further substantiates the founder effect. While this study significantly advances our understanding, future research with larger, more diverse reference populations, combined with detailed historical and sociological data, could further refine our understanding of NL's population history.
Limitations
The study's limitations include the underrepresentation of Indigenous peoples in the cohort and the absence of a dedicated Indigenous reference panel, which hindered a comprehensive assessment of their contribution to the NL Y-DNA landscape. Additionally, the reliance on self-reported religious affiliation as a proxy for ancestral origin has inherent limitations. Furthermore, the limited availability of appropriately annotated Y chromosome data in gnomAD made it difficult to definitively identify certain population contributions. Finally, the study focused on paternal lineages, so it does not capture maternal lineages and may not fully reflect the complexity of the NL population's ancestry.