Genetic aetiologies for childhood speech disorder: novel pathways co-expressed during brain development
A. Kaspi, M. S. Hildebrand, et al.
This study of childhood apraxia of speech (CAS), by a team led by Antony Kaspi, identified high-confidence genetic variants linked to this severe speech disorder. The research nearly doubles the number of candidate genes associated with CAS, and highlights their roles in brain development and their genetic overlap with other neurodevelopmental disorders.
Introduction
Childhood apraxia of speech (CAS) is a rare neurodevelopmental disorder (~0.1% prevalence) characterized by deficits in speech planning and programming that impair sequencing of sounds, syllables, and prosody. The first gene implicated in CAS without intellectual disability was FOXP2 in 2001. For many years, FOXP2 was the only gene clearly linked to CAS. With advances in genome sequencing, two prior cohort studies (total n=52) identified pathogenic variants in 19/52 probands, implicating 17 additional genes and indicating a substantial monogenic contribution with a combined diagnostic yield of ~37%. These studies suggested key roles for transcriptional regulation pathways in aberrant speech development, showed that de novo variants are common, and highlighted overlap between CAS genes and those implicated in other neurodevelopmental disorders (e.g., epilepsy, ASD, intellectual disability). Given genetic heterogeneity and pleiotropy, larger cohorts are required to discover additional causative genes, increase diagnostic yield, and clarify molecular pathways underlying severe childhood speech disorders. The present study aimed to identify molecular causes in a large cohort of probands ascertained for CAS, and to analyse co-expression and overlap of CAS genes with other neurodevelopmental disorder genes.
Literature Review
Prior to this study, two independent genome-wide sequencing cohorts of probands with CAS reported genetic diagnostic rates of 42% (8/19) and 33% (11/33). Genes implicated included CHD3, SETD1A, WDR5, KAT6A, SETBP1, ZFHX4, TNRC6B, MKL2, CDK13, EBF3, GNAO1, GNB1, DDX3X, MEIS2, POGZ, UPF2, and ZNF142, with SETBP1 recurring across cohorts. These findings supported transcriptional dysregulation as a central pathway, identified other relevant pathways (e.g., G-protein signalling via GNAO1 and GNB1), and underscored de novo mutation burden and genetic heterogeneity in CAS. Many genes overlapped with those known for epilepsy, ASD, and intellectual disability, suggesting shared neurodevelopmental risk mechanisms. Historically, FOXP2 was the primary gene linked to CAS, but more recent discoveries have expanded the set of candidate genes and pathways. These prior results motivated larger, phenotypically well-characterized cohorts to refine gene discovery, evaluate co-expression during brain development, and test broader genetic mechanisms (e.g., STRs, polygenic risk).
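The diagnostic rates quoted for the two prior cohorts, and the combined figure of roughly 37%, follow from simple proportions. A minimal sketch of that arithmetic, using only the counts given in the text (the variable and function names are illustrative, not from the study):

```python
# (probands with a genetic diagnosis, probands sequenced) for each prior
# cohort, as quoted in the text.
prior_cohorts = [(8, 19), (11, 33)]


def yield_pct(diagnosed: int, total: int) -> float:
    """Diagnostic yield as a percentage of sequenced probands."""
    return 100 * diagnosed / total


# Per-cohort yields, rounded to whole percentages.
per_cohort = [round(yield_pct(d, n)) for d, n in prior_cohorts]

# Pool the two cohorts to obtain the combined yield.
combined_diagnosed = sum(d for d, _ in prior_cohorts)
combined_total = sum(n for _, n in prior_cohorts)
combined = yield_pct(combined_diagnosed, combined_total)

print(per_cohort)
print(combined_diagnosed, combined_total, round(combined, 1))
```

The pooled proportion is 19/52, which rounds to the ~37% combined yield cited in the Introduction.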
Methodology
Ethics: Approval by The Royal Children's Hospital, Melbourne HREC (#37353); written informed consent obtained from parents/guardians.
Participants and phenotyping: Probands under 18 years with a primary clinical diagnosis of CAS were recruited via clinicians or parent referral. Exclusion: moderate to severe intellectual disability by psychometrics. Detailed medical/developmental histories and comorbidities were documented with verification from professional reports. CAS diagnosis was confirmed using ASHA consensus criteria operationalized via standardized single-word subtests (DEAP) and 5-minute conversational speech samples. Dysarthria was identified via oral tone/coordination disturbance and Mayo Clinic Dysarthria rating. Language and cognition were assessed using standardized tools (e.g., CELF-5, WISC, WPPSI, KBIT, WNV, GMDS).
Genetic testing and sequencing: Chromosomal microarray (Illumina; effective resolution ~200 kb) was performed and analyzed with KaryoStudio. Whole genome sequencing (Illumina NovaSeq 6000; ~30× coverage; ~100 Gb/sample) was conducted on 204 individuals from 70 families: 71 probands (including one monozygotic twin pair, counted as a single proband for genetic analyses, giving 70), 127 parents, and 6 other relatives. Library prep: TruSeq DNA Nano or NovaSeq PE150 PCR-free. Validation: Sanger sequencing or ddPCR as required for segregation/confirmation.
Bioinformatics pipeline: Reads (150 bp paired-end) were aligned to hg19 with BWA-MEM; SAMtools was used for sorting/indexing; GATK (v4.1.4.1) was used for duplicate marking, BQSR, and HaplotypeCaller. Joint calling was performed by merging per-sample gVCFs and running GenotypeGVCFs. Quality filters: excess heterozygosity (Z>4.5), VQSR (SNVs/indels) with 99.7% truth sensitivity; hard filters for QD, FS, SOR, RankSum metrics. Familial relationships were confirmed with peddy. Analysis was restricted to variants that were rare (absent from gnomAD or allele count ≤2), absent in unaffected family members, and consistent with an inheritance model (de novo, dominant, recessive, or compound heterozygous with MAF<0.05%). Depth >10 and GQ>20 were required.
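The rarity and genotype-quality thresholds above can be expressed as a simple per-variant predicate. The sketch below is illustrative only: the `Call` record and its field names are hypothetical, it is not the study's pipeline code, and it omits the inheritance-model check.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical, simplified record for one proband variant call."""
    gnomad_allele_count: int      # 0 means the variant is absent from gnomAD
    depth: int                    # read depth at the site in the proband
    genotype_quality: int         # GQ in the proband
    in_unaffected_relative: bool  # carried by any unaffected family member


def passes_filters(call: Call) -> bool:
    """Apply the rarity, quality, and segregation thresholds from the text."""
    rare = call.gnomad_allele_count <= 2
    well_supported = call.depth > 10 and call.genotype_quality > 20
    return rare and well_supported and not call.in_unaffected_relative


# Example calls (values invented for illustration).
kept = Call(gnomad_allele_count=0, depth=32, genotype_quality=99,
            in_unaffected_relative=False)
too_common = Call(gnomad_allele_count=5, depth=32, genotype_quality=99,
                  in_unaffected_relative=False)
low_depth = Call(gnomad_allele_count=1, depth=10, genotype_quality=99,
                 in_unaffected_relative=False)

print(passes_filters(kept), passes_filters(too_common), passes_filters(low_depth))
```

Note that the depth threshold is strict (depth must exceed 10), so a call at exactly 10× is rejected in this sketch.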
Variant annotation and prioritization: VEP (v93.3, GRCh37.p13). Predicted loss-of-function (LoF) candidates required a LoF consequence, LoF intolerance (ExAC pLI≥0.9 or LoFtool<0.1), and a pathogenicity prediction (CADD≥20 or splicing impact via dbscSNV). Predicted damaging missense variants met at least three of: PolyPhen-2 possibly/probably damaging, SIFT deleterious, CADD≥20, significant MTR. A two-stage prioritization was applied: (1) a shortlist within a curated list of 2145 genes of interest (from prior CAS/speech genes; high-confidence PanelApp genes for intellectual disability, epilepsy, ASD, cleft palate; SFARI ASD genes; brain-expressed genes overlapping human accelerated regions), followed by ACMG classification and clinical geneticist review; (2) an agnostic genome-wide search with ACMG classification and clinical review if stage 1 yielded nothing. Variants were classified as high confidence (ACMG pathogenic/likely pathogenic and consistent with phenotype) or low confidence (VUS, or inconsistent gene-phenotype relationship despite likely pathogenicity).
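The LoF and missense criteria above amount to two boolean rules, which can be sketched as follows. The function names, argument names, and prediction label strings are assumptions made for illustration; the thresholds are the ones stated in the text.

```python
from typing import Optional


def is_lof_candidate(has_lof_consequence: bool,
                     pli: Optional[float],
                     loftool: Optional[float],
                     cadd: Optional[float],
                     splice_impact: bool) -> bool:
    """LoF rule: LoF consequence AND gene intolerance AND a pathogenicity prediction."""
    intolerant = ((pli is not None and pli >= 0.9)
                  or (loftool is not None and loftool < 0.1))
    predicted_pathogenic = (cadd is not None and cadd >= 20) or splice_impact
    return has_lof_consequence and intolerant and predicted_pathogenic


def is_damaging_missense(polyphen: str,
                         sift: str,
                         cadd: Optional[float],
                         mtr_significant: bool) -> bool:
    """Missense rule: at least three of the four in silico criteria are met."""
    votes = [
        polyphen in {"possibly_damaging", "probably_damaging"},  # assumed labels
        sift == "deleterious",                                   # assumed label
        cadd is not None and cadd >= 20,
        mtr_significant,
    ]
    return sum(votes) >= 3


# Invented example inputs, for illustration only.
print(is_lof_candidate(True, pli=0.99, loftool=None, cadd=35.0, splice_impact=False))
print(is_damaging_missense("probably_damaging", "deleterious", 18.0, True))
print(is_damaging_missense("benign", "deleterious", 25.0, False))
```

Missing scores are treated as not meeting the corresponding criterion, which is one conservative choice among several possible ones.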
Structural variation: CNVs/SVs were detected with Manta (up to 5 Mb) and QDNAseq (10 kb bins; up to 5 Mb) and annotated with SVAnnot; calls were filtered by frequency (gnomAD SV frequency >0.05% excluded) and for technical artefacts. Candidate SVs were assessed with ACMG criteria and clinical review.
Validation: High-confidence variants validated by Sanger sequencing or ddPCR with standard protocols (primer design to reference transcripts; BigDye v3.1; 3730xl Analyzer; QuantaSoft for ddPCR).
Additional analyses: (1) Short tandem repeat expansion analysis for known and novel pathogenic repeats. (2) Polygenic risk scores for ASD and non-syndromic cleft palate to test enrichment in CAS probands. (3) Mitochondrial DNA abundance estimation. (4) Brain gene co-expression using BrainSpan developmental transcriptomics and Monte Carlo sampling to test whether CAS genes are more co-expressed than expected by chance; extension to include prior CAS genes; gene set enrichment (GO/Reactome) using g:Profiler. Co-expression framework also used to prioritize low-confidence genes and genes within known CNV regions associated with speech/language disorders.
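The Monte Carlo co-expression test in item (4) can be illustrated with a small, self-contained sketch. It assumes that rho denotes Spearman's rank correlation, uses synthetic expression values rather than BrainSpan data, and reports an empirical p-value from random gene sets of matching size; all names, sizes, and parameters here are hypothetical.

```python
import random
from itertools import combinations
from statistics import median


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def median_abs_rho(genes, ranked):
    """Median absolute rank correlation over all unordered gene pairs."""
    return median(abs(pearson(ranked[a], ranked[b]))
                  for a, b in combinations(genes, 2))


def monte_carlo_p(gene_set, ranked, n_iter, rng):
    """Empirical p-value: how often a random same-size set is at least as co-expressed."""
    observed = median_abs_rho(gene_set, ranked)
    universe = sorted(ranked)
    hits = sum(
        median_abs_rho(rng.sample(universe, len(gene_set)), ranked) >= observed
        for _ in range(n_iter)
    )
    return observed, (hits + 1) / (n_iter + 1)


# Synthetic data: 60 independent background genes plus a 6-gene module
# that shares a common expression signal across 40 samples.
rng = random.Random(0)
n_samples = 40
shared = [rng.gauss(0, 1) for _ in range(n_samples)]
expr = {f"bg{i}": [rng.gauss(0, 1) for _ in range(n_samples)] for i in range(60)}
module = [f"mod{i}" for i in range(6)]
for gene in module:
    expr[gene] = [s + rng.gauss(0, 0.5) for s in shared]
ranked = {gene: ranks(values) for gene, values in expr.items()}

observed, p_value = monte_carlo_p(module, ranked, n_iter=500, rng=rng)
print(f"median |rho| = {observed:.3f}, empirical p = {p_value:.4f}")
```

Adding one to both the hit count and the iteration count keeps the empirical p-value above zero, so its smallest attainable value is set by the number of random draws.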
Key Findings
– High-confidence variants were identified in 18/70 probands (26% diagnostic yield). Variants spanned 18 genes: ARHGEF9, BRPF1, DDX3X, DIP2C, ERF, HNRNPK, KDM5C, PHF21A, PURA, RBFOX3, SETBP1, SETD1A, SETD1B, SHANK3, SPAST, TAOK2, TRIP12, ZBTB18.
– Of these, three genes (SETBP1, SETD1A, DDX3X) were previously implicated in CAS and were independently confirmed; the remaining 15 represent new gene associations for CAS in this cohort.
– Inheritance: 15/18 variants were de novo; 3 were inherited. Variant types included frameshift (n=3), splice acceptor (n=2), nonsense (n=6), missense (n=6), and one multiexon duplication (TRIP12).
– All 13 truncating/splice/duplication variants occurred in genes intolerant to LoF; all five missense variants met multiple in silico damaging criteria.
– Structural variation: one de novo pathogenic tandem ~59.8 kb duplication in TRIP12 (exons 7–37) predicted to cause LoF.
– Short tandem repeats: no known or novel repeat expansions detected in probands.
– Polygenic risk: ASD PRS showed a trend toward enrichment in probands (two-sample t-test p=0.054); cleft palate PRS showed a non-significant increase (p=0.226).
– Mitochondrial abundance: not a general biomarker for CAS; two probands (with DDX3X and HNRNPK variants) were outliers.
– Co-expression: High-confidence genes were more highly co-expressed during brain development than expected by chance (median |rho|=0.4194; 32/153 pairs in top 5% genome-wide; p=0.0038). A subset of highly co-expressed genes (BRPF1, DIP2C, KDM5C, PHF21A, SETBP1, SETD1A, SETD1B) was enriched for chromatin organization. Extending to 34 genes (including prior CAS studies) reinforced enrichment for chromatin organization and transcriptional regulation.
– Phenotype: Among probands with pathogenic variants, expressive (15/17) and receptive (13/18) language disorders were common; all school-age probands tested (n=8) had reading and spelling impairments. Gross motor delay (16/18) and fine motor delay (14/18) were frequent; seizures occurred in 2/18. Cognitive profiles ranged from average to mild intellectual disability, with many borderline results.
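Two of the figures above can be checked with short arithmetic: 18 genes give 153 unordered pairs, and 32 of those pairs falling in the genome-wide top 5% is roughly four times the 5% expected by definition. A minimal sketch using only numbers from the text (the variable names are illustrative):

```python
from math import comb

n_genes = 18
n_pairs = comb(n_genes, 2)           # unordered gene pairs among 18 genes
top_pairs = 32                       # pairs in the genome-wide top 5%, from the text
observed_fraction = top_pairs / n_pairs
fold_over_chance = observed_fraction / 0.05

diagnostic_yield_pct = 100 * 18 / 70  # probands with a high-confidence variant

print(n_pairs, round(observed_fraction, 3), round(fold_over_chance, 1))
print(round(diagnostic_yield_pct))
```

Because gene pairs that share a gene are not independent, this fold difference is only descriptive; the reported p-value comes from the Monte Carlo sampling described in the Methodology.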
Discussion
This study nearly doubles the number of genes implicated in CAS and demonstrates that monogenic causes underlie a substantial fraction of severe childhood speech disorders. The high proportion of de novo, likely damaging variants and the spectrum of variant types mirror patterns seen across neurodevelopmental disorders. The genes identified show strong co-expression during brain development and significant enrichment for pathways in chromatin organization and transcriptional regulation, underscoring transcriptional dysregulation and chromatin modification as core mechanisms in speech development. The overlap of CAS genes with those associated with epilepsy, ASD, and intellectual disability highlights pleiotropy and shared neurodevelopmental mechanisms, and suggests that CAS can serve as a sentinel phenotype for underlying single-gene disorders. Phenotypically, probands with identified genetic causes more often exhibited additional language, motor, and cognitive impairments, consistent with a potential threshold effect in which monogenic variants are more likely when CAS co-occurs with broader neurodevelopmental features. PRS analyses provided limited evidence for common variant contributions, while STR expansions and mitochondrial abundance did not broadly contribute to CAS, indicating that rare coding and structural variants are key drivers in many cases. The results broaden the phenotypic spectra of multiple neurodevelopmental genes to include specifically defined CAS, argue for precise speech/language phenotyping in genetic studies, and refine gene lists for clinical testing and research.
Conclusion
Whole-genome trio sequencing and comprehensive variant analyses identified pathogenic or likely pathogenic variants in 26% of CAS probands, implicating 15 novel genes and confirming three previously associated genes. Genes involved in CAS are co-expressed during brain development and are enriched for chromatin organization and transcriptional regulation pathways, reinforcing these mechanisms in speech development. The findings emphasize genetic overlap with other neurodevelopmental disorders and the importance of detailed speech/language phenotyping. Future work should include larger, inclusive cohorts (including individuals with comorbid ASD, epilepsy, and intellectual disability), functional studies to validate candidate genes and mechanisms, refinement of variant interpretation frameworks (especially for recessive mechanisms and structural variants), longitudinal genotype–phenotype correlation, and exploration of precision therapy avenues for genetically stratified CAS subgroups.
Limitations
– Cohort size limited statistical power for gene burden tests; reliance on curated gene lists and ACMG criteria may miss novel mechanisms.
– Potential under-detection of recessive contributions due to prioritization metrics (e.g., pLI scores tailored to heterozygous LoF intolerance).
– The tandem arrangement of the one structural duplication (TRIP12) was predicted from short-read data and could not be independently confirmed by other sequencing.
– Some parental samples were unavailable (although most families were sequenced as trios), so de novo status could not be confirmed for every candidate.
– PRS analyses may be underpowered; STR analyses found no expansions but cannot exclude small effect or rare repeat mechanisms.
– Phenotypic selection excluded moderate to severe intellectual disability, which may bias gene discovery toward certain mechanisms and limit generalizability.