The mental representation of sounds in speech sound disorders

S. Pathi and P. Mondal

Soujanya Pathi and Prakash Mondal challenge traditional theories of phonological representation and propose a model that connects acoustic information to the articulatory system, offering an account of why errors in speech sound disorders occur.

Introduction

The study addresses why many errors in speech sound disorders (SSDs) cannot be fully explained by defective phonological representations (PR) or by purely motor/structural factors. The authors note that children can sometimes discriminate minimal pairs yet still produce systematic errors, indicating intact PR but impaired production processes. They propose a cognitive model of speech sound production with two core components: a PR that encodes acoustic properties of segments, and an interface module that maps these representations onto articulatory instructions for the articulatory system (AS). The interface, situated between PR and AS, is posited as the locus where mis-mapping or miscalculation can yield characteristic SSD patterns. The paper outlines this model and illustrates its explanatory power using representative SSD data.

Literature Review

SSDs encompass perception, production, and representational deficits, ranging from mild lisps to severe unintelligibility. Prior research links weak or underspecified PRs to SSDs, with children often performing poorly on expressive and receptive measures of phonological representation (e.g., Edwards et al., 1999, 2002; Munson et al., 2005; Sutherland & Gillon, 2005, 2007). However, these accounts struggle to explain cases with intact discrimination and consistent production errors. Traditional views treat PR as a repository of distinctive features (DFs) tied to articulatory properties (Chomsky & Halle, 1968), and dual-route or output lexicon models (Levelt, 1989; Stackhouse & Wells, 1997) assume coarse-grained representations without a detailed mapping stage to articulatory programs. The authors instead adopt Element Theory (Kaye & Harris, 1990; Harris, 1994; Backley, 2011), viewing PR as composed of acoustic elements (|A|, |I|, |U|) that combine to form segments. This acoustic, non-articulatory conception better accommodates cases where perception is intact but production fails. The paper situates the proposed interface within broader interface theories in linguistics (Jackendoff, 2002; Reiss & Volenec, 2018) and contrasts it with articulatory phonology accounts, noting limitations of gesture-overlap explanations for certain SSD substitutions.

Methodology

This is a theoretical and model-building study with illustrative application to secondary clinical data. The authors develop a cognitive architecture comprising: (1) a phonological representation (PR) level encoding segments as combinations of acoustic elements (e.g., |A|, |I|, |U|), which specify spectral properties but no articulatory details; (2) an interface module with multiple levels that transduces PR outputs into articulatory specifications; and (3) an articulatory system (AS) that executes motor instructions.

Level 1 of the interface contains a finite set of articulatory feature (AF) "slots" (e.g., voice, manner such as plosive/fricative, place such as alveolar, labiodental) that act as motor schemas. Typical processing: PR generates an underspecified segment x (from element combinations), which maps to appropriate slots at Level 1 (e.g., x + S1 + S3 + S4 → /s/), yielding an articulated sound. The interface is bidirectional and includes higher levels (Level 2: word-level, Level 3: phrase-level, Level 4: sentence-level) that check contextual effects (e.g., coarticulation) and, if necessary, revert to Level 1 to adjust slot selection. The operations are formalized as a sequence: generate x from PR; map x to slot Sn at Level 1; pass to Level 2/3/4 with potential feedback to Level 1 if changes are required; send final output to AS.

Atypical processing in SSDs is modeled as mis-mapping of x to an incorrect slot Sn or as transient inactivity of required slots, leading to substitutions or deletions, respectively. The model is illustrated on secondary data from published case reports: Joseph (functional phonological delay; Barlow & Gierut, 2002), Josie (developmental verbal dyspraxia; Bowen, 2015), data from Sutherland & Gillon (2005), and Kirk (Bernthal et al., 2017).
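The processing sequence described above can be sketched in code. What follows is a minimal Python illustration under stated assumptions, not the authors' implementation. The slot labels (S1 = voiceless, S2 = plosive, S3 = fricative, S4 = alveolar) are inferred from the examples in this summary, and the names `TYPICAL_MAP`, `AS_OUTPUT`, `level1`, and `produce` are hypothetical. Feedback from Levels 2 to 4 is simplified to optional functions that may return an adjusted slot selection.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence

Slots = FrozenSet[str]

# Assumed slot labels, inferred from the example x + S1 + S3 + S4 -> /s/.
SLOT_LABELS = {"S1": "voiceless", "S2": "plosive", "S3": "fricative", "S4": "alveolar"}

# Hypothetical stand-in for the articulatory system (AS):
# the sound produced for each executable slot combination.
AS_OUTPUT: Dict[Slots, str] = {
    frozenset({"S1", "S3", "S4"}): "s",
    frozenset({"S1", "S2", "S4"}): "t",
}

# Hypothetical typical Level 1 mapping from underspecified PR segments to slots.
TYPICAL_MAP: Dict[str, Slots] = {
    "x_s": frozenset({"S1", "S3", "S4"}),
    "x_t": frozenset({"S1", "S2", "S4"}),
}


def level1(segment: str, mapping: Dict[str, Slots], inactive: Slots) -> Optional[Slots]:
    """Map a PR segment to slots; return None if a required slot is inactive."""
    slots = mapping[segment]
    if slots & inactive:
        return None  # no articulatory instructions are forwarded
    return slots


def produce(
    segments: Sequence[str],
    mapping: Dict[str, Slots] = TYPICAL_MAP,
    inactive: Slots = frozenset(),
    higher_levels: Sequence[Callable[[List[Optional[Slots]]], List[Optional[Slots]]]] = (),
) -> str:
    """Run PR segments through Level 1, optional Levels 2-4, then the AS."""
    selected = [level1(seg, mapping, inactive) for seg in segments]
    for check in higher_levels:  # simplified feedback: each level may adjust slots
        selected = check(selected)
    return "".join(AS_OUTPUT[s] for s in selected if s is not None)


# Typical processing: x + S1 + S3 + S4 -> /s/
print(produce(["x_s"]))

# Atypical processing (mis-mapping): x_s is sent to the plosive slot S2.
mis_mapped = dict(TYPICAL_MAP, x_s=frozenset({"S1", "S2", "S4"}))
print(produce(["x_s"], mapping=mis_mapped))

# Atypical processing (slot inactivity): S2 is unavailable, so /t/ is deleted.
print(produce(["x_s", "x_t"], inactive=frozenset({"S2"})))
```

In this sketch a substitution and a deletion differ only in where Level 1 fails: a wrong entry in the mapping yields a different sound, while an inactive slot yields no output for that segment.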

Key Findings

• Phonological representations are best conceived as acoustic element combinations (Element Theory) rather than articulatory distinctive features; articulatory specifications are computed downstream at an interface level.
• An interface module with articulatory-feature slots mediates the mapping from PR to AS. Errors in SSDs can arise from: (a) mis-mapping of underspecified PR segments to incorrect slots (yielding substitutions), or (b) transient inactivity of required slots (yielding deletions).
• The model explains systematic substitution patterns with minimal contrast changes at Level 1: e.g., /s/→/t/ by selecting plosive (S2) instead of fricative (S3) while keeping voice and alveolar place constant (Joseph: sunny → tanı). It also models widespread stopping, such as /tʃ/→/d/ by selecting [+voice, alveolar, plosive] over [−voice, postalveolar, affricate] (Kirk).
• Deletions are accounted for by slot inactivity: e.g., final /p/ deletion in cup (Josie) arises when bilabial plosive and related slots remain inactive for that segment combination, so no articulatory instructions are forwarded to AS.
• The multi-level, bidirectional interface accounts for context-induced modifications (coarticulation) at word/phrase/sentence levels and distinguishes typical, transient interactions from entrenched, unnecessary interactions in disordered speech.
• The model accommodates cases with intact receptive abilities (good PR/discrimination) but impaired production, supporting a locus of impairment at the interface rather than PR.
• Alternative accounts (dual-route, Levelt, Stackhouse & Wells, Dell’s spreading activation, articulatory phonology gesture overlap) either lack granular mapping from abstract sound units to articulatory instructions or cannot explain specific non-overlap substitutions; aerodynamic factors can bias slot selection and help explain preferences (e.g., stopping).
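The substitution and deletion examples in these findings can be expressed as comparisons between articulatory feature bundles. The sketch below is illustrative only: the `FEATURES` table uses standard voice, place, and manner descriptions for the four consonants mentioned, while the function names `changed_features` and `classify` and the returned labels are hypothetical and do not come from the paper.

```python
from typing import Dict, Optional, Set

# Standard voice/place/manner descriptions for the consonants in the examples.
FEATURES: Dict[str, Dict[str, str]] = {
    "s": {"voice": "-", "place": "alveolar", "manner": "fricative"},
    "t": {"voice": "-", "place": "alveolar", "manner": "plosive"},
    "d": {"voice": "+", "place": "alveolar", "manner": "plosive"},
    "tʃ": {"voice": "-", "place": "postalveolar", "manner": "affricate"},
    "p": {"voice": "-", "place": "bilabial", "manner": "plosive"},
}


def changed_features(target: str, produced: str) -> Set[str]:
    """Return the feature dimensions on which the produced sound departs from the target."""
    return {
        dim
        for dim, value in FEATURES[target].items()
        if FEATURES[produced][dim] != value
    }


def classify(target: str, produced: Optional[str]) -> str:
    """Label an error in the model's terms (hypothetical labels)."""
    if produced is None:
        return "deletion (slot inactivity)"
    if produced == target:
        return "target"
    return "substitution (mis-mapping)"


# Joseph: /s/ -> /t/ changes manner only (fricative -> plosive).
print(changed_features("s", "t"), classify("s", "t"))

# Kirk: /tʃ/ -> /d/ changes voice, place, and manner.
print(changed_features("tʃ", "d"), classify("tʃ", "d"))

# Josie: final /p/ in "cup" is not produced at all.
print(classify("p", None))
```

Comparing the two substitutions this way shows why /s/→/t/ counts as a minimal contrast change (one dimension) while /tʃ/→/d/ involves every dimension at once.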

Discussion

The proposed interface-centric account addresses the core question of how SSD errors can occur despite intact phonological representations. By separating acoustic PR from articulatory-feature specification, the model locates many SSD phenomena in the transduction process, explaining patterned substitutions and deletions without positing globally defective PR. This clarifies why children may discriminate minimal pairs but systematically misproduce certain sounds. The framework integrates with broader interface theories and relaxes strict modular encapsulation by allowing higher-level word/phrase/sentence contexts to modulate Level 1 via feedback, while remaining informationally encapsulated relative to syntax/semantics for core operations. Compared with dual-route/Levelt and Stackhouse & Wells models, this approach adds a granular mapping stage from decomposed sound properties to motor schemas, directly addressing how features become articulatory instructions. It also complements and constrains articulatory phonology explanations by highlighting cases where gesture overlap cannot account for non-adjacent substitutions, proposing aerodynamic and mapping-bias mechanisms instead. Clinically, the model motivates differential diagnosis distinguishing intact PR from interface mapping deficits, predicting profiles where phoneme discrimination is preserved but production is impaired; it suggests assessments targeting both PR and mapping integrity.

Conclusion

The study proposes a cognitive model in which phonological representations are acoustic element-based and an interface module maps these representations onto articulatory specifications via feature-slot selection across multiple levels. Miscalculations at any interface level—especially mis-mapping or slot inactivity at Level 1—can yield characteristic SSD errors, including substitutions, deletions, and systematic preferences (e.g., stopping). The model explains cases with preserved perception but impaired production and offers a pathway toward refined SSD subtyping and intervention planning. The authors acknowledge that the model may not capture all segmental error types and requires further empirical validation and extension to additional phonological phenomena. Future research should experimentally test the model’s predictions (e.g., dissociations between discrimination and production), refine mechanisms of slot activation/inactivity, and expand coverage to syllabic and discourse-level processes.

Limitations

• Scope is limited to segmental errors; syllabic and discourse-level errors are not modeled.
• Focuses primarily on word-level substitutions; treatment of transpositions and other phenomena is not developed.
• The account of deletions is partly speculative (slot inactivity mechanism) and requires empirical investigation.
• The model has not been experimentally validated across the full range of SSD profiles and may not suffice for all segmental errors.
