Sound-meaning associations allow listeners to infer the meaning of foreign language words
S. Hayakawa and V. Marian
The study addresses whether listeners can infer the meanings of unfamiliar foreign words from phonological form alone and, if so, under what conditions. Grounded in work on sound symbolism and linguistic iconicity (e.g., kiki-bouba effects), the paper treats language as not entirely arbitrary: non-arbitrary mappings between sound and meaning may exist across modalities and languages. The authors examine cross-linguistic regularities in form-meaning mapping using natural language words rather than constructed non-words, and test cognitive moderators of this ability, focusing on verbal working memory. The research aims to determine whether (1) foreign words sharing a meaning show greater phonological overlap across languages than words with opposite meanings, (2) sensitivity to these regularities predicts above-chance semantic inference, and (3) higher verbal working memory enhances extraction of meaning from form. The work is important for understanding the extent of iconicity in natural language, mechanisms of cross-modal integration in language processing, and links between language and broader cognitive functions.
Prior research shows robust cross-modal correspondences between speech sounds and perceptual/semantic attributes: shape (kiki-bouba), size (/i/ small, /a/ big), lightness (voiceless with light), touch (voiced with rough), taste (front vowels with sour, back vowels with sweet), abstractness (front vowels with concrete), and personality (sonorants with conscientiousness, plosives with extraversion). These effects generalize across languages and development. However, much of the evidence comes from non-words, raising questions about generalizability to natural language. Recent large-scale cross-linguistic analyses (e.g., trilled /r/ associated with roughness) indicate systematic form-meaning associations in natural lexicons. Neurocognitive findings implicate cross-modal integration and verbal working memory systems (e.g., superior parietal cortex activation; structural differences in the arcuate/superior longitudinal fasciculus) and show enhanced performance in synesthetes and reduced consistency in dyslexia/autism. Comparative work shows that humans, but not great apes, detect sound-symbolic congruency, possibly reflecting a more developed arcuate fasciculus (AF) and working memory binding. Together, the literature suggests that iconicity and learned regularities may support mapping between phonology and meaning, but mechanisms and moderators in natural language processing remain underexplored.
Design: Forced-choice antonym identification task adapted from Tsuru & Fries and D’Anselmo et al. Participants heard a pair of foreign antonyms and selected which of two configurations of native-language translations matched their meanings.

Participants (Study 1): 134 native monolingual English speakers (mean age 35.61, 46% female), recruited via Prolific, with high English proficiency (≥8/10; M=9.87) and minimal proficiency (0–1/10) in the nine target languages. Each participant completed three language blocks: one Japonic-Sino-Tai (Japanese, Mandarin, or Thai), one Slavic (Polish, Russian, or Ukrainian), and one Romance (French, Romanian, or Spanish). Data collected 02/22/2022–03/18/2022.

Participants (Replication): 46 native monolingual Spanish speakers (mean age 28.63, 37% female) with high Spanish proficiency (≥8/10; M=9.82), minimal exposure to Japanese and Polish (0–1/10), and low exposure to English (0–3/10). Each completed Japanese (Japonic-Sino-Tai), Polish (Slavic), and English (Germanic) blocks. Data collected 08/01/2023–08/13/2023.

Stimuli: 45 English antonym pairs (15 nouns, 15 verbs, 15 adjectives) selected from 60 candidates, with translations validated by native speakers in nine languages: Japanese, Mandarin, Thai; Polish, Russian, Ukrainian; French, Romanian, Spanish. English pairs were matched on lexical frequency, concreteness, and affective dimensions. Foreign translations were recorded with female text-to-speech (TTS) voices, amplitude-normalized, and validated by native speakers. Each foreign antonym pair was presented with a 1 s inter-word pause. Stimuli (translations, IPA, audio) are available at https://osf.io/4ez8v/.

Procedure: Within each of three counterbalanced blocks (45 trials each), participants heard a foreign antonym pair and selected which of two native-language translation orders matched (e.g., sharp:blunt vs blunt:sharp). Word order within the audio pair was counterbalanced across participants; the side of the correct answer was randomized per trial. Participants played each audio clip at least once and at most twice; they were asked to respond within 20 s (visible countdown), but no hard limit was enforced. Two practice trials preceded the task. After all blocks, participants indicated any foreign words they had recognized; trials containing recognized translations were excluded (4.44% for English speakers; 14.70% for Spanish speakers). Participants then completed an auditory digit span task (CTOPP; 21 trials) and a LEAP-Q background survey.

Measures and computed variables: Accuracy (0/1). Phonetic distance between foreign words and their native translations was computed from IPA transcriptions using Sound-Class-based Alignment (SCA) in LingPy; higher values indicate greater distance. Form-meaning regularity for each word pair was calculated as the average phonetic distance to translations with the opposite meaning minus the average distance to same-meaning translations, across unrelated language groups; positive values indicate greater cross-linguistic phonological similarity for same meanings. Verbal working memory (WM) was indexed by digit span; participants scoring more than 1.5 SD below the mean were excluded from WM analyses (12 English; 3 Spanish).

Analysis: Generalized linear mixed-effects models (binomial) fitted with lme4 in R. Initial model: fixed effects of Language and Part of Speech (POS; sum-coded) and their interaction, with random intercepts for participant and item. Additional models tested (a) effects of Language Group (Japonic-Sino-Tai, Slavic, Romance/Germanic), Part of Speech, z-scored phonetic distance, and their interactions; and (b) effects of z-scored form-meaning regularity, z-scored digit span, Language Group, Part of Speech, and their interactions. Significance was assessed via chi-square tests; parameter estimates used the Satterthwaite method (lmerTest); pairwise contrasts used emmeans; diagnostics used DHARMa. Effect sizes are reported as odds ratios. All tests were two-sided.
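The form-meaning regularity score described above can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the authors' pipeline: a length-normalized Levenshtein distance stands in for LingPy's SCA distance, the scores for the two words of a pair are pooled before averaging, and the language labels and word forms are invented placeholders, not actual stimuli.

```python
from statistics import mean


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (a stand-in for SCA, not the study's metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def regularity(target_lang, pairs, groups):
    """Mean distance to opposite-meaning forms minus mean distance to
    same-meaning forms, using only languages from other language groups.

    pairs:  language -> (form for meaning A, form for meaning B)
    groups: language -> language-group label
    """
    same, opposite = [], []
    for lang, forms in pairs.items():
        if groups[lang] == groups[target_lang]:
            continue  # skips the target itself and any related language
        for meaning in (0, 1):
            target_form = pairs[target_lang][meaning]
            same.append(normalized_distance(target_form, forms[meaning]))
            opposite.append(normalized_distance(target_form, forms[1 - meaning]))
    return mean(opposite) - mean(same)


# Invented placeholder forms for one antonym pair in three unrelated "languages".
toy_pairs = {
    "lang1": ("kiti", "bomo"),
    "lang2": ("kite", "buma"),
    "lang3": ("tiki", "mobu"),
}
toy_groups = {"lang1": "G1", "lang2": "G2", "lang3": "G3"}

score = regularity("lang1", toy_pairs, toy_groups)
print(f"regularity for lang1: {score:.2f}")
```

A positive score means the target forms sit closer to same-meaning forms in unrelated languages than to opposite-meaning forms, which is the direction the study treats as evidence of cross-linguistic regularity.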
- Above-chance inference of meaning: Overall accuracy exceeded the 0.5 chance level (intercept z=11.38, p<0.001). Accuracy exceeded chance in each of the 9 languages for English speakers: Japanese ~0.55 (z=3.04, p=0.021), Mandarin ~0.55 (z=2.86, p=0.037), Thai ~0.57 (z=4.25, p<0.001), Polish ~0.58 (z=4.38, p<0.001), Russian ~0.56 (z=3.41, p=0.006), Ukrainian ~0.58 (z=4.84, p<0.001), French ~0.79 (z=16.36, p<0.0001), Romanian ~0.74 (z=14.05, p<0.0001), Spanish ~0.81 (z=17.58, p<0.0001).
- Language effects (English speakers): Significant main effect of Language (χ²(8)=668.27, p<0.001). Accuracy higher for Romance (French, Romanian, Spanish) than Slavic or Japonic-Sino-Tai (all p<0.001). Within Romance, French > Romanian (OR=1.34, p=0.017); Spanish > Romanian (OR=0.67, p<0.001; inverse-coded).
- Part of speech: In Romance languages, nouns > adjectives (French OR=0.20, p<0.001; Romanian OR=0.63, p=0.023; Spanish marginal p=0.072). In French, nouns > verbs (OR=2.09, p=0.001) and verbs > adjectives (OR=0.41, p<0.001). No significant POS differences in other language groups.
- Phonetic distance (English speakers): Romance had shorter phonetic distances to English than Slavic and Japonic-Sino-Tai (ps<0.001). Accuracy increased with shorter phonetic distance (χ²(1)=178.98, p<0.001), moderated by a Language Group × POS × Distance interaction (χ²(4)=75.47, p<0.001). ORs below 1 indicate lower accuracy as distance increases. For Romance, shorter distance facilitated accuracy for nouns (OR≈0.35, p<0.001) and verbs (OR≈0.40, p<0.001), but not adjectives. For Japonic-Sino-Tai, shorter distance facilitated nouns (OR≈0.77, p<0.001). For Slavic, shorter distance facilitated adjectives (OR≈0.67, p<0.001). Above-chance performance persisted after controlling for distance (χ²(1)=114.31, p<0.001) across all groups (Japonic-Sino-Tai ~0.57; Slavic ~0.58; Romance ~0.74).
- Cross-linguistic form-meaning regularity: Phonetic distances among same-meaning translations were shorter than among opposite-meaning translations (χ²(1)=10.94, p<0.001). Regularity scores captured greater phonological similarity for same meanings across unrelated languages.
- Verbal working memory (English speakers): Main effect of WM (χ²(1)=24.80, p<0.001, OR=1.10). WM × Regularity interaction (χ²(1)=5.35, p=0.021): higher WM increased accuracy for high-regularity items (+1 SD; z=5.23, p<0.001, OR=1.15), not for low-regularity items (−1 SD; z=2.00, p=0.088). For high WM, accuracy increased with regularity (z=2.39, p=0.034, OR=1.07); for low WM, no effect. Three-way interaction with POS (χ²(2)=7.28, p=0.026): WM benefits observed for nouns and adjectives across regularity levels; for verbs, only with greater regularity.
- Replication (Spanish speakers): Above-chance accuracy overall (χ²(1)=36.52, p<0.001) and per language (Japanese ~0.55, p=0.031; Polish ~0.59, p<0.001; English ~0.70, p<0.001). Language effect (χ²(2)=68.95, p<0.001): English > Polish > Japanese. POS differences in English (nouns and verbs > adjectives). Shorter phonetic distance from Spanish facilitated accuracy (χ²(1)=50.47, p<0.001), with language- and POS-specific patterns; above-chance performance persisted after controlling for distance (χ²(1)=37.13, p<0.001). Form-meaning regularity predicted accuracy (χ²(1)=70.73, p<0.001, OR=1.41). No significant main effect or interaction of WM (ps≥0.795).
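Because the effect sizes above are odds ratios, their practical size depends on the baseline accuracy they are applied to. The Python sketch below shows the standard logit arithmetic for turning an odds ratio into a shift in predicted accuracy, and for reversing the direction of a contrast by taking the reciprocal. The baseline of 0.57 is an illustrative assumption, not a model estimate from the study, and the fitted models also include random effects and other covariates that this arithmetic ignores.

```python
import math


def logit(p: float) -> float:
    """Probability -> log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-x))


def apply_odds_ratio(baseline_p: float, odds_ratio: float, units: float = 1.0) -> float:
    """Predicted probability after moving `units` steps along a predictor
    whose per-unit effect is `odds_ratio` (e.g. 1 SD of a z-scored variable)."""
    return inv_logit(logit(baseline_p) + units * math.log(odds_ratio))


def flip_contrast(odds_ratio: float) -> float:
    """Odds ratio for the same contrast read in the opposite direction."""
    return 1 / odds_ratio


# Assumed baseline accuracy (hypothetical), combined with the reported
# working-memory odds ratio of 1.10 per SD of digit span.
baseline = 0.57
shifted = apply_odds_ratio(baseline, 1.10)
print(f"accuracy at +1 SD digit span: {shifted:.3f}")

# An OR of 0.67 for one direction of a contrast corresponds to roughly
# 1.49 when the same contrast is read the other way round.
print(f"reversed contrast: {flip_contrast(0.67):.2f}")
```

Read this way, an odds ratio near 1.10 corresponds to a shift of only a couple of percentage points in accuracy near a 0.57 baseline, which helps put the working-memory effect in context alongside the much larger language effects.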
The findings show that listeners can infer meanings of unfamiliar foreign words at above-chance levels using phonological cues alone, indicating that natural languages contain systematic form-meaning regularities beyond arbitrary conventions. Cross-linguistic analyses demonstrated that words sharing meanings exhibit greater phonological similarity across unrelated languages than antonyms do, supporting relatively universal mappings between sound and meaning. Behavioral performance depended on both linguistic similarity (phonetic distance to the native language) and cognitive resources. For native English speakers, better verbal working memory enhanced sensitivity to cross-linguistic regularities, particularly for items with stronger regularity and for nouns/adjectives, suggesting that maintaining and binding phonological and semantic-perceptual representations supports extracting meaning from sound. Replication with Spanish speakers confirmed above-chance inference and strong effects of form-meaning regularity and phonetic distance but did not show significant WM effects, possibly due to language-specific processing differences, exposure profiles, or the smaller sample and lower power. Part-of-speech effects paralleled patterns of phonological overlap (e.g., nouns showing greater overlap and higher accuracy), aligning with theories that more concrete referents afford stronger iconic links and facilitate acquisition and recognition. Together, the results indicate that sensitivity to form-meaning mapping is a robust phenomenon grounded in both cross-linguistic structure and cognitive mechanisms.
This study demonstrates that sound-meaning mappings in natural languages exhibit cross-linguistic regularities that allow naive listeners to infer meanings of foreign words above chance. Accuracy is modulated by phonological proximity to the native language, cross-linguistic consistency in form-meaning covariation, and (for English speakers) verbal working memory capacity. These findings challenge the notion of purely arbitrary lexicons, linking language structure to general cognitive functions supporting cross-modal integration and working memory. Future research should directly assess perceived iconicity between specific phonetic features and referent properties, incorporate richer measures of language exposure, test larger and more diverse populations, and examine neural mechanisms and developmental trajectories underpinning sensitivity to form-meaning regularities.
- Cross-linguistic regularities may arise from linguistic or communicative constraints and historical relationships, not solely from iconicity; thus, above-chance inference does not by itself prove sound symbolism causes the mappings.
- The study used TTS-generated stimuli; prosodic and voice-specific cues may influence judgments.
- Online samples and self-reported language experience may introduce variability; trials with known words were excluded but residual familiarity cannot be fully ruled out.
- The replication sample was smaller than the English-speaking sample, possibly limiting power to detect verbal working memory effects; Spanish speakers also showed lower digit span scores on average.
- Analyses focus on phonological distance metrics (SCA); alternative distance measures or phoneme-level features might capture different aspects of similarity.
- The study was not preregistered.

