Experimental evidence for core-Merge in the vocal communication system of a wild passerine

T. N. Suzuki and Y. K. Matsumoto

This study by Toshitaka N. Suzuki and Yui K. Matsumoto tests whether Japanese tits treat a combination of two calls as a single unit when responding to a predator. In playback experiments, the birds responded differently to the same two calls depending on whether they were broadcast from one speaker or from two.

Introduction
The study investigates whether a non-human species uses core-Merge—the capacity to combine two items into a single perceived unit—to interpret vocal sequences. Prior work shows Japanese tits produce distinct alert and recruitment calls and often combine them into ordered alert–recruitment sequences during predator mobbing. While receivers respond differently to each call type and to their ordered combination, it has remained unclear whether they perceive the two-call sequence as a single unit from one sender or merely as two temporally proximate calls possibly produced by multiple individuals. The authors propose a single-sender versus multiple-sender paradigm: if core-Merge operates, receivers should respond differently to the same two calls depending on whether they emanate from one spatial source (single individual) versus two spatial sources (two individuals), even when timing is identical. The goal is to test if Japanese tits recognize an alert–recruitment sequence from a single source as a single unit and whether temporal order and spatial co-location jointly determine mobbing responses.
Literature Review
The concept of Merge is posited to underlie language’s generativity, with core-Merge referring to combining two words into a new unit. Although long considered uniquely human, several animal systems show call combinations that elicit responses distinct from component calls alone. In Japanese tits, alert calls (warning of danger) and recruitment calls (attracting conspecifics in non-dangerous contexts) are combined into alert–recruitment sequences during predator mobbing, and prior experiments show sensitivity to call order and compositional decoding. However, an alternative is that receivers respond to temporal linkage alone, regardless of whether calls originate from one or multiple individuals. Beyond tits, other species like putty-nosed and Campbell’s monkeys also produce call combinations with diverse relations between combinations and meaning (e.g., idiomatic-like combinations or suffixation analogies). Despite these observations, evidence that receivers treat combined calls as a single unit from a single sender—indicative of core-Merge—has been lacking.
Methodology
Study site and subjects: 64 flocks of wild Japanese tits (Parus minor) in mixed deciduous–coniferous forests in Nagano and Gumma, Japan (36°17′–31′N, 138°26′–39′E). Trials were conducted Oct 26–Dec 4, 2020, between 08:00 and 16:00 JST, avoiding wet or windy conditions. Flocks were tested at least 400 m apart to minimize sampling the same individuals.
Experimental design: A predator model (taxidermic bull-headed shrike, Lanius bucephalus) was placed on a branch 5 m from the speaker(s). Playback stimuli contrasted spatial source (one vs two speakers) and call order (alert–recruitment, AR, vs recruitment–alert, RA), yielding four treatments: (i) 1A-R: one-speaker AR sequences; (ii) 2A-R: two-speaker AR with alert and recruitment from different speakers but temporally linked; (iii) 1R-A: one-speaker RA (reversed) sequences; (iv) 2R-A: two-speaker RA with calls from different speakers and not linked in space or time. In two-speaker treatments, the speakers (SoundLink Micro, Bose) were 10 m apart and arranged in line with the shrike model.
Stimuli construction: Using Audacity 2.1.3, four 90-s stimuli per block were created from identical call exemplars: 30 alert calls and 30 recruitment calls per file, at one call every 3 s per speaker. The within-sequence interval between alert and recruitment calls was set to 0.1 s (within the natural range). Between-sequence intervals varied from 1.50 to 1.81 s (median 1.68 s) across blocks but were constant within each block. One-speaker files were mono; two-speaker files were stereo, with call types assigned to the left and right channels. Playback level was 70 dB at 1 m. Calls were sourced from the local population; recruitment calls used 7–10 note repetitions (typical for the shrike context). Files were saved as 16-bit, 48-kHz WAV and played from an iPhone 8.
Controls for identity and pseudoreplication: The authors prepared 16 unique alert–recruitment call sets using calls from the same individual (n=8 sets) or from two different individuals (n=8 sets), yielding 16 stimulus blocks. Each block generated all four treatments, totaling 64 playbacks (n=16 flocks per treatment). This allowed testing whether variation in caller identity or in the number of source individuals affects responses.
Behavioral measures: During each 90-s playback with the shrike model, observers recorded (i) the percentage of flock members approaching within 2 m of the shrike, and (ii) the percentage exhibiting wing-flick displays (mobbing). Flock size (number of tits within 15 m during playback) was recorded and included as a covariate.
Statistical analysis: Generalized linear mixed models (GLMMs) compared treatments for approach and wing-flick percentages, with log-likelihood ratio tests for significance. Factors included treatment, the flock size covariate, and the number of source individuals (one vs two) used for stimulus construction.
Sample size: n=16 flocks per treatment (total n=64).
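The blocked design described above can be laid out as a short Python sketch. Only the counts (16 blocks, four treatments, 64 playbacks) and the 3-s call period over a 90-s playback come from the text. The block numbering, the assignment of blocks 0-7 to single-individual call sets, and the first call starting at t = 0 are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Treatment labels as used in the text: speaker count followed by call order.
TREATMENTS = ("1A-R", "2A-R", "1R-A", "2R-A")
N_BLOCKS = 16        # 16 unique call sets, one per stimulus block
PLAYBACK_S = 90      # playback duration in seconds
CALL_PERIOD_S = 3    # one call of each type every 3 s per speaker


def build_design():
    """Return one playback record per (block, treatment) pair."""
    design = []
    for block, treatment in product(range(N_BLOCKS), TREATMENTS):
        design.append({
            "block": block,
            # Assumption for illustration only: blocks 0-7 use calls from
            # one individual, blocks 8-15 use calls from two individuals.
            "source_individuals": 1 if block < 8 else 2,
            "treatment": treatment,
            "n_speakers": int(treatment[0]),
            "order": "AR" if treatment.endswith("A-R") else "RA",
        })
    return design


def call_onsets():
    """Onset times (s) of one call type, assuming the first call is at t = 0."""
    return list(range(0, PLAYBACK_S, CALL_PERIOD_S))


design = build_design()
per_treatment = Counter(row["treatment"] for row in design)
print(len(design), dict(per_treatment), len(call_onsets()))
```

Each block contributes exactly one playback to every treatment, so differences between treatments cannot be attributed to differences in the call exemplars used.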
Key Findings
- Tits mobbed the predator model during one-speaker alert–recruitment (1A-R) playbacks but rarely mobbed when the same two calls were played with identical timing from two speakers (2A-R). GLMM pairwise contrasts: approach Z=5.50, P<0.0001; wing flicking Z=5.68, P<0.0001.
- Reversing call order (1R-A) from a single source produced significantly weaker responses than natural order (1A-R): approach Z=5.10, P<0.0001; wing flicking Z=4.69, P<0.0001.
- Two-speaker RA (2R-A) elicited rare mobbing and differed significantly from 1A-R: approach Z=4.90, P<0.0001; wing flicking Z=5.54, P<0.0001.
- Overall treatment effects were significant (Fig. 4): approach χ²=80.16, df=3, P<0.0001; wing flicking χ²=95.75, df=3, P<0.0001. Mobbing occurred when and only when birds perceived naturally ordered AR sequences from a single source.
- Flock size positively affected approaching within 2 m (χ²=16.06, df=1, P<0.0001) but not wing flicking (χ²=0.00, df=1, P=0.9692).
- The number of source individuals used to construct stimuli (one vs two) did not affect responses (approach χ²=0.78, df=1, P=0.3777; wing flicking χ²=0.69, df=1, P=0.4046).
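The reported likelihood-ratio P values can be sanity-checked from the χ² statistics alone. The Python sketch below uses closed-form chi-square tail probabilities for df = 1 and df = 3. The function names are illustrative, and treating the Z contrasts as unadjusted two-sided normal tests is an assumption that the summary does not confirm. Because the reported statistics are rounded, recomputed values may differ slightly in the last digits.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 3 only)."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    tail_df1 = math.erfc(math.sqrt(x / 2))
    if df == 1:
        return tail_df1
    if df == 3:
        # Closed form for df = 3: df = 1 tail plus a density-like correction.
        return tail_df1 + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df=1 and df=3 are implemented in this sketch")


def z_two_sided_p(z):
    """Two-sided normal P value (assumed test form; not stated in the text)."""
    return math.erfc(abs(z) / math.sqrt(2))


# (label, chi-square statistic, df) as reported in the findings above.
REPORTED = [
    ("treatment, approach", 80.16, 3),
    ("treatment, wing flicking", 95.75, 3),
    ("flock size, approach", 16.06, 1),
    ("source individuals, approach", 0.78, 1),
    ("source individuals, wing flicking", 0.69, 1),
]

for label, stat, df in REPORTED:
    print(f"{label}: chi2={stat}, df={df}, P={chi2_sf(stat, df):.4g}")
```

The flock-size effect on wing flicking (χ²=0.00) is omitted from the list because the statistic is rounded to two decimals and cannot reproduce the reported P=0.9692.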
Discussion
The results show Japanese tits distinguish between sequences produced by a single source and the same temporally linked calls produced by two sources. Birds mobbed only when hearing naturally ordered alert–recruitment sequences from one speaker, indicating they recognize the two-call sequence as a single unit from one individual rather than merely responding to temporal proximity. Reduced responses to reversed order and to two-speaker presentations demonstrate that both temporal linkage (natural ordering) and spatial co-location (single source) jointly determine the perceived unit. These findings provide evidence for core-Merge-like processing in a non-human species, aligning with prior work that tits extract compositional meanings from combined calls. The work also informs language evolution debates by supporting the view that core-Merge can exist without evidence of recursion; tits combine two meaningful calls but show no evidence of producing sequences with more than two meaningful elements or hierarchical structure. The single-/multiple-sender paradigm offers a robust framework to test core-Merge across taxa.
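The logic of the single- versus multiple-sender paradigm can be made explicit by writing the competing accounts as predictions over the four treatments. The Python sketch below is a simplification: the function names are hypothetical, and the observed outcomes are reduced to a yes/no summary of the text ("mobbed" in 1A-R, "rarely mobbed" elsewhere) rather than the measured percentages.

```python
# Treatments described by number of speakers and call order.
TREATMENTS = {
    "1A-R": {"n_speakers": 1, "order": "AR"},
    "2A-R": {"n_speakers": 2, "order": "AR"},
    "1R-A": {"n_speakers": 1, "order": "RA"},
    "2R-A": {"n_speakers": 2, "order": "RA"},
}


def predicts_temporal_only(treatment):
    """Alternative account: natural order suffices, regardless of source."""
    return treatment["order"] == "AR"


def predicts_core_merge(treatment):
    """Core-Merge account: natural order AND a single spatial source."""
    return treatment["order"] == "AR" and treatment["n_speakers"] == 1


# Qualitative summary of the results (simplified to booleans for illustration).
OBSERVED_MOBBING = {"1A-R": True, "2A-R": False, "1R-A": False, "2R-A": False}


def matches_observed(predict):
    """True if a hypothesis predicts the observed pattern in every treatment."""
    return all(
        predict(TREATMENTS[name]) == OBSERVED_MOBBING[name]
        for name in TREATMENTS
    )


print("temporal linkage only:", matches_observed(predicts_temporal_only))
print("core-Merge:", matches_observed(predicts_core_merge))
```

The two accounts disagree only on 2A-R, which is why the two-speaker playback with natural order and identical timing is the critical treatment.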
Conclusion
This study introduces and applies a single- versus multiple-sender playback paradigm to demonstrate that Japanese tits perceive alert–recruitment call sequences from a single source as a single unit, satisfying a key criterion for core-Merge in animal communication. Responses depended on both correct temporal order and a single spatial origin. The findings broaden our understanding of combinatorial signaling in animals and suggest that core-Merge may be more widespread than previously recognized. Future research should test for hierarchical structuring and recursive combinations, assess generality across species, and explore the cognitive and neural mechanisms enabling core-Merge-like processing.
Limitations
The experiments focused on two-call combinations and did not examine whether tits can produce or perceive sequences with more than two meaningful elements or hierarchical structure; the authors note there is no evidence for such complexity and that further research is needed. Most birds were not individually color-ringed, although trials were spaced ≥400 m to minimize resampling. Findings are from a single species and context (predator mobbing with a shrike model and controlled playbacks), and generalization to other species or contexts requires further study.