Introduction
The capacity for language hinges on core-Merge, a cognitive function that combines two linguistic elements (e.g., words) into a sequence that the receiver perceives as a single unit. Although core-Merge was initially considered uniquely human, recent research suggests parallels in animal communication: some bird and mammal species combine calls with individual meanings into sequences that elicit specific behavioral responses, distinct from their responses to either call alone. However, such responses could simply reflect the temporal proximity of the calls rather than genuine combination. To distinguish between these possibilities, a new paradigm was developed that contrasts responses to two-call sequences produced by a single source with responses to the same calls produced by two separate sources. This study applied the paradigm to Japanese tits (*Parus minor*), which combine 'alert' and 'recruitment' calls into sequences that mobilize conspecifics against predators. Previous studies showed that call order matters and that tits respond differently to 'alert' and 'recruitment' calls presented alone, making the species a good candidate for testing core-Merge. The research aimed to separate temporal linkage from true core-Merge by comparing responses to the two calls from one versus two sources, predicting that if core-Merge is present, the response should differ with the number of sources.
Literature Review
Existing literature highlights the debate over the evolutionary origins of language's productivity, specifically the roles of 'Merge' (the capacity to combine linguistic items) and 'recursion' (the capacity to build hierarchical structures). Some theories posit that Merge alone is sufficient for the generative power of language, while others emphasize that recursion is necessary for complex expressions. Several studies have reported intriguing parallels to Merge in animal communication, where species combine calls into sequences with distinct meanings. A key open question, however, is whether receivers perceive these call sequences as single units (indicating core-Merge) or simply as temporally linked individual calls. This study addresses that gap directly by testing how the call source affects receiver responses, a crucial step toward understanding the evolutionary origins and universality of core-Merge.
Methodology
The study was conducted on 64 flocks of wild Japanese tits in mixed deciduous-coniferous forests in Japan. The researchers used a novel single-sender/multiple-sender paradigm based on audio playbacks, with four treatments: (i) a single speaker playing an alert-recruitment call sequence; (ii) two separate speakers, each playing one call type, in the natural alert-recruitment order; (iii) a single speaker playing a recruitment-alert sequence; and (iv) two speakers, each playing one call type, in the reversed recruitment-alert order. All treatments used identical calls and timing (30 calls of each type, presented as call pairs at a rate of one pair every three seconds, with a 0.1-second interval between the two calls of a pair); only the number of speakers and the call order varied. A taxidermied bull-headed shrike, a natural predator of Japanese tits, was placed near the speaker(s) to elicit mobbing. The researchers recorded the percentage of tits that approached within 2 meters of the shrike and that performed wing-flicking displays, a common mobbing behavior. To ensure that the results were not driven by subtle acoustic differences between individual birds, the playback stimuli were built from sixteen unique call sets (some using both calls from the same bird, others using calls from two different birds), and these call sets were used across all four treatments. All stimuli were produced with identical editing procedures to control for effects of sound editing, and playback amplitude was held constant at 70 dB at 1.0 m. Flock size was recorded as a covariate. The data were analyzed with generalized linear mixed models that accounted for multiple factors, including the source (one or two speakers), call order (alert-recruitment or recruitment-alert), and flock size.
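The design and playback timing described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' protocol: it assumes a fully crossed layout in which each of the sixteen call sets is played once under each of the four treatments (16 × 4 = 64, matching the number of flocks), reads the timing as 30 call pairs with one pair every 3 s and a 0.1 s gap inside each pair, and uses hypothetical speaker labels "A" and "B". The names `build_design`, `playback_schedule`, and the `first_call_duration_s` parameter are likewise hypothetical.

```python
from dataclasses import dataclass
from itertools import product

# Crossed factors named in the text: number of sources and call order.
SOURCES = ("single", "two")
ORDERS = ("alert-recruitment", "recruitment-alert")

N_CALL_SETS = 16         # sixteen unique call sets
N_PAIRS = 30             # 30 calls of each type -> 30 pairs (assumed reading)
PAIR_PERIOD_S = 3.0      # assumed: one call pair every 3 s
WITHIN_PAIR_GAP_S = 0.1  # silent gap between the two calls of a pair

# Hypothetical speaker labels: in two-source trials each call type gets its own speaker.
TWO_SOURCE_SPEAKER = {"alert": "A", "recruitment": "B"}


@dataclass(frozen=True)
class Trial:
    call_set: int
    source: str
    order: str


def build_design(n_call_sets: int = N_CALL_SETS) -> list[Trial]:
    """Assumed fully crossed design: every call set under every treatment."""
    return [
        Trial(call_set, source, order)
        for call_set in range(1, n_call_sets + 1)
        for source, order in product(SOURCES, ORDERS)
    ]


def playback_schedule(
    source: str,
    order: str,
    first_call_duration_s: float,
    n_pairs: int = N_PAIRS,
    period_s: float = PAIR_PERIOD_S,
    gap_s: float = WITHIN_PAIR_GAP_S,
) -> list[tuple[float, str, str]]:
    """Return (onset_s, call_type, speaker) events for one playback.

    first_call_duration_s is a hypothetical parameter: the second call starts
    once the first call has ended plus the within-pair gap.
    """
    first, second = order.split("-")
    events = []
    for i in range(n_pairs):
        t_first = i * period_s
        t_second = t_first + first_call_duration_s + gap_s
        for onset, call in ((t_first, first), (t_second, second)):
            # One source: every call from speaker "A"; two sources: speaker by call type.
            speaker = "A" if source == "single" else TWO_SOURCE_SPEAKER[call]
            events.append((round(onset, 3), call, speaker))
    return events


design = build_design()
print(len(design))  # 64
print(playback_schedule("two", "recruitment-alert", 0.5)[:2])
# [(0.0, 'recruitment', 'B'), (0.6, 'alert', 'A')]
```

In this layout, treatments (i) and (ii) share the same calls, order, and onsets and differ only in the speaker labels, which is what lets the comparison isolate the number of sources.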
Key Findings
The results revealed significant differences in Japanese tit behavior across treatments (Fig. 4). During single-speaker playbacks of alert-recruitment sequences, a high percentage of tits approached the shrike and performed wing-flicking displays, indicating a mobbing response. In contrast, when the same calls were played from two separate speakers, mobbing behavior was significantly reduced. Models that controlled for flock size and call set variation confirmed that this difference was driven primarily by the number of sound sources. This finding strongly suggests that Japanese tits perceive an alert-recruitment sequence from a single speaker as a unified signal rather than as two separate, temporally proximate calls. Further analyses showed that the mobbing response was elicited only by calls in the natural alert-recruitment order played from a single source, demonstrating the tits' sensitivity to both call order and the spatial unity of the source.
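To make "controlling for flock size and call set variation" concrete, one plausible form of such a generalized linear mixed model is written out below. This is an assumed specification, not the authors' model: it treats the number of responding tits as binomial, adds a source × order interaction, and models call set as a random intercept. Here $k_i$ is the number of tits in flock $i$ that approached within 2 m (or wing-flicked) out of $n_i$ flock members, $S_i = 1$ for two speakers and $0$ for one, $O_i = 1$ for recruitment-alert and $0$ for alert-recruitment, $F_i$ is flock size, and $u_{c[i]}$ is the random intercept of the call set $c$ played to flock $i$.

```latex
\begin{aligned}
k_i &\sim \mathrm{Binomial}(n_i,\, p_i) \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1 S_i + \beta_2 O_i + \beta_3 S_i O_i + \beta_4 F_i + u_{c[i]} \\
u_c &\sim \mathcal{N}(0,\, \sigma_u^2)
\end{aligned}
```

Under this coding, the core-Merge prediction for the natural order ($O_i = 0$) corresponds to $\beta_1 < 0$: moving the same alert-recruitment sequence from one speaker to two lowers the log-odds that a tit mobs.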
Discussion
These findings provide robust support for core-Merge in a non-human animal. Japanese tits recognize the two-call sequence as a single unit when it originates from a single source but not when it comes from multiple sources. This suggests that they integrate both the meanings of the individual calls ('alert' and 'recruitment') and spatial information about the source to interpret the combined signal. The results contrast with previous research suggesting that call combinations in some other species function as idiomatic expressions, whose meaning is not derived from their parts, or as suffixation. The single-sender/multiple-sender paradigm offers a strong framework for investigating core-Merge in other species, aiding comparative studies of communication systems across diverse taxa. The results also lend support to the theory that language evolution requires both 'Merge' and 'recursion', with core-Merge combining two elements and recursion enabling hierarchical structures: although the tits showed core-Merge, there was no evidence that they produce more complex, hierarchical call sequences.
Conclusion
This research conclusively demonstrates the presence of core-Merge in the vocal communication system of Japanese tits, offering direct empirical evidence for a cognitive capacity previously considered uniquely human. The established experimental paradigm provides a valuable tool for investigating core-Merge across diverse species, advancing our understanding of language evolution. Future research should focus on exploring the extent of core-Merge in various animal species and investigating whether hierarchical structures beyond simple call combinations exist within their communication systems.
Limitations
Although the study carefully controlled for several potential confounding factors, some limitations remain. It focused on a single species, so further research is needed to generalize the findings to other animal taxa. Moreover, it did not explore the cognitive mechanisms underlying core-Merge in Japanese tits, leaving open questions about the neural basis of this ability. Finally, it was limited to one type of call sequence and one predator stimulus; future work could extend the paradigm to other calls and contexts.