Perceptual fusion of musical notes by native Amazonians suggests universal representations of musical intervals

Psychology

M. J. McPherson, S. E. Dolan, et al.

This study by Malinda J. McPherson and colleagues examines how universal perceptual mechanisms and cultural factors each shape the perception of musical notes. Both Westerners and native Amazonians are more likely to fuse note combinations related by simple integer ratios, which points to a shared perceptual phenomenon, whereas aesthetic preferences for consonant intervals are shaped by culture.

Introduction

The study investigates whether universal auditory mechanisms related to harmonicity shape the perception of musical intervals across cultures. Harmonicity (integer-multiple relations among frequency components) supports sound source segregation and recognition and is thought to be a universal property of natural sounds. Western music theory links consonant intervals (simple integer ratios) to harmonicity, but preferences for consonance vary across cultures and can be absent, raising questions about universality. The authors test the hypothesis that harmonicity-based mechanisms induce perceptual fusion of consonant intervals independent of cultural exposure, and examine whether such fusion predicts aesthetic consonance preferences. Specifically, they compare native Amazonians (Tsimane’), who have limited exposure to Western harmony, with US non-musicians, evaluating fusion (one vs. two sounds) and pleasantness ratings for the same intervals.

Literature Review

Prior work highlights the role of harmonicity in auditory scene analysis, speech segregation, and voice recognition, with neural selectivity for harmonic structure observed in non-human primates. Western music traditions and theories have long associated consonant intervals with the harmonic series, and behavioral and neural evidence in Western listeners links harmonicity to consonance judgments. However, consonance preferences differ across cultures and historical periods and depend on musical experience; some cultures show indifference to consonance/dissonance. The phenomenon of fusion, in which consonant intervals sound like a single entity, has been noted historically but rarely measured with modern psychophysics, and prior data were limited to Western listeners and specific methods. The relationship between fusion and consonance is debated, especially because equal temperament deviates from simple ratios in ways that minimally affect consonance but could affect fusion. Cross-cultural tests are scarce, motivating the present study of universal vs. culture-specific aspects of interval perception.

Methodology

Design: Cross-cultural psychophysical experiments assessed (1) fusion of concurrent musical intervals (listeners reported hearing one vs. two sounds) and (2) pleasantness (consonance) ratings of the same stimuli. Experiments were conducted in person with US non-musicians (Boston area) and native Amazonians (Tsimane’), and an additional online experiment tested Western non-musicians.

Participants: In-person: Boston N=28 (14 female; mean age 33.7; no formal musical training). Tsimane’ N=31 analyzed (14 female; mean age 23.4; little musical experience). Additional groups ran specific control tasks: harmonic/inharmonic voice segregation (Boston N=14; Tsimane’ N=21). Online: N=100 Western non-musicians (from an initial 147; filtered by geography, headphone check, and perfect performance on a control task; mean age 38.2; no formal musical training).

Stimuli and tasks:

  • Main fusion and preference tasks (in-person): Two-note intervals presented concurrently. Synthetic tones: harmonic complexes (12 harmonics, −14 dB/oct roll-off, 2 s duration, onset/offset ramps, exponential decay), with intervals: unison (U), major second (M2), major third (M3), perfect fourth (P4), tritone (Tri), perfect fifth (P5), major seventh (M7), octave (Oct), minor ninth (m9). Tuning systems: just intonation and equal temperament (order randomized; differences 0–13.7 cents). Sung notes: male lower note ~200 Hz, female upper note; 9 intervals plus Solo (single male voice); vowels A/E/I/O/U; vibrato standardized; small pitch shifts to match exact target intervals (just intonation). Trials were randomized. Fusion response: “one” vs. “two.” Pleasantness: 4-point like/dislike scale.
  • Control experiments (in-person): (a) One vs. two concurrent Tsimane’ talkers (1 s excerpts). (b) One vs. two concurrent sung vowels resynthesized to be harmonic or inharmonic; intervals chosen to avoid fusion in Westerners (minor third, tritone, minor sixth, minor seventh); harmonicity manipulated by jittering harmonics (excluding f0); ensured ≥30 Hz adjacent component spacing to control beating. (c) Pleasantness controls: recorded laughs vs. gasps; smooth (dichotic) vs. rough (diotic) two-tone complexes across frequency ranges to validate sensitivity to roughness and affective valence.
  • Online experiment (Western non-musicians): Same synthetic-tone construction, equal temperament only, intervals 0–14 semitones (integer steps) across three pitch ranges. Separate randomized blocks for fusion and pleasantness. Included headphone check and one-vs-two talker control.
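
The size of the tuning differences mentioned above can be checked with a short calculation. The sketch below converts frequency ratios to cents and compares each just-intonation interval with its equal-tempered counterpart. The specific just-intonation ratios are an assumption (common textbook values; the summary does not list the ratios the authors used), and the names `JUST_RATIOS`, `cents`, and `tuning_deviation_cents` are hypothetical.

```python
import math
from fractions import Fraction

# ASSUMPTION: common textbook just-intonation ratios; the source does not list them.
# Each entry maps an interval label to (equal-tempered semitones, just ratio).
JUST_RATIOS = {
    "U": (0, Fraction(1, 1)),
    "M2": (2, Fraction(9, 8)),
    "M3": (4, Fraction(5, 4)),
    "P4": (5, Fraction(4, 3)),
    "Tri": (6, Fraction(45, 32)),
    "P5": (7, Fraction(3, 2)),
    "M7": (11, Fraction(15, 8)),
    "Oct": (12, Fraction(2, 1)),
    "m9": (13, Fraction(32, 15)),
}


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(float(ratio))


def tuning_deviation_cents(semitones, just_ratio):
    """Absolute gap between the equal-tempered and just versions of an interval."""
    return abs(100.0 * semitones - cents(just_ratio))


deviations = {
    name: tuning_deviation_cents(semitones, ratio)
    for name, (semitones, ratio) in JUST_RATIOS.items()
}

for name, deviation in deviations.items():
    print(f"{name:>3}: {deviation:5.2f} cents")
```

Under these assumed ratios the deviations span 0 cents (unison, octave) to about 13.7 cents (major third), which is consistent with the range quoted for the stimuli.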

Procedure: In-person sessions lasted 30–60 min. Fusion and preference tasks were run in separate blocks in randomized order, and trial order within each block was randomized. Instructions were given in English (Boston) or in Tsimane’ through translators (Tsimane’). No feedback was given. Experimenters were blind to the stimuli. Testing used closed-back headphones, and Boston sites were chosen to approximate field acoustics.

Analysis and statistics: For the in-person synthetic-note experiments, results were collapsed across tuning systems because no tuning effects were detected. Fusion was analyzed with non-parametric tests (Wilcoxon signed-rank) and permutation-based ANOVA-like F tests because the data were non-normal; pleasantness was analyzed with paired t-tests and mixed-design ANOVAs (Cohen’s d for effect sizes; sphericity tested with Mauchly’s test, no violations). Consonant intervals (M3, P4, P5, Oct) and dissonant intervals (M2, Tri, M7, m9) were each averaged for the main contrasts. For the online experiment, the consonant set additionally included m3, m6, M6, and the dissonant set additionally included m2, m7, M9 (14 semitones). Individual-differences measures were (i) fusion difference (mean fusion for consonant minus dissonant intervals) and (ii) consonance preference (mean pleasantness for consonant minus dissonant intervals), each with split-half reliability (balanced by register) and Spearman-Brown correction. Correlations were Spearman rank correlations. Figures report SEM or within-participant SEM (sung notes) and 95% CIs (online interval means). Replications of the in-person experiments produced similar results across cohorts.
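
The two individual-differences measures and the Spearman-Brown step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the example responses are made up for illustration, and the correction shown is the standard split-half form, r_full = 2r / (1 + r).

```python
from statistics import mean

# Interval sets used for the main in-person contrasts, as listed in the text.
CONSONANT = ("M3", "P4", "P5", "Oct")
DISSONANT = ("M2", "Tri", "M7", "m9")


def difference_score(per_interval):
    """Mean response to consonant intervals minus mean response to dissonant ones.

    `per_interval` maps interval labels to one listener's mean response
    (proportion of "one sound" judgments, or a mean pleasantness rating).
    """
    consonant_mean = mean(per_interval[name] for name in CONSONANT)
    dissonant_mean = mean(per_interval[name] for name in DISSONANT)
    return consonant_mean - dissonant_mean


def spearman_brown(r_half):
    """Predict full-length reliability from a split-half correlation."""
    return 2.0 * r_half / (1.0 + r_half)


# MADE-UP proportions of "one sound" responses for one hypothetical listener.
example_fusion = {
    "M2": 0.10, "M3": 0.30, "P4": 0.40, "Tri": 0.20,
    "P5": 0.50, "M7": 0.10, "Oct": 0.80, "m9": 0.20,
}

fusion_difference = difference_score(example_fusion)
print(f"fusion difference: {fusion_difference:.2f}")
print(f"corrected reliability for r_half=0.50: {spearman_brown(0.5):.3f}")
```

The same `difference_score` function applies to pleasantness ratings, which is what makes the two measures directly comparable across listeners.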

Key Findings

  • Task comprehension and harmonicity cue use: Both groups accurately distinguished one vs. two talkers (Boston: Z=4.87, p<0.0001, d′=4.27; Tsimane’: Z=4.22, p<0.0001, d′=3.23). In the sung-vowel segregation control, both groups performed better with harmonic than inharmonic signals, both for one-voice identification (Boston: Z=3.31, p<0.001; Tsimane’: Z=3.43, p<0.001) and for d′ (Boston: Z=3.31, p<0.001; Tsimane’: Z=3.84, p<0.001). Pleasantness controls showed that both groups preferred laughs over gasps and smooth over rough tones (Boston roughness: t(27)=5.21, p<0.0001, d=0.75; Tsimane’ roughness: t(30)=5.59, p<0.0001, d=1.47; Boston vocalizations: t(27)=10.76, p<0.0001, d=2.56; Tsimane’ vocalizations: t(30)=4.14, p<0.001, d=1.07).
  • Cross-cultural fusion of consonant intervals: Consonant intervals were more likely to be judged as one sound than dissonant intervals in both groups and stimulus types. Synthetic notes: Boston Z=4.23, p<0.0001 (Cohen’s d=1.41); Tsimane’ Z=3.27, p=0.001 (d=0.69). Sung notes: Boston Z=2.19, p=0.028 (d=0.20); Tsimane’ Z=2.83, p=0.005 (d=0.47). No interaction between group and interval type across experiments (F(1,57)=3.42, p=0.07). Overall, Tsimane’ fused more (main effect of group: F(1,57)=25.20, p<0.001, ηp²=0.31).
  • No effect of tuning system on fusion: No interaction of tuning (just vs. equal temperament) with interval type (F(1,57)=0.04, p=0.83, ηp²=0.001), and no main effect of tuning (F(1,57)=1.46, p=0.23) in synthetic-note experiments.
  • Pleasantness (consonance) varies by culture: US participants preferred consonant over dissonant intervals (synthetic: t(27)=6.57, p<0.0001, d=1.21; sung: t(27)=4.02, p<0.001, d=0.65). Tsimane’ showed no such preference (synthetic: t(30)=0.58, p=0.57, d=0.10; sung: t(30)=0.13, p=0.90, d=0.02). Significant interaction between interval type and group on pleasantness (F(1,57)=26.48, p<0.001, ηp²=0.32). No tuning interaction with pleasantness in either group.
  • Interval-specific fusion patterns: Fusion peaks at the octave, with the fifth and fourth also elevated relative to adjacent dissonant intervals in both groups for synthetic notes (US: Z>3.91, p<0.001; Tsimane’: Z>3.33, p<0.001 for octave and fourth; fifth Z=2.55, p<0.05). For sung notes, the octave was fused more than its neighbors in both groups (Boston Z=2.22, p=0.03; Tsimane’ Z=4.31, p<0.0001). Tsimane’ fusion was similar for sung and synthetic notes (no interval×experiment interaction, F(8,30)=0.96, p=0.47, ηp²=0.03).
  • Dissociation between fusion and consonance: In Westerners, the most fused intervals (e.g., octave) were not necessarily rated most pleasant; thirds were rated highly pleasant but were less fused. For sung notes, preferences remained for all consonant intervals despite weak or inconsistent fusion, producing significant interactions (synthetic: F(8,216)=10.71, p<0.001, η²=0.28; sung: F(8,216)=12.23, p<0.001, η²=0.31). Tsimane’ preferences did not follow fusion, showing slight preference for larger intervals instead.
  • Online Western cohort (N=100): Replicated the fusion peaks (octave, fifth, fourth) and the consonance profile. Across intervals, mean fusion correlated with mean pleasantness (Spearman r=0.85, p<0.001) but showed robust dissociations (the octave and fifth were more fused than their pleasantness would predict). Reliability of interval means was high (r≈0.99 fusion, 0.98 consonance). Individual-differences analysis showed that each measure was reliable (fusion difference reliability r=0.49, p<0.001; consonance preference reliability r=0.85, p<0.001; Spearman-Brown corrected) but found no correlation between individuals’ fusion difference and consonance preference (r=0.01, p=0.93). This null correlation persisted after z-scoring ratings (r=0.12, p=0.23), excluding participants at floor/ceiling in fusion (r=0.08, p=0.47, N=84), and varying the interval sets.
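
The d′ values reported for the one-vs-two talker control are a standard signal-detection sensitivity index: the z-transformed hit rate minus the z-transformed false-alarm rate. The sketch below shows one common way to compute it. It assumes that a "two" response on a two-talker trial counts as a hit and a "two" response on a one-talker trial counts as a false alarm, and it applies a log-linear correction for extreme rates; the summary specifies neither choice, and the function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    ASSUMPTION: a log-linear correction (add 0.5 to each count, 1 to each
    total) keeps the rates away from 0 and 1, where z is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# MADE-UP counts for a hypothetical listener: 50 two-talker and 50 one-talker trials.
sensitive = d_prime(hits=45, misses=5, false_alarms=5, correct_rejections=45)
guessing = d_prime(hits=25, misses=25, false_alarms=25, correct_rejections=25)
print(f"sensitive listener d': {sensitive:.2f}")
print(f"guessing listener d':  {guessing:.2f}")
```

A listener who responds at chance gets d′ near 0, while larger values indicate that one-talker and two-talker trials are easier to tell apart.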

Discussion

Findings show that both US non-musicians and Tsimane’ listeners are more likely to perceptually fuse consonant (simple-integer-ratio) intervals than dissonant ones, indicating that harmonicity-based sound segregation mechanisms produce similar perceptual organization across cultures with very different exposure to Western harmony. This supports the hypothesis that universal features of auditory processing induce structure on musical intervals that could influence the evolution of musical systems. However, aesthetic preferences for consonance are not universal: Western listeners prefer consonant intervals while Tsimane’ do not, and, critically, fusion does not predict consonance either across specific intervals or across individuals in Western listeners. Thus, while harmonicity-related perceptual grouping may bias which intervals are perceptually salient (e.g., octave, fifth, fourth), cultural experience and learned associations appear crucial for the pleasantness attached to those intervals. The results suggest that universal perceptual constraints interact with culture-specific factors to shape music perception and practice. They also provide an alternative interpretation for infant studies showing differential attention to consonance, potentially attributable to fusion-based perceptual differences rather than innate affective preferences. The dissociation from pitch-related phenomena in Tsimane’ (e.g., octave equivalence in singing) further indicates that harmonicity-based segregation and pitch representations are partly separable and differentially influenced by experience.

Conclusion

The study demonstrates cross-cultural perceptual fusion of consonant musical intervals, consistent with universal auditory mechanisms tied to harmonicity-based source segregation. Despite this shared perceptual organization, consonance preferences diverge: Western listeners prefer consonant intervals while Tsimane’ do not, and fusion does not predict consonance either across intervals or across individuals. These results imply that universal perceptual biases likely constrain musical systems (e.g., by favoring simple-integer ratios such as the octave, fifth, and fourth), but that cultural exposure and experience determine aesthetic responses. Future research should examine melodic intervals, broader cultural samples, developmental trajectories, and how familiarity and musical exposure shape consonance preferences and their relation to other harmonicity-driven phenomena (e.g., f0-based pitch).

Limitations

  • Cultural scope: Only two groups (US non-musicians and Tsimane’) were tested; generalizability to other cultures remains unverified.
  • Stimulus scope: Focused on concurrent two-note intervals; results may not generalize to melodic intervals or more complex chords beyond what prior work suggests.
  • Operationalization of consonance: Assessed as pleasantness/liking due to translation constraints; consonance is multifaceted, and pleasantness may capture only one dimension.
  • Field conditions: In-person testing occurred in non-soundproof environments; although controls and equipment mitigated noise, residual context differences may influence responses.
  • Exposure variability: Tsimane’ exposure to Western music and modernization varies and was not exhaustively quantified; residual exposure could affect results.
  • Participant characteristics: Excluded trained musicians; findings may differ with musical expertise.
  • Tuning systems: Only just intonation and equal temperament were tested; other tunings were not examined.
  • No feedback and brief tasks: While aiding comparability with pleasantness ratings, absence of training/feedback may limit maximal performance or strategy use.
  • Online data quality: Despite headphone checks and control tasks, uncontrolled listening environments may introduce variance.