High-frequency sound components of high-resolution audio are not detected in auditory sensory memory

Psychology

H. Nittono

Hiroshi Nittono's research suggests that the high-frequency sound components typical of high-resolution audio may not be processed distinctly in the auditory cortex. The study sheds light on the elusive perception of audio quality: could the superiority of high-resolution audio be more myth than reality?
Introduction

The study asks whether the purported advantages of high-resolution audio, specifically the presence of inaudible high-frequency (>22 kHz) components and potentially sharper temporal transients, are registered at the cortical level. Although high-resolution audio uses higher sampling rates and bit depths, standard formats already cover the human audible range, and behavioral discrimination between formats is typically near chance. Prior reports (e.g., the “hypersonic effect”) suggest EEG differences when music rich in high-frequency components is reproduced full-range versus high-cut, even though conscious discrimination is difficult. The present study tests the hypothesis that, if the cortex registers high-resolution characteristics, infrequent high-cut noise bursts embedded among full-range bursts should elicit mismatch negativity (MMN). An 11-kHz high-cut condition, which removes audible high frequencies and was therefore expected to elicit MMN, served as a comparison. The goal is to determine whether high-frequency components unique to high-resolution audio and/or reduced temporal smearing are detectable in auditory sensory memory.

Literature Review

Evidence on the perceptual benefits of high-resolution audio is mixed; a meta-analysis of 17 studies (1980–2016) found discrimination only slightly above chance (mean 52.3%). Oohashi et al. reported increased EEG alpha power when music rich in high-frequency components was presented full-range versus high-cut (the “hypersonic effect”), with effects sensitive to the reproduction method (present with loudspeakers, absent with earphones). Bone-conducted ultrasound can elicit auditory sensations and cortical responses, but high-frequency components are imperceptible when presented alone, so the mechanisms remain unclear. Other work implicates subcortical structures (brainstem, thalamus) in responses to high-frequency-rich material and highlights the difficulty of conscious perception. Additionally, digital audio processing applies anti-alias filtering when downsampling, which removes high-frequency content and can cause temporal smearing at onsets/offsets, potentially altering perception. Whether any of these factors produce differences in cortical deviance detection was unknown, which motivated the MMN-based assessment.
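
The temporal smearing mentioned above can be illustrated with a small sketch. It does not reproduce the resampler used in the study: the windowed-sinc filter, its length (101 taps), the cutoff (roughly 22.05 kHz relative to a 192-kHz sampling rate), and all function names are illustrative assumptions. The sketch low-pass filters a noise burst with an abrupt onset and checks that the filtered version carries energy before the original onset.

```python
import math
import random

def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase FIR low-pass (Hamming-windowed sinc).
    cutoff is a fraction of the sampling rate (0 < cutoff < 0.5); num_taps must be odd."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * hamming)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC

def filter_centered(signal, taps):
    """Convolve signal with taps, compensating the filter's group delay
    so that output sample i lines up with input sample i."""
    half = (len(taps) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += h * signal[j]
        out.append(acc)
    return out

# Hypothetical burst: silence, 100 samples of white noise, silence.
ONSET, OFFSET, LENGTH = 100, 200, 300
rng = random.Random(0)
burst = [rng.uniform(-1, 1) if ONSET <= i < OFFSET else 0.0 for i in range(LENGTH)]

taps = windowed_sinc_lowpass(num_taps=101, cutoff=0.115)
smeared = filter_centered(burst, taps)

pre_energy_in = sum(x * x for x in burst[:ONSET])     # energy before onset, original
pre_energy_out = sum(y * y for y in smeared[:ONSET])  # energy before onset, filtered
print(pre_energy_in, pre_energy_out)
```

Because a linear-phase filter is symmetric around its center tap, the filtered burst spreads both before the onset and after the offset by up to half the filter length.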

Methodology

Design: Preregistered, double-blind experiment.

Participants and ethics: Forty-four university students consented; after exclusions for technical failure/ERP reliability, N=38 (18 men, 20 women; age 19–23, M=21.6) were analyzed. All reported no hearing problems; self-measured high-frequency thresholds ranged 14–19 kHz (M=17,316 Hz). Approved by Osaka University School of Human Sciences ethics committee; conducted per guidelines.

Stimuli: Three 50-ms monaural white-noise bursts were created at 192 kHz/24-bit (Adobe Audition): (1) Original (full-range); (2) 22-kHz high-cut (downsample to 44.1 kHz then upsample to 192 kHz; removes >22 kHz); (3) 11-kHz high-cut (downsample to 22.05 kHz then upsample; removes >11 kHz). Both high-cut stimuli inherently included anti-alias filter–induced temporal smearing. Each burst embedded in 500-ms WAV (225 ms silence pre/post). Undithered quantization used. Stimulus files available at https://osf.io/y5qfv/.

Apparatus: Playback via foobar2000 on Windows laptop → USB DAC (Meridian Ultra) → preamplifier (Yamaha A-U671) → loudspeakers (Yamaha NS-BP401). High-resolution-grade equipment. Sounds delivered at 62 dB(A). Stimulus onset timing calibrated: 4 ms delay from trigger to acoustic onset at ear level.

Procedure (EEG/MMN): Oddball sequences presented with standard p=0.80 (original) and deviant p=0.20 (either 22-kHz high-cut or 11-kHz high-cut), in separate counterbalanced blocks. Constraint: at least two standards before each deviant; stimulus onset asynchrony 500 ms. Each condition comprised 1,000 trials (800 standards, 200 deviants). Participants watched a silent cartoon and ignored sounds.

EEG recording and preprocessing: 34 scalp electrodes plus mastoids; nose-tip reference; EOGs recorded. Sampling 1000 Hz; online DC–200 Hz; offline 1–30 Hz filters. Bad electrodes interpolated; ocular artifacts corrected (Gratton-Coles-Donchin method). Trials exceeding ±60 µV rejected. Only standards/deviants preceded by ≥2 standards analyzed; 4-ms trigger-to-sound delay corrected. Reliability screening used ERA Toolbox; final inclusion required ≥150 deviant trials averaged per ERP guidelines, yielding N=38.

MMN quantification and statistics: Frontocentral cluster (Fz, FC1, FC2, Cz). Latency window determined by collapsed localizer: deviant–standard grand difference peak between 110–230 ms; mean amplitude measured over 120–160 ms. One-tailed paired t-tests tested MMN presence (difference < 0). Bayesian paired t-tests (JASP) provided Bayes factors. Exploratory point-by-point tests across electrodes used permutation-corrected thresholds (10,000 randomizations; sLORETA accessory) to identify significant deviant–standard differences over time.

Behavioral tests: Two ABX comparisons (original vs 11-kHz high-cut; original vs 22-kHz high-cut) with 16 trials each (foo_abx component). Criterion for above-chance performance: ≥12/16 correct (one-tailed p<0.05). Auditory threshold task: participants adjusted to highest audible pure-tone frequency (starting at 10 kHz, +500 Hz steps) via the same playback chain. Analyses included correlations between MMN amplitude and ABX accuracy in each condition, and between MMN amplitude and upper audible threshold.
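
The oddball constraint described above (1,000 trials, 200 deviants, at least two standards before each deviant) can be sketched as follows. The summary does not state how the sequences were actually randomized, so the procedure here is only one simple possibility, and the function name, parameters, and seed are hypothetical.

```python
import random

def make_oddball_sequence(n_trials=1000, n_deviants=200, min_standards=2, seed=0):
    """Return a list of 'S' (standard) and 'D' (deviant) labels in which
    every deviant is preceded by at least min_standards standards."""
    rng = random.Random(seed)
    n_standards = n_trials - n_deviants
    spare = n_standards - n_deviants * min_standards
    if spare < 0:
        raise ValueError("not enough standards to satisfy the constraint")

    # Each deviant gets its mandatory run of standards; the spare standards
    # are scattered at random over the gaps (one before each deviant's run,
    # plus one at the very end of the sequence).
    extra = [0] * (n_deviants + 1)
    for _ in range(spare):
        extra[rng.randrange(n_deviants + 1)] += 1

    sequence = []
    for i in range(n_deviants):
        sequence.extend("S" * (extra[i] + min_standards))
        sequence.append("D")
    sequence.extend("S" * extra[-1])
    return sequence

seq = make_oddball_sequence()
print(len(seq), seq.count("S"), seq.count("D"))
```

With the default arguments this yields 800 standards and 200 deviants, matching the reported trial counts per condition; a different seed gives a different ordering under the same constraint.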

Key Findings
  • MMN: Significant MMN was elicited for the 11-kHz high-cut deviant but not for the 22-kHz high-cut deviant.
      • Two-way repeated-measures ANOVA (Condition: 11-kHz vs 22-kHz high-cut) × (Stimulus: deviant vs standard) showed a significant interaction: F(1,37)=40.12, p<0.001, η²=0.52.
      • 11-kHz high-cut: deviant–standard difference was significantly negative, t(37)=8.28, one-tailed p<0.001, dz=1.34; Bayes factor >3×10^7 supporting MMN elicitation.
      • 22-kHz high-cut: no significant MMN, t(37)=1.34, one-tailed p=0.094, dz=0.22; Bayes factor=0.718 supporting the null hypothesis (no MMN).
      • Exploratory whole-scalp analysis: No significant deviant–standard differences at any time point for 22-kHz high-cut; significant differences for 11-kHz high-cut in 80–181 ms, 201–285 ms, and 418–500 ms.
  • Behavioral ABX discrimination:
      • Original vs 11-kHz high-cut: mean accuracy 99.3% (range 88–100%), clearly above chance.
      • Original vs 22-kHz high-cut: mean accuracy 52.6% (range 25–81%), not above chance overall. Only 4/38 participants (10.5%) met the above-chance criterion; their mean MMN amplitude (-0.07 µV) was smaller in magnitude than the overall mean (-0.19 µV).
  • Correlations: ABX accuracy and MMN amplitude were not significantly correlated for either condition (rs=-0.07 for 11-kHz; rs=0.05 for 22-kHz; ps>0.10). MMN amplitude was not significantly correlated with upper audible threshold (rs=0.25 for 11-kHz; rs=0.13 for 22-kHz; ps>0.10).
  • Hearing thresholds: Participants’ upper frequency thresholds ranged 14–19 kHz (M=17,316 Hz).
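
Several of the reported statistics can be cross-checked with textbook formulas. The sketch below assumes that dz is the paired-samples effect size t/√N, that the reported η² is partial eta squared, F·df1/(F·df1 + df2), and that the ABX criterion comes from a one-tailed binomial test against a guessing rate of 0.5. The function names are illustrative and not taken from the study.

```python
import math

def cohens_dz(t_value, n):
    """Paired-samples effect size recovered from a t statistic and sample size."""
    return t_value / math.sqrt(n)

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def binomial_upper_tail(k_min, n, p=0.5):
    """P(X >= k_min) for X ~ Binomial(n, p): the one-tailed p-value for k_min hits."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

dz_11k = cohens_dz(8.28, 38)              # 11-kHz high-cut condition
dz_22k = cohens_dz(1.34, 38)              # 22-kHz high-cut condition
eta_sq = partial_eta_squared(40.12, 1, 37)
p_12_of_16 = binomial_upper_tail(12, 16)  # ABX criterion used in the study
p_11_of_16 = binomial_upper_tail(11, 16)  # one fewer correct answer

print(round(dz_11k, 2), round(dz_22k, 2), round(eta_sq, 2))
print(round(p_12_of_16, 3), round(p_11_of_16, 3))
```

Under these assumptions the values reproduce the reported dz of 1.34 and 0.22 and η² of 0.52. They also confirm that 12/16 is the smallest ABX score with one-tailed p<0.05 (p≈0.038, versus p≈0.105 for 11/16).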
Discussion

Removing only inaudible high-frequency components (>22 kHz) from white noise did not produce MMN or above-chance behavioral discrimination, indicating that auditory sensory memory did not register differences attributable to those high-frequency components or to the associated temporal smearing from anti-alias filtering. This suggests that any advantage of high-resolution audio related to extended bandwidth is not detected at the cortical level and, if present, may occur subcortically and outside conscious awareness. The absence of ERP differences across the entire waveform for the 22-kHz high-cut condition strengthens this conclusion. While MMN can be framed as sensory-memory deviance detection or habituation of afferent processing, it indexes salient change detection in the auditory system; the data indicate that neither inaudible high-frequency energy nor onset/offset smearing at this level was salient to the cortex. Considerations about the digital-to-analog chain imply that high-resolution formats may still offer practical advantages (e.g., more faithful interpolation and reduced susceptibility to DAC variability), but these are distinct from cortical detection of extended bandwidth. The findings do not preclude individual audiophiles with superior discrimination, but for typical listeners, broader playback bandwidth does not confer conscious perceptual benefits under these conditions.

Conclusion

The study demonstrates that high-frequency sound components unique to high-resolution audio (>22 kHz) are not detected in auditory sensory memory and do not support behavioral discrimination, as indexed by absent MMN and chance-level ABX performance when only inaudible high-frequency content was removed. Robust MMN and discrimination occurred only when audible high-frequency components (>11 kHz) were removed. Thus, any perceptual advantages of high-resolution audio are unlikely to stem from extended bandwidth effects at the cortical level and, if present, may involve subcortical mechanisms or system-level playback fidelity. Future work should examine other high-resolution attributes (e.g., quantization depth/bit depth), explore different stimulus types and listening contexts, and further probe subcortical contributions and device-chain interactions.

Limitations
  • The study did not directly compare consumer high-resolution formats with standard formats; all stimuli were created and rendered in a high-resolution chain, differing only in the presence of high-frequency components.
  • Quantization depth (bit depth) was not manipulated; potential effects of increased bit depth remain untested.
  • Stimuli were brief white noise bursts rather than music or complex natural sounds, which may limit generalizability to typical listening scenarios.
  • Participants were young adults with normal hearing; findings may not generalize to other populations or trained audiophiles.
  • Minor deviations from the preregistered protocol occurred (ERP exclusion criterion, onset delay recalibration, intensity remeasurement), though judged non-critical.