Stochastic properties of musical time series

C. Nelias and T. Geisel

Research led by Corentin Nelias and Theo Geisel examines the stochastic properties of musical pitch sequences. Using power spectral analysis, the study shows how classical compositions and jazz improvisations exhibit characteristic correlation structures, with findings on power-law decay and musical predictability that challenge common perceptions of rhythm and structure.

Introduction

Music can be viewed as a correlated dynamical process, in which successive pitches, chords, and rhythms create expectations and surprises. Quantifying this balance via information-theoretic measures such as entropies and redundancies is challenging: the number of possible n-grams grows combinatorially with n, while pieces have finite length, so probabilities of long phrases cannot be estimated accurately. Time-series approaches therefore shift the focus to correlation decay, measurable via the autocorrelation function and, more conveniently, via the power spectral density (PSD) through the Wiener–Khinchin theorem. For pitch sequences, PSDs characterize the stochastic properties of melodic progression over long timescales. Prior reports disagree on the long-time behavior (from 1/f to 1/f² or shallower decays), leaving open the central question: what are the long-time autocorrelation properties of pitch sequences, and on which frequency scales and with what exponents do power laws occur? This study addresses these questions using careful multitaper PSD estimates that push frequency resolution to the smallest achievable scales while optimizing bias–variance tradeoffs.
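
The Wiener–Khinchin link between the autocorrelation function and the PSD can be checked numerically on a toy example. The sketch below is only an illustration, not the authors' code: the pitch values are made up, and the periodogram is used as the simplest possible PSD estimate.

```python
import numpy as np

def circular_autocovariance_direct(x):
    """Circular autocovariance computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x, np.roll(x, -lag)) / n for lag in range(n)])

def circular_autocovariance_from_psd(x):
    """Same quantity obtained as the inverse FFT of the periodogram."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    periodogram = np.abs(np.fft.fft(x)) ** 2 / len(x)
    return np.fft.ifft(periodogram).real

# Toy pitch sequence (MIDI note numbers), purely illustrative.
pitches = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72]
direct = circular_autocovariance_direct(pitches)
via_psd = circular_autocovariance_from_psd(pitches)
```

Both routes give the same values up to floating-point error, which is why a PSD estimate carries the same information as the autocorrelation function.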

Literature Review

Early work by Voss and Clarke reported 1/f noise in audio recordings, suggesting long-range memory over several orders of magnitude, but their analysis used zero-crossing rates of audio waveforms, making implications for pitch sequences unclear. Boon and Decroly, focusing on pitch sequences, found 1/f² (red noise), while Nettheim reported power-law PSDs ∝ f^{−β} with β in [1, 2], tending toward 1/f², albeit limited at low frequencies because averaging captured only intra-phrase scales. DFA studies also diverge: Jafari et al. found exponents implying PSD decays near f^{−1/2}, whereas González-Espinoza et al. observed multiple scaling regimes and complex profiles, likely influenced by periodic components. These contradictions motivate improved, low-frequency-resolved PSD estimation to determine whether apparent 1/f behavior persists to the smallest frequencies (implying scale-free correlations) or crosses over to plateaus (implying finite correlation horizons).

Methodology

Data: A total of 553 pitch time series were analyzed: 99 from classical music scores (single movements and full multi-movement compositions) and 454 from improvised jazz solos (Weimar Jazz Database). Monophonic pitch sequences were extracted from MIDI files; where necessary, the top voice was selected. Keyboard instruments and strongly polyphonic textures were excluded. Pitch values follow MIDI conventions (0–127).
Time series construction: Notes were quantized on a fixed grid of 1/12 quarter-note units to preserve onset/offset timing and to allow uniform units across 3/4 and 4/4 meters. Rests were encoded as zeros. With this grid, a period T in quarter notes corresponds to a frequency f in cycles per grid step via T = 1/(12f).
PSD estimation: Multitaper spectral estimation was used to minimize spectral leakage and control local bias. The time–bandwidth product was set to NW = 2 (baseline), yielding K = 2NW − 1 = 3 tapers and hence threefold averaging. Data were mean-centered prior to estimation. Estimates and confidence intervals were computed with spec.mtm in R.
Bias–variance analysis: Local bias within the bandwidth W (set by the tapers) can flatten spectral features at frequencies below W, potentially masking true low-frequency structure; flat plateaus that extend above W, by contrast, are not bias artifacts. Variance decreases as K increases; under standard conditions, confidence intervals were computed from a chi-square distribution with 2K degrees of freedom (95% intervals are reported).
Fitting: PSDs were fitted in log–log coordinates using weighted least squares with multiple candidate piecewise-linear fits; the weights compensate for the denser sampling of high frequencies. A least-squares criterion selected the best fit, with manual checks for rare misfits.
Classification: Spectra were categorized as power law (PL) or power law plus plateau (PL+P), acknowledging that plateaus below W cannot be confirmed.
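
The estimation step can be sketched in Python as follows. This is a simplified stand-in for the spec.mtm routine named above, not a reproduction of it: it assumes SciPy's `dpss` tapers, a plain equal-weight average of the K eigenspectra, and chi-square bounds with 2K degrees of freedom. The function name `multitaper_psd` and the sinusoidal test input are hypothetical.

```python
import numpy as np
from scipy.signal.windows import dpss
from scipy.stats import chi2

def multitaper_psd(x, nw=2.0, conf=0.95):
    """Equal-weight multitaper PSD with chi-square confidence bounds.

    Frequencies are in cycles per grid step (unit sampling interval).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # mean-centre, as in the text
    n = len(x)
    k = int(2 * nw - 1)                   # K = 2NW - 1 tapers
    tapers = dpss(n, nw, Kmax=k)          # shape (k, n), unit-energy tapers
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    psd = eigenspectra.mean(axis=0)       # simple average over tapers
    dof = 2 * k                           # degrees of freedom of the estimate
    alpha = 1.0 - conf
    lower = dof * psd / chi2.ppf(1 - alpha / 2, dof)
    upper = dof * psd / chi2.ppf(alpha / 2, dof)
    freqs = np.fft.rfftfreq(n)            # cycles per grid step
    return freqs, psd, lower, upper

# Illustrative input: a pitch that oscillates once per quarter note
# (period of 12 grid steps) over 100 quarter notes.
steps = np.arange(1200)
x = 60 + 5 * np.sin(2 * np.pi * steps / 12)
freqs, psd, lower, upper = multitaper_psd(x)
peak_period_qn = 1.0 / (12 * freqs[np.argmax(psd)])
```

The spectral peak is smeared over roughly ±W around the true frequency, so `peak_period_qn` lands close to, but not necessarily exactly at, one quarter note. Routines that weight the eigenspectra differently may give somewhat different values.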
Autocorrelation interpretation: By the Wiener–Khinchin relationship, a PSD ∝ f^{−β} with 0 < β < 1 corresponds to autocorrelations that decay as t^{−(1−β)}, whereas a low-frequency plateau below a crossover frequency f_c implies decorrelation beyond a cutoff time τ_c = 1/f_c.
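
The power-law decay of the autocorrelation follows from a short scaling argument, sketched here for an idealized continuous-time spectrum. The amplitude A and the substitution variable u are notation introduced for this sketch only.

```latex
\begin{align}
R(t) &= \int_{-\infty}^{\infty} S(f)\, e^{2\pi i f t}\, \mathrm{d}f
      = 2\int_{0}^{\infty} S(f)\cos(2\pi f t)\, \mathrm{d}f, \\
S(f) = A f^{-\beta} \;\Rightarrow\;
R(t) &= 2A\, t^{-(1-\beta)} \int_{0}^{\infty} u^{-\beta}\cos(2\pi u)\, \mathrm{d}u
      \;\propto\; t^{-(1-\beta)}, \qquad u = f t,\; 0 < \beta < 1 .
\end{align}
```

The remaining integral over u does not depend on t and converges only for 0 < β < 1, which is why the relation is stated for that range.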

Key Findings
  • Across the corpus, PSDs of pitch sequences typically follow inverse power laws that cross over to flat plateaus at low frequencies (PL+P). This implies slowly decaying autocorrelations up to a finite cutoff time, then loss of correlation.
  • Cutoff times (τ_c) for PL+P cases typically lie between 4 and 100 quarter notes. Classical examples suggest composer-dependent tendencies: Mozart movements often show plateaus near 40 quarter notes, while many Bach movements show plateaus near 10 quarter notes.
  • Length dependence: Longer pieces overwhelmingly exhibit PL+P; all single movements longer than 1200 quarter notes show PL+P. Cutoff period increases with piece length (trend clearer for classical than jazz).
  • Dataset and counts: 553 time series total (99 classical, 454 jazz). PL vs PL+P counts (Table 1): Improvised solos: PL 14, PL+P 440; Single movements: PL 15, PL+P 42; Several movements (multi-movement works): PL 3, PL+P 41.
  • Exponent distribution (classical): β centered around ~1.1, spanning [0.3, 1.8], consistent with 1/f-like behavior over finite ranges. For jazz improvisations, β distribution is broader with a larger mean than for classical.
  • Rhythmic peaks: Distinct PSD peaks at periods corresponding to sixteenth, eighth, quarter, half, and whole notes (0.25, 0.5, 1, 2, 4 quarter notes). In jazz, peaks are broader and less pronounced, reflecting microtiming variability and expressive timing.
  • In short pieces and solos, an apparently pure PL without a plateau is more common, but this is likely because the large bandwidth W masks a true plateau lying below W.
  • Simulations confirm that a low-frequency plateau truncates long-range correlations, introducing a finite τ_c, while pure PL yields asymptotically long-range correlations.
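
The effect of a plateau on the correlations can be illustrated without random sampling by applying the Wiener–Khinchin relation directly to two model spectra. This is not the authors' simulation. The plateau form (f² + f_c²)^(−β/2), the series length, and the parameter values are all assumptions chosen for illustration.

```python
import numpy as np

def normalized_acf(spectrum):
    """Autocorrelation implied by a one-sided model PSD (Wiener-Khinchin)."""
    acov = np.fft.irfft(spectrum)         # inverse FFT of the PSD
    return acov / acov[0]

n = 2 ** 16                               # number of grid steps (assumed)
freqs = np.fft.rfftfreq(n)                # cycles per grid step
beta, f_c = 0.8, 0.01                     # illustrative exponent and cutoff
tau_c = 1.0 / f_c                         # cutoff time, here 100 grid steps

# Pure power law; the f = 0 bin is set to zero (mean-centred series).
pure_pl = np.zeros_like(freqs)
pure_pl[1:] = freqs[1:] ** -beta

# Power law that flattens into a plateau below f_c (assumed functional form).
with_plateau = (freqs ** 2 + f_c ** 2) ** (-beta / 2)

acf_pl = normalized_acf(pure_pl)
acf_plp = normalized_acf(with_plateau)
lag = int(5 * tau_c)                      # a lag well beyond the cutoff time
```

At a lag of five cutoff times, `acf_plp[lag]` is essentially zero while `acf_pl[lag]` is still clearly positive, matching the qualitative statement in the last bullet.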

Discussion

Carefully controlled multitaper PSD estimates show that pitch sequences generally exhibit slow, power-law-like correlations that persist only up to a finite cutoff time, as indicated by low-frequency plateaus in the PSD. This resolves conflicts in earlier literature by demonstrating that apparent scale-free 1/f behavior does not extend to arbitrarily low frequencies in sufficiently long pieces. The finite correlation horizon balances persistence with innovation, aligning with musical theories of expectation and surprise. The strong length dependence—larger cutoff times in longer pieces—suggests that compositional practices such as thematic development across sections extend correlations. Comparisons with DFA studies indicate that PSD offers clearer identification of periodic (rhythmic) components and more direct detection of low-frequency plateaus than DFA, where periodicities produce broad shoulders. Jazz improvisations, being shorter and exhibiting greater timing variability, show shorter cutoff times and broader rhythmic peaks, reflecting less persistent structure relative to classical compositions. Overall, the findings support ubiquitous finite-range long correlations in music rather than unbounded self-similarity.
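
Plateau detection in log–log coordinates can be sketched as a comparison between a single straight line and a "flat, then sloped" hinge. This is a simplified illustration rather than the fitting procedure used in the study: it assumes log-spaced frequencies (so no weights are needed), a noise-free synthetic spectrum, and a hypothetical helper named `fit_power_law_with_plateau`.

```python
import numpy as np

def fit_power_law_with_plateau(freqs, psd):
    """Fit log10 S = a - beta * max(0, log10 f - log10 f_b) by grid search.

    Returns (beta, f_b, sse). With f_b at the lowest frequency the model
    reduces to a single straight line, i.e. a pure power law.
    """
    x, y = np.log10(freqs), np.log10(psd)
    best = None
    for x_b in x[:-2]:                    # candidate breakpoints
        design = np.column_stack([np.ones_like(x), np.maximum(0.0, x - x_b)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[2]:
            best = (-coef[1], 10 ** x_b, sse)
    return best

# Noise-free synthetic spectrum with exponent 1 and a plateau below f_c.
f_c, beta_true = 0.01, 1.0
freqs = np.logspace(-4, np.log10(0.5), 200)
psd = (freqs ** 2 + f_c ** 2) ** (-beta_true / 2)

beta_hat, f_b_hat, sse_hinge = fit_power_law_with_plateau(freqs, psd)
line = np.polyfit(np.log10(freqs), np.log10(psd), 1)
sse_line = float(np.sum((np.polyval(line, np.log10(freqs)) - np.log10(psd)) ** 2))
```

Because the crossover in the synthetic spectrum is smooth, the fitted breakpoint and exponent only approximate f_c and β. On real estimates, the breakpoint would also need to lie above the bandwidth W before a plateau could be claimed.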

Conclusion

This study quantifies long-time correlation properties of musical pitch sequences using multitaper PSD analysis on a large corpus of classical compositions and improvised jazz solos. PSDs typically show power-law decay transitioning to low-frequency plateaus, implying slowly decaying but finite-range autocorrelations whose cutoff times scale with piece length. Classical pieces exhibit β exponents concentrated near 1, while jazz solos display broader exponent distributions and shorter cutoff times with less pronounced rhythmic peaks. These results reconcile prior conflicting reports by revealing that 1/f-like behavior holds only over finite frequency ranges in real musical data. Future work could search for truly self-similar compositions lacking plateaus, expand analyses to other musical dimensions (rhythm, harmony) and multivoice textures, and refine low-frequency resolution to uncover hidden plateaus in short pieces.

Limitations

Low-frequency plateaus cannot be confirmed below the multitaper bandwidth W; in short pieces with large W, true plateaus may be masked. PSD estimates assume wide-sense stationarity and are constrained by finite series length, leading to bias–variance tradeoffs; occasional slight nonzero slopes in plateau regions may reflect variance or insufficiently low frequencies. The corpus prioritizes monophonic lines and the top voice in polyphonic textures, excluding keyboard works and potentially missing multi-voice interactions. Quantization to 1/12 quarter-note units may flatten very high-frequency content (not affecting asymptotics), and live jazz timing variability broadens rhythmic peaks. Some figures (e.g., exponent histograms) use subsets (e.g., 94 classical pieces), and exponent estimates depend on fit choices in piecewise linear modeling.
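
The bandwidth limitation can be made concrete with a little arithmetic. The sketch assumes the standard multitaper relation W = NW/N (in cycles per grid step, for a series of N grid steps) together with the grid of 12 steps per quarter note. The function name and the 64-quarter-note solo length are hypothetical.

```python
def longest_confirmable_period_qn(length_qn, nw=2.0, steps_per_qn=12):
    """Period (in quarter notes) at the multitaper half-bandwidth W.

    Assumes the standard relation W = NW / N in cycles per grid step,
    where N is the series length in grid steps.
    """
    n_steps = length_qn * steps_per_qn
    w = nw / n_steps                      # half-bandwidth, cycles per step
    return 1.0 / (steps_per_qn * w)       # T = 1 / (12 f) evaluated at f = W

short_solo = longest_confirmable_period_qn(64)       # about 32 quarter notes
long_movement = longest_confirmable_period_qn(1200)  # about 600 quarter notes
```

Under these assumptions and with NW = 2, a plateau can only be confirmed if its cutoff period is shorter than about half the piece length. That is consistent with short solos more often appearing as pure power laws.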
