Exploring the foundations of tonality: statistical cognitive modeling of modes in the history of Western classical music

D. Harasim, F. C. Moss, et al.

This study traces the evolution of tonality in Western classical music through an analysis of more than 13,000 pieces in MIDI format. Daniel Harasim, Fabian C. Moss, Matthias Ramirez, and Martin Rohrmeier show how modes changed from the Renaissance to the 19th century.

Introduction
The paper investigates how tonal organization, specifically musical modes, developed historically in Western classical music. Motivated by longstanding music-theoretical narratives about the emergence, stabilization, and transformation of tonality from the Renaissance through the Romantic eras, the study asks whether the number and characteristics of modes can be inferred directly from musical data without assuming major/minor a priori. It seeks to bridge qualitative-historical and quantitative-empirical perspectives using large-scale corpus analysis and computational modeling. Working under the assumptions of octave equivalence, transpositional invariance, and a 12-pitch-class system, it evaluates whether distinct modal systems characterize different epochs and how clearly modes separate across time.
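As a minimal sketch of what these three assumptions mean in practice, the Python snippet below folds duration-weighted notes into a 12-bin pitch-class distribution. The function name, the (MIDI pitch, duration) input format, and the `root` argument are illustrative assumptions, not the authors' code.

```python
def pitch_class_distribution(notes, root=0):
    """Return 12 relative frequencies, with index 0 corresponding to `root`.

    notes: iterable of (midi_pitch, duration) pairs (hypothetical input format).
    root:  pitch class (0-11) of the assumed root; shifting by it puts pieces
           in different keys on a common footing (transpositional invariance).
    """
    weights = [0.0] * 12                     # 12 pitch classes
    for midi_pitch, duration in notes:
        # Octave equivalence: only the pitch modulo 12 matters.
        weights[(midi_pitch - root) % 12] += duration
    total = sum(weights)
    if total == 0:
        raise ValueError("no sounding notes")
    return [w / total for w in weights]


# Hypothetical toy input: C4 (long), C5, and E4.
example = pitch_class_distribution([(60, 2.0), (72, 1.0), (64, 1.0)])
print(example[0], example[4])
```

Both C notes land in the same bin regardless of octave, and passing a different `root` rotates the distribution so that the root always sits at index 0.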
Literature Review
Prior empirical research has frequently modeled tonality via pitch-class distributions, aligning corpus findings with cognitive and neuroscientific evidence that listeners internalize statistical regularities of pitch (e.g., Krumhansl; Huron; Janata; Koelsch). Corpus studies using ClassicalArchives and other datasets have shown diachronic changes such as increased use of dominant-sevenths, reduced diatonicity, expanded tonal material, and shifts in chordal directionality (White & Quinn; Yust; Moss; Weiß et al.). Specific work on historical modes in Western music includes Huron & Veltman’s and Cornelissen et al.’s chant-based analyses, and Albrecht et al. (2014), who traced modes 1400–1750 by iteratively clustering pieces against assumed major/minor templates. However, most studies presuppose two modes (major, minor) and fixed templates. This paper addresses these assumptions directly by inferring both the number and shapes of modes in an unsupervised manner across periods.
Methodology
Data: Approximately 21,000 ClassicalArchives MIDI files of Western classical music; 12,625 pieces include composition year. To mitigate pre-Baroque sparsity and vocal repertoire underrepresentation, 777 Renaissance pieces were added from scholarly sources (Lost Voices, CRIM, ELVIS), yielding 13,402 total pieces (over 55 million notes). Pitch-class counts were computed per piece, weighting by note duration. Epochs were determined by minima in a Gaussian kernel density estimate of composition dates, producing boundaries at 1649, 1758, 1817, and 1856, approximating Renaissance, Baroque, Classical, early Romantic, and late Romantic periods. A subset (n=6655, ~49.7%) had user-supplied key labels (estimated 87.5% accuracy), used only for evaluation.
Assumptions: octave equivalence; transpositional invariance; 12 pitch classes.
Models:
1) Geometric model: Treats each piece’s pitch-class distribution as a point in a mode space; uses dimensionality reduction (t-SNE) for visualization and Gaussian Mixture Models (GMMs) in the reduced space to cluster pieces by mode. The optimal number of modes per period is selected via silhouette scores computed for K=2,…,6.
2) Bayesian probabilistic model: A generative model representing modes as Dirichlet distributions over pitch-class probabilities. For each period, the model infers the posterior over root (R) and mode (M) for each piece P: p(R, M | P, T, D), using Gibbs sampling (following Johnson et al., 2007; Sato, 2011). Keys are pairs (R, M); the maximum-a-posteriori (MAP) (R*, M*) provides a mode classification per piece. This classifier operates in the original mode space (no dimensionality reduction) and assumes transpositional invariance by transposing pieces to a common root when estimating mode characteristics.
Evaluation and measures:
- Visual clustering via t-SNE across periods.
- Silhouette scores to select the number of clusters (modes).
- Mode clarity: proportion of correctly predicted modes relative to metadata labels with 95% bootstrap CIs, to quantify separability of major/minor across periods.
- Mode templates: For the Common-Practice period (Baroque+Classical), Bayesian-inferred pitch-class distributions for major/minor were summarized via maxima and compared to prior templates (Krumhansl & Kessler; Temperley; metadata-based; Albrecht & Shanahan), using circular (circle-of-fifths) radar plots to reflect adjacency of in-scale/out-of-scale tones.
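The MAP classification step of the Bayesian model can be illustrated with a simplified sketch. It assumes the Dirichlet parameters of each mode are already known (in the study they are inferred by Gibbs sampling, which is not shown here), places a uniform prior over keys, and scores every (root, mode) pair with a Dirichlet-multinomial likelihood. The concentration values in `make_alpha` and all function names are hypothetical choices for illustration only.

```python
from math import lgamma


def make_alpha(scale_degrees):
    """Hypothetical Dirichlet parameters: heavier on tonic, fifth, and scale tones."""
    alpha = [0.5] * 12                 # out-of-scale tones (assumed value)
    for degree in scale_degrees:
        alpha[degree] = 5.0            # in-scale tones (assumed value)
    alpha[0], alpha[7] = 10.0, 7.0     # tonic and fifth (assumed values)
    return alpha


MODE_ALPHAS = {
    "major": make_alpha([0, 2, 4, 5, 7, 9, 11]),
    "minor": make_alpha([0, 2, 3, 5, 7, 8, 10]),
}


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of counts, up to a term that is constant across keys."""
    a_sum, n = sum(alpha), sum(counts)
    return (lgamma(a_sum) - lgamma(a_sum + n)
            + sum(lgamma(a + c) - lgamma(a) for a, c in zip(alpha, counts)))


def map_key(counts, mode_alphas=MODE_ALPHAS):
    """Return the (root, mode) pair with the highest score under a uniform key prior."""
    best = None
    for mode, alpha in mode_alphas.items():
        for root in range(12):
            # Transpose: rotate the counts so the candidate root sits at index 0.
            rotated = [counts[(root + i) % 12] for i in range(12)]
            score = log_dirichlet_multinomial(rotated, alpha)
            if best is None or score > best[0]:
                best = (score, root, mode)
    return best[1], best[2]


# Hypothetical pitch-class counts (index = pitch class, C = 0) for a piece in D major.
d_major_counts = [1, 10, 20, 1, 10, 1, 10, 10, 1, 15, 1, 10]
print(map_key(d_major_counts))
```

Because the score is evaluated on the full 12-dimensional counts, this sketch mirrors the point that the Bayesian classifier works in the original mode space rather than in a reduced one.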
Key Findings
- Clustering patterns (t-SNE): Clear multi-cluster structure in earlier periods: four clusters in the Renaissance; two clusters in Baroque and Classical; increasingly mixed distributions in early and late Romantic, suggesting weaker global modal separation.
- Optimal number of modes per period (silhouette scores):
  • Renaissance: K=4 best (0.494); K=3 (0.487); K=2 (0.421).
  • Baroque: K=2 best (0.541); K=3 (0.514); K=4 (0.440).
  • Classical: K=2 best (0.460); K=3 (0.402); K=4 (0.392).
  • Early Romantic: K=2 best (0.419); K=3 (0.409); K=4 (0.383).
  • Late Romantic: K=3 best (0.417); K=4 (0.406); K=2 (0.382).
- Consistency across methods: The Bayesian classifier’s two-mode solution closely matches GMM/t-SNE partitions in Baroque and Classical; less distinct in Renaissance and Romantic periods.
- Mode clarity (proportion correct vs. metadata): Highest in the Classical period, next highest in Baroque; substantially lower in the Romantic eras; not reliably quantifiable for Renaissance due to sparse labels, indicating a strong major/minor separation in Common-Practice and weaker global modal distinctions in the 19th century.
- Renaissance mode characteristics (Bayesian, K=4): Four clusters resemble historical modal categories: Mixolydian, Ionian, Dorian, and a mode between Aeolian and Dorian (with ambiguity in pitch classes distinguishing them), aligning with music-theoretical descriptions of Renaissance modality.
- Common-Practice major/minor templates: Bayesian-inferred pitch-class distributions show clear in-scale vs. out-of-scale separation (approximate threshold around 5% relative frequency). Corpus-derived templates (this study; metadata; Albrecht & Shanahan) allocate less mass to out-of-scale tones than experimental templates (Krumhansl & Kessler; Temperley). Symmetries between major and minor are evident, with the most distinctive classes being the thirds (pitch classes 3 vs. 4) and sixths (8 vs. 9), corroborating theory.
- Overall: Two modes (major/minor) are most appropriate in Baroque and Classical; four modes in Renaissance; Romantic era shows diminished separability into global modes, reflecting increased chromaticism and modulation.
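The model-selection rule behind the silhouette figures listed above is simply "pick the K with the highest score". The short Python sketch below applies that rule to the three scores reported per period in this summary (the study computed K=2,…,6; the remaining values are not given here) and also prints the gap to the runner-up as a rough indication of how decisive each choice is. The helper names are illustrative.

```python
# Silhouette scores as reported above: period -> {K: score}.
SILHOUETTE = {
    "Renaissance":    {4: 0.494, 3: 0.487, 2: 0.421},
    "Baroque":        {2: 0.541, 3: 0.514, 4: 0.440},
    "Classical":      {2: 0.460, 3: 0.402, 4: 0.392},
    "Early Romantic": {2: 0.419, 3: 0.409, 4: 0.383},
    "Late Romantic":  {3: 0.417, 4: 0.406, 2: 0.382},
}


def best_k(scores):
    """Number of clusters with the highest silhouette score."""
    return max(scores, key=scores.get)


def margin(scores):
    """Difference between the best and second-best score, rounded to 3 decimals."""
    top, runner_up = sorted(scores.values(), reverse=True)[:2]
    return round(top - runner_up, 3)


for period, scores in SILHOUETTE.items():
    print(f"{period}: K={best_k(scores)}, margin over runner-up={margin(scores)}")
```

The gaps are widest for the Baroque and Classical periods and narrow for the Renaissance and both Romantic periods, which is consistent with the weaker separation reported for those eras.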
Discussion
The findings support the central research aim: the number and characteristics of modes can be inferred directly from pitch-class statistics without assuming major/minor a priori. Both geometric and Bayesian approaches converge on two clearly separable modes in Baroque and Classical periods, aligning with Common-Practice tonality. In the Renaissance, four distinct modal clusters emerge, consistent with historical modal theory and suggesting that an assumption of only two modes is inappropriate for this era. For the 19th century, reduced mode clarity and optimal clustering with more than two modes (late Romantic) indicate that a single global mode is less suitable due to increased chromaticism and frequent modulations. These results empirically substantiate cognitive accounts of statistical learning of tonal hierarchies and show that simple “bag-of-notes” pitch statistics capture historically relevant tonal structures. The close match of Bayesian-inferred mode templates with corpus-derived templates validates the unsupervised model’s ability to recover major/minor characteristics, while differences from experimental templates highlight context and task differences between perception experiments and compositional pitch distributions.
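The mode-clarity measure referred to here is a proportion of correct mode predictions with a 95% bootstrap confidence interval. A minimal sketch of one common way to compute such an interval (a percentile bootstrap) follows; the choice of the percentile method, the number of resamples, the seed, and the toy labels are all assumptions, since the summary does not specify them.

```python
import random


def mode_clarity(predicted, labelled, n_boot=2000, seed=0):
    """Return (proportion correct, CI lower bound, CI upper bound).

    predicted, labelled: equal-length sequences of mode names.
    The interval is a 95% percentile bootstrap over pieces (assumed method).
    """
    if len(predicted) != len(labelled) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = [int(p == l) for p, l in zip(predicted, labelled)]
    n = len(hits)
    point = sum(hits) / n

    rng = random.Random(seed)          # fixed seed for reproducibility
    # Resample pieces with replacement and recompute the proportion each time.
    stats = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_boot))
    lower = stats[int(0.025 * (n_boot - 1))]
    upper = stats[int(0.975 * (n_boot - 1))]
    return point, lower, upper


# Hypothetical toy data: 100 labelled pieces, 80 classified in agreement.
predicted = ["major"] * 80 + ["minor"] * 20
labelled = ["major"] * 100
print(mode_clarity(predicted, labelled))
```

With few labelled pieces, as for the Renaissance subset, the resampled proportions spread widely and the interval becomes too broad to be informative.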
Conclusion
Using a large MIDI corpus and two complementary models, the study demonstrates that basic pitch-class statistics can uncover the number and shapes of modes across historical periods in Western classical music. Four modes are most plausible in the Renaissance; two modes (major/minor) dominate the Baroque and Classical eras; and mode separability weakens in the Romantic periods. The Bayesian model recovers major/minor templates without prior assumptions, supporting theories of statistical learning in tonal cognition. Future research directions include hierarchical and mixture models for local/modulatory structure, more structured tonal representations beyond 12 pitch classes, expanded historical and stylistic coverage (from medieval to contemporary popular music), and cross-cultural comparisons to probe biological versus cultural contributions to musical evolution.
Limitations
- Data source biases: Crowd-sourced ClassicalArchives MIDI files may contain encoding/transcription errors, skew toward popular composers/pieces, and overrepresent piano music; pre-Baroque vocal repertoire is underrepresented.
- Metadata reliability: Key labels used for evaluation have estimated 87.5% accuracy and are incomplete, especially for Renaissance.
- Modeling assumptions: Transpositional invariance and 12 pitch-class discretization may be historically inappropriate for some repertoires (notably Renaissance), potentially affecting inferred modes.
- Dimensionality reduction caveat: t-SNE visualizations aid interpretation of cluster structure, but clustering in reduced space can distort original distances; hence the Bayesian classifier operates in the original space.
- Global-mode perspective: Analyses rely on global pitch-class distributions, which may obscure local keys and modulations, especially in the 19th century.
- Limited feature scope: “Bag-of-notes” pitch statistics ignore rhythm, harmony/voice-leading details, and temporal structure that also contribute to tonality.