Reframing the filter bubble through diverse scale effects in online music consumption

Interdisciplinary Studies

D. Shakespeare, V. Chareyron, et al.

An empirical analysis of ~50,000 Deezer users' artist consumption histories reveals that scale and information representation matter: algorithmic curation can introduce more novelty than users achieve organically, yet that novelty tends to be more semantically confined. Research conducted by Dougal Shakespeare, Victor Chareyron, and Camille Roth.

Introduction

The study examines whether algorithmic curation devices on music platforms constrain or increase consumption diversity, and how the conclusions depend on measurement scale and information representation. It frames diversity appraisals along two comparative strands: algorithmic consumption versus algorithmic exposure (selective exposure theory), and algorithmic consumption versus consumption through less algorithmic or organic affordances. Prior research offers mixed evidence, with some studies indicating filter bubble effects and others suggesting increased diversity. The authors posit that these tensions arise from the use of discrete (item/category-based) versus dense (embedding-based) representations of information and from differences in aggregation scale (short-term sessions versus longer-term aggregates). The work focuses on Deezer’s multiple affordances and evaluates diversity at intra-session, inter-session, and inter-affordance scales to refine the filter bubble narrative.

Literature Review

Two main strands are reviewed. (1) Algorithmic exposure versus organic selection, often studied in political communication, where selective exposure theory posits that users consume content aligned with their prior beliefs; empirical results show that users’ organic selections can further constrain diversity within algorithmic exposure contexts. (2) Cross-affordance comparisons within and across platforms, contrasting algorithmic with organic or less algorithmic affordances; many works find that algorithmic consumption is no less, or even more, concentrated than organic consumption. Nuances include content-specific and user-category effects and, importantly, differences between results obtained with geometric (dense) and discrete diversity measures. Recent results suggest weak connections between novelty-based and spatial disparity measures and show that aggregation scales vary across studies (weekly, monthly, multi-year). Session-based dynamics remain underexplored despite the prevalence of bursty user behavior. This work situates itself within multi-affordance music streaming studies and aims to reconcile contradictory findings by jointly considering representation types and aggregation scales.

Methodology

Platform and affordances: Deezer offers four on-platform affordance classes from which streams originate: organic playlists (P), organic query-based access (Q; search/browse with minimal algorithmic assistance), editorial curation (E; Deezer staff playlists), and algorithmic curation (A; recommender systems such as Flow).
Dataset: One month of listening histories (March 2023) from a random sample of 48,243 anonymized subscribers who streamed at least once. Events shorter than 30 seconds were removed as skips, yielding 23,944,195 streams across 168,685 unique artists.
Sessions: A session is a continuous segment of engagement; a new session starts when the inter-event gap exceeds 20 minutes (a threshold that captures 91% of inter-stream times), resulting in 2,209,591 sessions. The analysis focuses on monadic sessions (only one affordance used); on average, 73.9% of a user’s sessions are monadic, and the platform-level prevalence is 83.3%.
Filtering strategies: Two subsets are built: (i) affordance-centric sets, one per affordance, each containing users with ≥3 monadic sessions in that affordance (n_A=12,411; n_E=10,923; n_P=28,896; n_Q=36,004), and (ii) a user-centric multi-affordance set of users with ≥3 monadic sessions in each of the four affordances (n=1,672), built to mitigate ecological fallacy.
Item representation: Artists are the items. The dense representation is a 128-dimensional SVD embedding (Truncated SVD applied to a popularity-normalized artist-artist matrix of within-session co-occurrences, with a PPMI transformation). The embedding is chosen for interpretability, reproducibility, and its links to SGNS.
Diversity measures: Spatial disparity is measured with the activity-weighted Generalist–Specialist (GS) score. Intra-session GS(S) assesses the dispersion of artist vectors around the session centroid; inter-session GS(S_F) assesses the dispersion of session centroids around affordance-level centroids across time. Sessions containing ≥20% niche artists (<10 listeners) are dropped to ensure reliable embeddings.
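The session definition and the embedding step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the event tuple layout, the function names, and the choice to measure gaps between consecutive stream start times are all assumptions, and popularity normalization is left to the PPMI transform alone because the summary does not specify the exact scheme. NumPy's full SVD stands in for a truncated solver, which is only reasonable at toy scale.

```python
from itertools import combinations

import numpy as np

SESSION_GAP_SECONDS = 20 * 60  # a gap above 20 minutes opens a new session
MIN_STREAM_SECONDS = 30        # shorter events are treated as skips


def sessionize(events):
    """Split one user's events into sessions.

    `events` holds (start_s, duration_s, artist, affordance) tuples;
    this layout is hypothetical. Gaps are measured between consecutive
    start times, which is an assumption.
    """
    kept = sorted(e for e in events if e[1] >= MIN_STREAM_SECONDS)
    sessions = []
    for event in kept:
        if sessions and event[0] - sessions[-1][-1][0] <= SESSION_GAP_SECONDS:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


def is_monadic(session):
    """True when every stream in the session came from one affordance."""
    return len({event[3] for event in session}) == 1


def ppmi_svd_embedding(artist_sessions, dim):
    """Embed artists from within-session co-occurrence via PPMI + SVD."""
    artists = sorted({a for s in artist_sessions for a in s})
    index = {a: i for i, a in enumerate(artists)}
    counts = np.zeros((len(artists), len(artists)))
    for s in artist_sessions:
        # Count each unordered pair of distinct artists once per session.
        for a, b in combinations(sorted(set(s)), 2):
            counts[index[a], index[b]] += 1
            counts[index[b], index[a]] += 1
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
        # Keep only positive, finite PMI values (the "positive" in PPMI).
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    u, s, _ = np.linalg.svd(ppmi)
    # Scale the leading singular vectors by the root of their singular values.
    return artists, u[:, :dim] * np.sqrt(s[:dim])
```

With the study's settings, `dim` would be 128 and the input would be the artist lists of all sessions; on a matrix of that size a sparse truncated solver would be the practical choice.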
Novelty measures: The intra-session fandom rate φ_F is the fraction of sessions containing only one unique artist. Intra-session redundancy is R(S)=1−(supp(S)/|S|), where supp(S) is the number of unique artists in session S and |S| is its number of streams; it is adjusted for session length via percentile ranks and then averaged per affordance. Inter-session and inter-affordance overlap is assessed on session–artist bipartite graphs and their monopartite session projections, using the Bipartite Configuration Model (BiCM) as a null model: two sessions are linked when the number of artists they share (V-motifs) significantly exceeds the BiCM expectation (α=0.05), and overlap γ is the fraction of sessions with at least one significant link. Inter-affordance similarity is computed both in embedding space (cosine similarity of affordance-level centroids per user, z-normalized) and via BiCM-based γ overlaps across affordance pairs.
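The two intra-session novelty measures lend themselves to a short sketch. The code below assumes each session is a list of artist identifiers (one per stream) and reads supp(S) as the number of unique artists. The length adjustment shown, a percentile rank among sessions of the same length, is only one plausible reading of "adjusted by session length via percentile ranks"; the function names are hypothetical.

```python
from collections import defaultdict


def redundancy(session):
    """R(S) = 1 - supp(S)/|S|: 0 when every stream is a new artist."""
    return 1.0 - len(set(session)) / len(session)


def fandom_rate(sessions):
    """Fraction of sessions that contain exactly one unique artist."""
    return sum(len(set(s)) == 1 for s in sessions) / len(sessions)


def length_adjusted_redundancy(sessions):
    """Percentile rank of R(S) among sessions of equal length.

    Assumption: the rank is the share of same-length sessions whose
    redundancy is less than or equal to the session's own.
    """
    by_length = defaultdict(list)
    for s in sessions:
        by_length[len(s)].append(redundancy(s))
    adjusted = []
    for s in sessions:
        peers = by_length[len(s)]
        r = redundancy(s)
        adjusted.append(sum(p <= r for p in peers) / len(peers))
    return adjusted
```

Comparing ranks within a length group keeps long sessions, which have more room for repeats, from being scored as more exploitative merely because of their length. Averaging the adjusted values over all sessions of one affordance would then give a per-affordance figure comparable in spirit to the reported ⟨R⟩ values.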

Key Findings
  • Session composition and scope: 2,209,591 sessions; monadic sessions dominate (mean 73.9% per user; platform-level 83.3%). Dataset includes 23,944,195 streams and 168,685 artists, from 48,243 users.
  • Intra-session spatial disparity (GS(S), lower is more diverse; affordance-centric): P 0.809 (±0.083, significantly most diverse), A 0.829 (±0.073), E 0.874 (±0.081), Q 0.919 (±0.065, least diverse).
  • Inter-session spatial disparity over time (GS(S_F), lower is more diverse): Q 0.793 (±0.106, significantly most diverse), E 0.852 (±0.105), P 0.860 (±0.079), A 0.870 (±0.078). Algorithmic sessions, though diverse within sessions, become relatively concentrated across sessions.
  • Inter-session novelty (BiCM overlap γ; lower implies greater novelty/less repetition across sessions): A 0.098 (±0.159, significantly most novel), E 0.130 (±0.222), Q 0.134 (±0.210), P 0.244 (±0.255, most overlap/repetition).
  • Intra-session fandom rates (fraction of sessions with only one unique artist): Q 46.9% (least novel), E 26.13%, P 18.40%, A 13.54% (most novel). User-centric dataset shows similar values (±5%).
  • Intra-session redundancy (length-adjusted R; higher is more exploitative): affordance-centric ⟨R_Q⟩=0.475 (most exploitation), ⟨R_P⟩=0.335, ⟨R_E⟩=0.286, ⟨R_A⟩=0.189 (most exploration). User-centric multi-affordance: ⟨R_Q⟩=0.439, ⟨R_P⟩=0.292, ⟨R_E⟩=0.288, ⟨R_A⟩=0.202.
  • Correlation between spatial disparity and novelty is weak (average r≈0.261), indicating distinct aspects of diversity.
  • Inter-affordance similarities: In embedding space, A, P, and Q are closest; E sessions reach more distant regions, indicating semantically distinct artist areas. In set-theoretic novelty terms, the largest artist overlap occurs between organic affordances (P and Q), while E remains the most novel relative to others.
  • Overall: Algorithms introduce more item-level novelty than organic behavior, but this novelty is more semantically confined (less spatial disparity across sessions), reframing the filter bubble narrative.
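
To read the GS values reported above, it helps to see why a lower score means more diversity. The summary does not give the formula, so the sketch below assumes the commonly used formulation of the Generalist–Specialist score: the activity-weighted mean cosine similarity between unit-normalized item vectors and their weighted centroid. The function name and the use of play counts as weights are assumptions.

```python
import numpy as np


def gs_score(vectors, weights=None):
    """Activity-weighted mean cosine similarity to the centroid.

    Values near 1 mean the vectors point the same way (specialist);
    lower values mean they are spread out (generalist, more diverse).
    Undefined when the weighted centroid is the zero vector.
    """
    vectors = np.asarray(vectors, dtype=float)
    if weights is None:
        weights = np.ones(len(vectors))
    weights = np.asarray(weights, dtype=float)
    # Project every vector onto the unit sphere before averaging.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    centroid = (weights[:, None] * unit).sum(axis=0) / weights.sum()
    cosines = unit @ centroid / np.linalg.norm(centroid)
    return float((weights * cosines).sum() / weights.sum())
```

Under this reading, the intra-session score would take the artist vectors of one session (weighted by play counts), while the inter-session score would take one centroid per session for a given affordance. A user can therefore score low within sessions and high across sessions if each session is internally varied but all sessions sit in the same region of the artist space.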

Discussion

The findings address whether algorithmic devices confine or diversify user consumption by showing that conclusions depend on both aggregation scale and representation type. At the intra-session scale, algorithmic sessions increase diversity and novelty compared to organic query sessions, which are most concentrated and exploitative. Over time, however, algorithmic sessions become spatially confined across sessions, whereas organic query sessions are most spatially diverse inter-session. Novelty-based measures reveal algorithmic sessions produce the least overlap across sessions, indicating higher novelty, while personal playlists show the most repetition. Inter-affordance analyses further demonstrate editorial curation drives exploration into more semantically distant regions, and organic affordances share greater item overlap. Together, these results suggest algorithmic systems can break users out of their bubbles by introducing novel items, yet concurrently reinforce semantic boundaries by keeping exploration confined to nearby regions of the artist space and mirroring organic consumption patterns. The study underscores the need to evaluate algorithmic impacts using multiple scales and both discrete and dense representations.

Conclusion

This work contributes a multi-scale, multi-representation appraisal of diversity on a large music streaming platform, demonstrating that algorithmic curation tends to increase item novelty but with semantically confined exploration over time. By disaggregating affordances and working at intra-session, inter-session, and inter-affordance scales, the study reconciles contradictory findings in the filter bubble debate. Future research should test the relationships between novelty and spatial disparity across other platform types (e.g., social media), and bridge the gap between algorithmic exposure and multi-affordance consumption literatures to account for the human selection component. The authors also release a user-centric multi-affordance dataset and code to foster reproducibility and further investigation.

Limitations

The analysis is limited to Deezer and one month of activity, which may affect generalizability to other platforms and longer time horizons. Affordance interfaces and recommendation logics vary across platforms, potentially altering dynamics. The study focuses on artist-level representations (not songs) and filters niche artists to ensure embedding reliability. While multi-affordance users are included to mitigate ecological fallacy, the selection criteria (≥3 monadic sessions per affordance) may introduce biases. The work does not directly integrate algorithmic exposure logs with consumption logs, leaving the organic selection component of recommendations for future study.
