Emotion-aware music tower blocks (EMOMTB): an intelligent audiovisual interface for music discovery and recommendation

Computer Science


A. B. Melchiorre, M. Schedl, et al.

Discover EmoMTB, an innovative music exploration system designed by researchers Alessandro B. Melchiorre, Markus Schedl, David Penz, Christian Ganhör, Oleg Lesota, Vasco Fragoso, Florian Fritzl, Emilia Parada-Cabaleiro, and Franz Schubert. Navigate a vibrant 'music city' where tracks are visually represented as colored cubes, allowing you to explore familiar and new genres while receiving emotion-aware feedback.

Introduction
The widespread use of music streaming services has revolutionized music consumption. However, these platforms rely primarily on text-based search and linear, list-based recommendations, an approach that presents several challenges. Cognitive biases, such as position bias (favoring items at the beginning of a list) and recency bias, influence user choices, leading to a skewed perception of the available music. Algorithmic biases, particularly popularity bias, further exacerbate this issue by over-representing popular tracks and hiding lesser-known but equally enjoyable music. These limitations hinder users' ability to discover new music and can lead to dissatisfaction.

To address these limitations, this research proposes EmoMTB, a novel intelligent audiovisual interface designed for music exploration and recommendation. EmoMTB departs from the linear presentation of music typical of current platforms and instead provides a non-linear, explorative experience that leverages the high bandwidth of visual information processing to create a rich and engaging user experience. The key innovation of EmoMTB lies in its integration of personalized, emotion-aware recommendations, which serve as starting points for exploration within a large, visually organized music catalog. The system's 'music city' metaphor enables intuitive navigation and encourages users to venture beyond their usual listening preferences.

The importance of EmoMTB stems from its potential to improve music discovery and enhance user engagement. By mitigating the negative impact of cognitive and algorithmic biases, EmoMTB aims to provide a more personalized, serendipitous, and enjoyable experience for music listeners. The system's design and evaluation contribute to the broader field of human-computer interaction, specifically to intelligent user interfaces for music recommendation and exploration.
Literature Review
Existing music exploration interfaces often utilize spatial arrangements of music pieces to facilitate non-linear exploration. Early systems like Islands of Music, nepTune, and deepTune employed clustering based on audio features to create geographic landscapes. Music Galaxy adopted a universe metaphor, while Songrium focused on video streaming platforms. MusicLatentVIS used deep learning for latent representations and t-SNE for projection, and Schedl et al. proposed interfaces leveraging audio features, genre data, and t-SNE. Some interfaces incorporate emotion, such as Vad et al.'s t-SNE-based visualization from emotion descriptors and Liang and Willemsen's genre discovery interface based on energy and valence.

Research in music emotion recognition (MER) typically utilizes acoustic cues, lyrics, or multimodal approaches. However, copyright restrictions limit access to audio data; user-generated tags from platforms like Last.fm offer a readily available alternative, though one less frequently used for MER. Studies in emotion-aware music recommendation integrate emotion information from various sources, including social media posts, physiological signals, and user-generated tags. EmoMTB differentiates itself by combining a large-scale dataset (LFM-2b), audio and genre features for clustering, personalized emotion-aware recommendations, smartphone-based navigation, and integration with Spotify for seamless playback.
Methodology
EmoMTB's development involved several key steps. First, a large-scale dataset (LFM-2b) containing listening events and user-generated tags was augmented with audio features retrieved from the Spotify API. Tracks from LFM-2b were matched to Spotify's catalog using string similarity to ensure accurate pairing, yielding a collection of 436,064 tracks.

The city-like landscape was created by projecting tracks onto a 2D plane with t-SNE, based on both fine-grained genres (extracted from Last.fm tags and matched against the EveryNoise list) and Spotify audio features (energy, valence, acousticness, instrumentalness, and speechiness); PCA was applied beforehand for dimensionality reduction. Tracks were then visualized as colored cubes, stacked by popularity into buildings and neighborhoods representing different genres. The genre color scheme follows a user study by Holm et al.

Emotion prediction used a multilayer perceptron classifier trained on several social media datasets labeled with four basic emotions (happiness, sadness, anger, and fear). Last.fm tags were used for song-level emotion prediction, and EmoMTB's Twitter feed for 'crowd' emotion prediction. The openXBOW toolkit and the ANEW and VADER lexica were employed for text representation.

Personalized recommendations were retrieved from the Spotify API using the user's top tracks, filtered to include only tracks present in EmoMTB's catalog, and re-ordered by predicted emotion so that tracks matching the user's selected emotion appear first.

The interface was implemented in JavaScript using three.js, with a lightweight design to minimize distractions. The system architecture consists of a web server, the user's smartphone, and a display computer, allowing flexible deployment.
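The track-matching step described above can be sketched as a string-similarity comparison between "artist - title" strings from LFM-2b and candidate Spotify results. This is a minimal illustration using Python's standard-library `SequenceMatcher`; the actual matching procedure and the 0.9 threshold here are assumptions, not the paper's exact implementation.

```python
# Hedged sketch: match an LFM-2b track string against Spotify candidates
# and accept the best match only above a similarity threshold.
# The threshold value (0.9) is an illustrative assumption.
from difflib import SequenceMatcher

def best_match(query, candidates, threshold=0.9):
    """Return the most similar candidate string, or None if all fall
    below the threshold."""
    scored = [(SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, best = max(scored)
    return best if score >= threshold else None

print(best_match("Daft Punk - One More Time",
                 ["Daft Punk - One More Time", "Daft Punk - Harder Better"]))
# 'Daft Punk - One More Time'
```

Rejecting low-similarity pairs rather than always taking the nearest candidate is what keeps spurious catalog matches out of the final collection.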
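The PCA-then-t-SNE projection that produces the 2D city layout can be sketched with scikit-learn. Feature dimensions, the PCA component count, and the random placeholder data below are illustrative assumptions; the real pipeline operates on 436,064 tracks with the paper's actual genre encoding.

```python
# Sketch of the landscape projection: concatenate a genre encoding with
# the five Spotify audio features, reduce with PCA, then project to 2D
# with t-SNE. Shapes and parameters are assumptions for illustration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_tracks = 500                            # real catalog: 436,064 tracks
genre_vecs = rng.random((n_tracks, 50))   # placeholder genre encoding
audio_feats = rng.random((n_tracks, 5))   # energy, valence, acousticness,
                                          # instrumentalness, speechiness
features = np.hstack([genre_vecs, audio_feats])

# PCA first reduces dimensionality; t-SNE then yields the 2D "city" plane.
reduced = PCA(n_components=30).fit_transform(features)
coords_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(reduced)

print(coords_2d.shape)  # (500, 2): one (x, y) position per track
```

Each resulting (x, y) coordinate becomes a cube position; stacking co-located tracks by popularity then produces the buildings described above.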
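The recommendation post-processing step (filter to the catalog, then promote emotion-matching tracks) reduces to a simple re-ranking. The data structures and field names below are assumptions for illustration, not the system's actual API.

```python
# Minimal sketch of the recommendation re-ordering: keep only tracks
# present in the catalog, then move tracks whose predicted emotion
# matches the user's selected emotion to the front, preserving order.
# Track ids and the emotion mapping are hypothetical examples.

def rerank(recommendations, catalog_emotions, selected_emotion):
    """recommendations: ordered track ids from the Spotify API;
    catalog_emotions: dict mapping catalog track id -> predicted emotion."""
    in_catalog = [t for t in recommendations if t in catalog_emotions]
    matching = [t for t in in_catalog if catalog_emotions[t] == selected_emotion]
    rest = [t for t in in_catalog if catalog_emotions[t] != selected_emotion]
    return matching + rest

emotions = {"t1": "happiness", "t2": "sadness", "t3": "happiness"}
print(rerank(["t2", "t1", "t4", "t3"], emotions, "happiness"))
# ['t1', 't3', 't2']  (t4 dropped: not in the catalog)
```

Note the re-ranking is stable within each group, so Spotify's original relevance ordering is preserved among emotion-matching tracks.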
Key Findings
The evaluation of EmoMTB encompassed three main aspects: clustering quality, emotion recognition performance, and user experience.

Clustering quality was assessed quantitatively using entropy calculations, comparing the genre homogeneity of EmoMTB's landscape to that of a randomly shuffled landscape. The result showed significantly higher genre homogeneity (6.7% of maximum entropy compared to 50% for the shuffled version), indicating the effectiveness of the clustering algorithm in grouping similar tracks together.

Emotion recognition performance was evaluated using a fivefold cross-validation setup with Monte Carlo sampling on an aggregated dataset of 21,480 samples from various sources. The results varied across datasets, with larger datasets showing higher accuracy, recall, and precision; overall accuracy on the aggregated dataset was 59%. While not perfect, this demonstrates the feasibility of using user-generated tags and transfer learning to predict music emotions.

Qualitative user feedback was collected through a post-experience questionnaire at the Ars Electronica Festival, with eight participants providing insights. The majority found EmoMTB's music discovery aspect the most valuable, highlighting the entertainment and visual appeal of the interface, and generally perceived it as intuitive and easy to use. While mostly satisfied with the recommendations, participants suggested refining the emotional component, including the accuracy of emotion assignment, and potentially enriching the visualization. The city metaphor itself was well received.
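The entropy-based homogeneity measure can be illustrated with a small example: for a group of neighboring cubes, compute the Shannon entropy of its genre distribution, where lower entropy means more homogeneous genres. The neighborhood construction and genre labels below are simplifications for illustration, not the paper's exact evaluation protocol.

```python
# Sketch of the genre-homogeneity check: Shannon entropy of the genre
# distribution within a neighbourhood of cubes. A clustered landscape
# yields low entropy; a shuffled one approaches the maximum.
import math
from collections import Counter

def genre_entropy(genres):
    """Shannon entropy (bits) of the genre distribution in one group."""
    counts = Counter(genres)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

clustered = ["rock", "rock", "rock", "metal"]   # mostly one genre
shuffled = ["rock", "metal", "jazz", "pop"]     # uniform mix

print(round(genre_entropy(clustered), 3))  # 0.811
print(round(genre_entropy(shuffled), 3))   # 2.0: the maximum for 4 genres
```

Expressing each neighborhood's entropy as a percentage of this maximum, and averaging over the landscape, gives figures comparable to the 6.7% vs. 50% result reported above.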
Discussion
The findings demonstrate EmoMTB's success in addressing the limitations of traditional music recommendation systems. The high genre homogeneity of the spatial layout effectively facilitates non-linear music exploration, enabling users to easily discover similar tracks and explore genre transitions. While the emotion recognition accuracy isn't perfect, it's sufficient to provide relevant recommendations, which serve as valuable starting points for exploration. The positive user feedback validates EmoMTB's intuitive interface and its ability to enhance music discovery. The study underscores the importance of considering both cognitive and algorithmic biases in system design, and highlights the effectiveness of combining algorithmic precision with a serendipitous, exploratory interface.
Conclusion
EmoMTB offers a novel approach to music discovery by integrating precise algorithmic recommendations with the excitement of free exploration within a visually rich environment. Its ability to present nearly half a million tracks in a navigable, spatially coherent manner, coupled with emotion-aware recommendations, addresses the inherent biases of traditional list-based approaches. The positive user feedback and quantitative evaluations highlight the system's potential to revolutionize music listening experiences. Future work includes enhancing interaction capabilities, refining the emotional component, expanding to multi-user experiences, and investigating the role of such interfaces in mitigating popularity biases.
Limitations
While EmoMTB received positive feedback, several limitations exist. Currently, a Spotify account is required due to technical and legal constraints. The emotion recognition performance could be improved, and the emotional integration into the interface can be made more sophisticated. The dual-screen requirement (large screen for visualization, smartphone for control) might not be optimal for all users. Addressing these limitations through technological advancements and improved algorithms is crucial for further development.