Msanii: High Fidelity Music Synthesis on a Shoestring Budget

The Arts

K. Maina

Msanii, by Kinyugo Maina, is a diffusion-based model that efficiently synthesizes long, high-fidelity music. It generates mel spectrograms with a diffusion model and converts them to audio with a neural vocoder, without relying on concatenative synthesis, cascading architectures, or compression techniques.

Abstract
In this paper, we present Msanii, a novel diffusion-based model for synthesizing long-context, high-fidelity music efficiently. Our model combines the expressiveness of mel spectrograms, the generative capabilities of diffusion models, and the vocoding capabilities of neural vocoders. We demonstrate the effectiveness of Msanii by synthesizing 190 seconds (just over three minutes) of stereo music at a high sample rate (44.1 kHz) without the use of concatenative synthesis, cascading architectures, or compression techniques. To the best of our knowledge, this is the first work to successfully employ a diffusion-based model for synthesizing such long music samples at high sample rates. Our demo can be found here and our code here.
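To give a sense of the scale the abstract describes, the short sketch below works out how many raw audio samples 190 seconds of 44.1 kHz stereo audio contains, and how many time frames a mel spectrogram of the same clip would have. The duration, sample rate, and channel count come from the abstract; the hop length of 1024 samples is an assumed value chosen only for illustration and is not taken from the paper, and `audio_sizes` is a hypothetical helper name.

```python
import math


def audio_sizes(duration_s, sample_rate, channels, hop_length):
    """Return (samples per channel, total sample values, spectrogram frames).

    hop_length is the number of audio samples between consecutive
    spectrogram frames; the value used below is an assumption,
    not a setting reported in the paper.
    """
    samples_per_channel = duration_s * sample_rate
    total_values = samples_per_channel * channels
    # Round up so a trailing partial hop still counts as a frame.
    frames = math.ceil(samples_per_channel / hop_length)
    return samples_per_channel, total_values, frames


# 190 s of stereo audio at 44.1 kHz, with an assumed hop of 1024 samples.
per_channel, total, frames = audio_sizes(190, 44_100, 2, 1024)
print(per_channel)  # 8379000 samples per channel
print(total)        # 16758000 sample values across both channels
print(frames)       # 8183 spectrogram frames
```

Under this assumed hop length, the time axis shrinks from roughly 8.4 million samples to about 8 thousand frames, which illustrates why working in the mel spectrogram domain can make long-context synthesis more tractable than modeling the raw waveform directly.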
Publisher
None (Work in Progress)
Published On
Jan 18, 2023
Authors
Kinyugo Maina
Tags
music synthesis
diffusion model
mel spectrograms
audio inpainting
style transfer
neural vocoders
high fidelity