Abstract
This paper introduces Msanii, a novel diffusion-based model for efficient, long-context, high-fidelity music synthesis. It combines mel spectrograms, diffusion models, and neural vocoders to generate tens of seconds of stereo music at 44.1 kHz without concatenative synthesis, cascading architectures, or compression. The model also demonstrates potential for audio inpainting and style transfer.
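To make the described pipeline concrete, here is a minimal conceptual sketch of the three-stage idea in the abstract: sample a mel spectrogram with a diffusion process, then map it to a stereo waveform with a neural vocoder. All names, shapes, and placeholder modules below are illustrative assumptions for exposition; this is not the authors' actual Msanii code or API.

```python
# Conceptual sketch only: mel-spectrogram domain -> diffusion sampling -> vocoder.
# The "denoiser" and "vocoder" here are placeholders standing in for trained networks.
import torch

SAMPLE_RATE = 44_100   # target sample rate from the abstract
N_MELS = 128           # assumed mel-band count (illustrative)
N_FRAMES = 2_048       # assumed number of spectrogram frames (illustrative)
HOP_LENGTH = 256       # assumed hop length between frames (illustrative)


def sample_mel_with_diffusion(num_steps: int = 50) -> torch.Tensor:
    """Stand-in for the diffusion model: start from Gaussian noise and
    iteratively denoise a mel spectrogram. A real model predicts the noise
    (or the clean signal) with a learned network at every step."""
    mel = torch.randn(1, N_MELS, N_FRAMES)
    for _ in range(num_steps):
        predicted_noise = torch.zeros_like(mel)  # placeholder for a learned denoiser
        mel = mel - predicted_noise / num_steps
    return mel


def vocode(mel: torch.Tensor) -> torch.Tensor:
    """Stand-in for the neural vocoder: convert the mel spectrogram back to a
    stereo waveform at 44.1 kHz. A real vocoder would be a trained network."""
    num_samples = mel.shape[-1] * HOP_LENGTH
    return torch.zeros(2, num_samples)  # (channels=2, time)


if __name__ == "__main__":
    mel = sample_mel_with_diffusion()
    audio = vocode(mel)
    print(audio.shape, "-> stereo waveform at", SAMPLE_RATE, "Hz")
```

Working in the mel-spectrogram domain rather than directly on raw audio is what lets a single (non-cascaded, non-compressive) diffusion model cover tens of seconds of 44.1 kHz audio, with the vocoder handling the final waveform reconstruction.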
Publisher: None (Work in Progress)
Published On: Jan 18, 2023
Authors: Kinyugo Maina
Tags: music synthesis, diffusion model, mel spectrograms, audio inpainting, style transfer, neural vocoders, high fidelity