Msanii: High Fidelity Music Synthesis on a Shoestring Budget

The Arts

K. Maina

Msanii, by Kinyugo Maina, is a diffusion-based model for efficient, long-duration, high-fidelity music synthesis. It generates mel spectrograms with a diffusion model and converts them to audio with a neural vocoder, avoiding concatenative synthesis, cascading architectures, and compression.

Abstract
This paper introduces Msanii, a novel diffusion-based model for efficient, long-context, high-fidelity music synthesis. It combines mel spectrograms, diffusion models, and neural vocoders to generate tens of seconds of stereo music at 44.1 kHz without concatenative synthesis, cascading architectures, or compression. The model also demonstrates potential for audio inpainting and style transfer.
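To make the pipeline in the abstract more concrete, here is a minimal sketch of two of its ingredients: the mel scale behind mel spectrograms, and the closed-form noising step of a generic DDPM-style diffusion process applied to a stereo spectrogram-shaped array. It is not the paper's implementation. The hop length (1024), the number of mel bins (128), the HTK mel formula, and all function names are assumptions chosen for illustration; only the 44.1 kHz sample rate and the stereo layout come from the abstract.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def num_frames(duration_s, sample_rate=44_100, hop_length=1024):
    """Number of spectrogram frames for a clip, assuming centered framing."""
    n_samples = int(round(duration_s * sample_rate))
    return 1 + n_samples // hop_length

def forward_diffuse(x0, alpha_bar_t, rng):
    """Generic DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

rng = np.random.default_rng(0)
frames = num_frames(30.0)                    # frames in a 30 s clip at 44.1 kHz
mel = rng.standard_normal((2, 128, frames))  # random stand-in for a stereo mel spectrogram
noisy, eps = forward_diffuse(mel, alpha_bar_t=0.5, rng=rng)
print(mel.shape, noisy.shape)
```

A diffusion model is trained to undo this noising, for example by predicting eps from the noisy input. In a pipeline like the one described, a neural vocoder would then turn the denoised mel spectrogram back into a waveform.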
Publisher
None (Work in Progress)
Published On
Jan 18, 2023
Authors
Kinyugo Maina
Tags
music synthesis
diffusion model
mel spectrograms
audio inpainting
style transfer
neural vocoders
high fidelity