Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Simple and Controllable Music Generation

Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

Published: 2023-06-08 | Venue: NeurIPS 2023 11 | Tasks: Music Generation, Text-to-Music Generation, Language Modelling

Abstract

We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or through upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on textual description or melodic features, allowing better control over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft
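The abstract refers to token interleaving patterns without describing them. As a rough illustration only, the sketch below assumes a "delay" style pattern in which stream k is shifted right by k steps, so that several parallel token streams can be consumed as one sequence of decoding steps. The function names, the `PAD` placeholder, and the toy token values are all hypothetical and are not taken from this page or from the audiocraft code.

```python
from typing import List, Optional

# Hypothetical placeholder meaning "this stream has no token at this step".
PAD: Optional[int] = None


def delay_interleave(streams: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift stream k right by k steps and return one list of tokens per step.

    Assumption for illustration: all streams have the same length and the
    pattern is a simple per-stream delay.
    """
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")

    num_streams = len(streams)
    total_steps = length + num_streams - 1  # extra steps absorb the delays
    steps: List[List[Optional[int]]] = []
    for t in range(total_steps):
        step: List[Optional[int]] = []
        for k, stream in enumerate(streams):
            src = t - k  # position in stream k that lands on step t
            step.append(stream[src] if 0 <= src < length else PAD)
        steps.append(step)
    return steps


def delay_deinterleave(
    steps: List[List[Optional[int]]], num_streams: int
) -> List[List[Optional[int]]]:
    """Undo delay_interleave and recover the original parallel streams."""
    length = len(steps) - num_streams + 1
    return [
        [steps[t + k][k] for t in range(length)]
        for k in range(num_streams)
    ]


if __name__ == "__main__":
    toy_streams = [[1, 2, 3], [4, 5, 6]]  # two streams, three time frames
    interleaved = delay_interleave(toy_streams)
    for step in interleaved:
        print(step)
    print(delay_deinterleave(interleaved, num_streams=2))
```

Under this assumed pattern, a single model can predict one token per stream at each step, and the fixed delays are removed afterwards to recover aligned streams.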

Results

Task                      Dataset    Metric     Value   Model
Text-to-Music Generation  MusicCaps  FAD        3.4     MusicGen w/o melody (1.5B)
Text-to-Music Generation  MusicCaps  KL_passt   1.23    MusicGen w/o melody (1.5B)
Text-to-Music Generation  MusicCaps  FAD        3.8     MusicGen w/o melody (3.3B)
Text-to-Music Generation  MusicCaps  FD_openl3  197.12  MusicGen w/o melody (3.3B)
Text-to-Music Generation  MusicCaps  KL_passt   1.31    MusicGen w/o melody (3.3B)
Text-to-Music Generation  MusicCaps  FAD        5       MusicGen w/ random melody (1.5B)
Text-to-Music Generation  MusicCaps  KL_passt   1.31    MusicGen w/ random melody (1.5B)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)