DanceMosaic: High-Fidelity Dance Generation with Multimodal Editability

Foram Niravbhai Shah, Parshwa Shah, Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang, Hongfei Xue, Ahmed Helmy

2025-04-06 | Motion Generation | Motion Synthesis

Abstract

Recent advances in dance generation have enabled automatic synthesis of 3D dance motions. However, existing methods still struggle to produce high-fidelity dance sequences that simultaneously deliver exceptional realism, precise dance-music synchronization, high motion diversity, and physical plausibility. Moreover, existing methods lack the flexibility to edit dance sequences according to diverse guidance signals, such as musical prompts, pose constraints, action labels, and genre descriptions, which significantly restricts their creative utility and adaptability. Unlike existing approaches, DanceMosaic enables fast, high-fidelity dance generation while allowing multimodal motion editing. Specifically, we propose a multimodal masked motion model that fuses a text-to-motion model with music and pose adapters to learn a probabilistic mapping from diverse guidance signals to high-quality dance motion sequences via progressive generative masking training. To further enhance motion generation quality, we propose multimodal classifier-free guidance and an inference-time optimization mechanism that further enforce the alignment between the generated motions and the multimodal guidance. Extensive experiments demonstrate that our method establishes new state-of-the-art performance in dance generation, significantly advancing the quality and editability achieved by existing approaches.
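The abstract does not give the formula for its multimodal classifier-free guidance, so the snippet below is only a minimal sketch of one common multi-condition formulation: the unconditional prediction is shifted by a weighted sum of per-modality differences. The function name `multimodal_cfg`, the modality keys ("music", "text"), the scale values, and the toy logits are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def multimodal_cfg(
    uncond: List[float],
    cond: Dict[str, List[float]],
    scales: Dict[str, float],
) -> List[float]:
    """Generic multi-condition classifier-free guidance (assumed form):

        guided = uncond + sum_m scale_m * (cond_m - uncond)

    `uncond` holds logits predicted with no guidance signal; `cond[m]`
    holds logits predicted with only modality m enabled. A modality
    missing from `scales` gets scale 0, i.e. it is ignored.
    """
    guided = list(uncond)
    for name, logits in cond.items():
        scale = scales.get(name, 0.0)
        for i, (c, u) in enumerate(zip(logits, uncond)):
            guided[i] += scale * (c - u)
    return guided


# Toy logits over three motion tokens (illustrative values only).
uncond = [0.0, 1.0, 2.0]
cond = {"music": [1.0, 1.0, 2.0], "text": [0.0, 3.0, 2.0]}
scales = {"music": 2.0, "text": 0.5}
guided = multimodal_cfg(uncond, cond, scales)
print(guided)
```

With a single modality and a scale of 1, this reduces to the plain conditional prediction; larger scales push the output further toward that modality's guidance.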

Results

Task	Dataset	Metric	Value	Model
Pose Tracking	FineDance	BAS	0.2254	DanceMosaic
Pose Tracking	FineDance	fid_k	19.36	DanceMosaic
Motion Synthesis	FineDance	BAS	0.2254	DanceMosaic
Motion Synthesis	FineDance	fid_k	19.36	DanceMosaic
10-shot image generation	FineDance	BAS	0.2254	DanceMosaic
10-shot image generation	FineDance	fid_k	19.36	DanceMosaic
3D Human Pose Tracking	FineDance	BAS	0.2254	DanceMosaic
3D Human Pose Tracking	FineDance	fid_k	19.36	DanceMosaic
