Wenlin Zhuang, Congyi Wang, Siyu Xia, Jinxiang Chai, Yangang Wang
Synthesizing human motion from music, i.e., music-to-dance generation, is appealing and has attracted considerable research interest in recent years. It is challenging not only because dance requires realistic and complex human motion, but more importantly because the synthesized motion should be consistent with the style, rhythm, and melody of the music. In this paper, we propose a novel autoregressive generative model, DanceNet, which takes the style, rhythm, and melody of the music as control signals to generate 3D dance motion with high realism and diversity. To boost the performance of the proposed model, we capture several synchronized music-dance pairs performed by professional dancers and build a high-quality music-dance pair dataset. Experiments demonstrate that the proposed method achieves state-of-the-art results.
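The abstract describes an autoregressive model in which per-frame music features act as control signals for next-pose prediction. Below is a minimal sketch of that idea; the GRU backbone, the module name, and all dimensions (`pose_dim`, `music_dim`, `hidden`) are illustrative assumptions, not the architecture actually used by DanceNet.

```python
# A hypothetical sketch of music-conditioned autoregressive motion generation:
# at each frame, predict the next pose from the pose history plus the current
# music control features (style / rhythm / melody).
import torch
import torch.nn as nn

class AutoregressiveDanceModel(nn.Module):
    def __init__(self, pose_dim: int = 72, music_dim: int = 35, hidden: int = 512):
        super().__init__()
        # Fuse the previous pose with the per-frame music control signal.
        self.gru = nn.GRU(pose_dim + music_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)  # next-pose prediction

    def forward(self, poses, music):
        # poses: (B, T, pose_dim), music: (B, T, music_dim), frame-aligned.
        x = torch.cat([poses, music], dim=-1)
        h, _ = self.gru(x)
        return self.head(h)  # predicted pose for each next frame

    @torch.no_grad()
    def generate(self, seed_pose, music):
        # Roll forward one frame at a time, feeding each prediction back
        # in as the next input (the autoregressive loop).
        pose, state, out = seed_pose, None, []
        for t in range(music.shape[1]):
            x = torch.cat([pose, music[:, t:t + 1]], dim=-1)
            h, state = self.gru(x, state)
            pose = self.head(h)
            out.append(pose)
        return torch.cat(out, dim=1)  # (B, T, pose_dim)
```

Conditioning the recurrent state on music at every step, rather than once at the start, is what lets the generated motion track changes in rhythm and melody over time.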
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Motion Synthesis | AIST++ | Beat alignment score | 0.143 | DanceNet |
| Motion Synthesis | AIST++ | FID | 69.13 | DanceNet |
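The Beat Alignment Score reported above measures how well kinematic (dance) beats coincide with music beats. A minimal sketch of the common formulation from the AIST++ evaluation follows: for each dance beat, take the distance to the nearest music beat and average a Gaussian kernel over all dance beats. The function name, beat times in seconds, and the `sigma` default are assumptions for illustration.

```python
import numpy as np

def beat_alignment_score(dance_beats, music_beats, sigma=0.1):
    """dance_beats, music_beats: 1-D arrays of beat times in seconds.

    Hypothetical sigma; the published metric's bandwidth may differ.
    """
    dance_beats = np.asarray(dance_beats, dtype=float)
    music_beats = np.asarray(music_beats, dtype=float)
    # Distance from every dance beat to its nearest music beat.
    dists = np.abs(dance_beats[:, None] - music_beats[None, :]).min(axis=1)
    # Gaussian kernel: 1.0 for a perfect hit, decaying with misalignment.
    return float(np.exp(-dists ** 2 / (2 * sigma ** 2)).mean())

# Example: a score near 1 means dance beats land on music beats.
print(beat_alignment_score([0.52, 1.01, 1.49], [0.5, 1.0, 1.5]))
```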