Ruozi Huang, Huang Hu, Wei Wu, Kei Sawada, Mi Zhang, Daxin Jiang
Dancing to music has been an innate human ability since ancient times. In machine learning research, however, synthesizing dance movements from music remains a challenging problem. Recently, researchers have synthesized human motion sequences with autoregressive models such as recurrent neural networks (RNNs). Such an approach often generates only short sequences because prediction errors accumulate as they are fed back into the network, and the problem becomes even more severe in long motion sequence generation. Moreover, the consistency between dance and music in terms of style, rhythm and beat has yet to be taken into account during modeling. In this paper, we formalize music-conditioned dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture to efficiently process long sequences of music features and capture the fine-grained correspondence between music and dance. Furthermore, we propose a novel curriculum learning strategy to alleviate the error accumulation of autoregressive models in long motion sequence generation. The strategy gently changes the training process from a fully guided teacher-forcing scheme, which uses the previous ground-truth movements, towards a less guided autoregressive scheme that mostly uses the generated movements instead. Extensive experiments show that our approach significantly outperforms existing state-of-the-art methods on automatic metrics and in human evaluation. A demo video demonstrating the performance of the proposed approach is available at https://www.youtube.com/watch?v=lmE20MEheZ8.
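The shift from teacher forcing towards autoregressive training can be illustrated with a minimal sketch. It is not the schedule used in the paper, which the abstract does not specify: the linear decay, the `floor` value, and the names `teacher_forcing_prob`, `rollout`, and `step_fn` are all assumptions made for illustration. The idea shown is only that early epochs feed the model ground-truth movements, while later epochs mostly feed back its own predictions.

```python
import random


def teacher_forcing_prob(epoch, total_epochs, floor=0.1):
    """Hypothetical schedule: probability of feeding the ground-truth frame.

    Decays linearly from 1.0 (fully guided) at epoch 0 down to `floor`
    (mostly autoregressive) at the last epoch. The linear shape and the
    floor value are assumptions, not taken from the paper.
    """
    if total_epochs <= 1:
        return floor
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return 1.0 - (1.0 - floor) * frac


def rollout(step_fn, ground_truth, p_teacher, rng):
    """Predict frames 1..T-1 of a motion sequence.

    `step_fn` stands in for one decoder step (previous frame -> next frame).
    After each step, the next input is the ground-truth frame with
    probability `p_teacher`, otherwise the model's own prediction.
    """
    predictions = []
    prev = ground_truth[0]
    for t in range(1, len(ground_truth)):
        pred = step_fn(prev)
        predictions.append(pred)
        prev = ground_truth[t] if rng.random() < p_teacher else pred
    return predictions


if __name__ == "__main__":
    rng = random.Random(0)
    toy_truth = [0, 10, 20, 30]       # stand-in for pose frames
    toy_step = lambda frame: frame + 1  # stand-in for a decoder step
    for epoch in (0, 5, 9):
        p = teacher_forcing_prob(epoch, total_epochs=10)
        print(epoch, round(p, 2), rollout(toy_step, toy_truth, p, rng))
```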
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Tracking | BRACE | Beat DTW cost | 11.88 | Dance Revolution |
| Pose Tracking | BRACE | Beat alignment score | 0.264 | Dance Revolution |
| Pose Tracking | BRACE | Footwork average | 51.6 | Dance Revolution |
| Pose Tracking | BRACE | Frechet Inception Distance | 0.5158 | Dance Revolution |
| Pose Tracking | BRACE | Powermove average | 37.72 | Dance Revolution |
| Pose Tracking | BRACE | Toprock average | 10.59 | Dance Revolution |
| Pose Tracking | AIST++ | Beat alignment score | 0.195 | Dance Revolution |
| Pose Tracking | AIST++ | FID | 73.42 | Dance Revolution |
| Motion Synthesis | BRACE | Beat DTW cost | 11.88 | Dance Revolution |
| Motion Synthesis | BRACE | Beat alignment score | 0.264 | Dance Revolution |
| Motion Synthesis | BRACE | Footwork average | 51.6 | Dance Revolution |
| Motion Synthesis | BRACE | Frechet Inception Distance | 0.5158 | Dance Revolution |
| Motion Synthesis | BRACE | Powermove average | 37.72 | Dance Revolution |
| Motion Synthesis | BRACE | Toprock average | 10.59 | Dance Revolution |
| Motion Synthesis | AIST++ | Beat alignment score | 0.195 | Dance Revolution |
| Motion Synthesis | AIST++ | FID | 73.42 | Dance Revolution |
| 10-shot image generation | BRACE | Beat DTW cost | 11.88 | Dance Revolution |
| 10-shot image generation | BRACE | Beat alignment score | 0.264 | Dance Revolution |
| 10-shot image generation | BRACE | Footwork average | 51.6 | Dance Revolution |
| 10-shot image generation | BRACE | Frechet Inception Distance | 0.5158 | Dance Revolution |
| 10-shot image generation | BRACE | Powermove average | 37.72 | Dance Revolution |
| 10-shot image generation | BRACE | Toprock average | 10.59 | Dance Revolution |
| 10-shot image generation | AIST++ | Beat alignment score | 0.195 | Dance Revolution |
| 10-shot image generation | AIST++ | FID | 73.42 | Dance Revolution |
| 3D Human Pose Tracking | BRACE | Beat DTW cost | 11.88 | Dance Revolution |
| 3D Human Pose Tracking | BRACE | Beat alignment score | 0.264 | Dance Revolution |
| 3D Human Pose Tracking | BRACE | Footwork average | 51.6 | Dance Revolution |
| 3D Human Pose Tracking | BRACE | Frechet Inception Distance | 0.5158 | Dance Revolution |
| 3D Human Pose Tracking | BRACE | Powermove average | 37.72 | Dance Revolution |
| 3D Human Pose Tracking | BRACE | Toprock average | 10.59 | Dance Revolution |
| 3D Human Pose Tracking | AIST++ | Beat alignment score | 0.195 | Dance Revolution |
| 3D Human Pose Tracking | AIST++ | FID | 73.42 | Dance Revolution |