TMD
Text-Music-Dance
Introduced 2025-03-10
The Text-Music-Dance (TMD) dataset is a benchmark of 2,153 text-music-motion pairs. Dance motions and their text annotations are sourced from Motion-X, which incorporates AIST++ and other datasets. For motion-text pairs that lack music, matching music is generated using Stable Audio Open, beat-adjusted, and validated through expert assessments to ensure inter-rater reliability.
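The description above implies that each entry carries three modalities, and that its music is either sourced with the motion or generated afterwards. A minimal sketch of how one entry might be represented follows. The class name, field names, and file paths are hypothetical and are not the dataset's official schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TMDSample:
    """One text-music-motion entry (hypothetical schema, not the official format)."""
    text: str              # text annotation describing the dance
    motion_path: str       # path to the motion sequence (assumed file layout)
    music_path: str        # path to the paired music clip (assumed file layout)
    music_generated: bool  # True if the music was generated rather than sourced with the motion


def count_generated(samples: List[TMDSample]) -> int:
    """Count the entries whose music track was generated."""
    return sum(1 for s in samples if s.music_generated)


# Illustrative placeholder entries, not real dataset rows.
samples = [
    TMDSample("a dancer spins in place", "motion/0001.npy", "music/0001.wav", False),
    TMDSample("a dancer jumps twice", "motion/0002.npy", "music/0002.wav", True),
]
print(count_generated(samples))
```

Keeping an explicit flag for generated music makes it straightforward to evaluate on sourced and generated subsets separately, should that distinction matter for a given experiment.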
Benchmarks
10-shot image generation: FID, BAS, MModality, MMDist
3D Human Pose Tracking: FID, BAS, MModality, MMDist
Motion Synthesis: FID, BAS, MModality, MMDist
Pose Tracking: FID, BAS, MModality, MMDist