We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion conditioned on music. The proposed AIST++ dataset contains 5.2 hours of 3D dance motion in 1408 sequences, covering 10 dance genres with multi-view videos with known camera poses – the largest dataset of this kind to our knowledge. We show that naively applying sequence models such as transformers to this dataset for the task
arXiv:2101.08779v3 fatcat:gsdbkgtq7vb7xhxsfrmfb43tke