A copy of this work was preserved in the Wayback Machine; the capture dates from 2020. The file type is application/pdf.
Audio-Visual Embedding for Cross-Modal Music-Video Retrieval through Supervised Deep CCA
[article] · 2019 · arXiv pre-print
Deep learning has shown excellent performance in learning joint representations across data modalities. However, little research has focused on cross-modal correlation learning in which the temporal structure of the modalities, such as audio and video, must be taken into account. Retrieving a music video given its musical audio is a natural way to search for and interact with music content. In this work, we study cross-modal music video retrieval in terms of emotion
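The abstract's core technique, canonical correlation analysis (CCA), can be illustrated with a minimal sketch of its classical linear form. This is not the paper's supervised Deep CCA model; in Deep CCA each modality would first pass through a neural network, and here the audio/video features, the regularizer `reg`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Linear CCA via SVD of the whitened cross-covariance.

    X: (n, dx) audio-feature stand-in; Y: (n, dy) video-feature stand-in.
    Returns projections Wx, Wy into a shared k-dim space plus the top-k
    canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric PD matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations; singular vectors give the projections.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt[:k].T, s[:k]

# Toy correlated data: both "modalities" share one latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = z @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(500, 4))
Wx, Wy, corrs = cca(X, Y, k=1)
print(corrs[0])  # top canonical correlation, near 1 for shared-latent data
```

In a retrieval setting like the one described, both modalities would be projected with `Wx` and `Wy` (or their deep counterparts) and ranked by similarity in the shared space.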
arXiv:1908.03744v1
fatcat:2l2vdm7a7zatvdbk6ja2tfdmqi