A dataset for learning stylistic and cultural correlations between music and videos

Xinyi Chen, Hui Zhang, Songruoyao Wu, Jun Zheng, Lingyun Sun, Kejun Zhang
2022 Cognitive Computation and Systems  
Music-visual retrieval is of broad interest in the field of Music Information Retrieval (MIR). Most research relies on emotional tags or is based on content, but does not consider stylistic and cultural differences between music and videos. As a result, automatic music retrieval for videos considers only one-sided dimensions, ignoring the stylistic correlation between audio and visual content, and the needs of different cultural regions are not well met. Therefore, the first labelled extensive Music Video (MV) dataset, Next-MV, is constructed in this paper. It consists of 6000 MV fragments of 30 s each, annotated with five music style labels and four cultural labels. The proposed Next-Net framework is built to study the correlation between music style and visual style. The optimal audiovisual feature set and model structure are obtained in the experiments. The accuracy reached 71.1%, higher than that of the baseline model (66.9%). Furthermore, in the cross-cultural experiment, the accuracy of the general fusion model (71.1%) is found to lie between that of the model trained within-dataset (76%) and that of the model trained cross-dataset (60%), indicating that culture has a significant influence on the correlation between music and visuals. Pair classification experiments on cultures are further carried out. Rock and Dance are found to be more culturally influenced than R&B and Hip-hop. Among all the cultures discussed, Chinese and Japanese music and videos show great differences across most of the styles, while the styles of Korean music videos are more similar to Western styles than those of other Eastern cultures.
KEYWORDS: cross culture, cross-modal processing, deep learning architectures, multimodal fusion, new datasets
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
doi:10.1049/ccs2.12043 fatcat:bo4okiuehnbp5ekcpbycnyni3m