A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is
We present Music Tagging Transformer that is trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of the extracted features using stacked self-attention layers. Through a careful model assessment, we first show that the proposed architecture outperforms the previous state-of-the-art music tagging models that are based on convolutional neural networks under a supervised scheme.doi:10.5281/zenodo.5624405 fatcat:jxi3c6edbrelplsmbayqjo6e24