Experimenting with musically motivated convolutional neural networks

Jordi Pons, Thomas Lidy, Xavier Serra
2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)
A common criticism of deep learning relates to the difficulty of understanding the underlying relationships that neural networks learn, which makes them behave like black boxes. In this article we explore various architectural choices relevant to music signal classification tasks in order to start understanding what the chosen networks are learning. We first discuss how convolutional filters with different shapes can fit specific musical concepts and, based on that, we propose several musically motivated architectures. These architectures are then assessed by measuring the accuracy of the deep learning model in predicting various music classes using a known dataset of audio recordings of ballroom music. The classes in this dataset correlate strongly with tempo, which allows assessing whether the proposed architectures are learning frequency and/or time dependencies. Additionally, a black-box model is proposed as a baseline for comparison. With these experiments we have been able to understand what some deep learning based algorithms can learn from a particular set of data.
doi:10.1109/cbmi.2016.7500246 dblp:conf/cbmi/PonsLS16 fatcat:yfnqfa6lpnefnp2fr7ad7ektkm