Music boundary detection using neural networks on spectrograms and self-similarity lag matrices

Thomas Grill, Jan Schluter
2015 23rd European Signal Processing Conference (EUSIPCO)
The first step of understanding the structure of a music piece is to segment it into formative parts. A recently successful method for finding segment boundaries employs a Convolutional Neural Network (CNN) trained on spectrogram excerpts. While setting a new state of the art, it often misses boundaries defined by non-local musical cues, such as segment repetitions. To account for this, we propose a refined variant of self-similarity lag matrices representing long-term relationships. We then demonstrate different ways of fusing this feature with spectrogram excerpts within a CNN, resulting in a boundary recognition performance superior to the previous state of the art. We assume that the integration of more features in a similar fashion would improve the performance even further.
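To make the idea of a self-similarity lag matrix concrete, the following is a minimal sketch of a generic version, not the refined variant proposed in the paper. It assumes per-frame feature vectors are already available and uses cosine distance; the function name `self_similarity_lag_matrix`, the `max_lag` parameter, and the padding value are illustrative assumptions. Entry `[t, lag - 1]` compares frame `t` with the frame `lag` steps earlier, so a repeated segment shows up as a run of low distances in a single lag column.

```python
import numpy as np


def self_similarity_lag_matrix(features, max_lag):
    """Generic self-similarity lag matrix (illustrative, not the paper's variant).

    features: array of shape (n_frames, n_dims), one feature vector per frame.
    Returns an array of shape (n_frames, max_lag) where column lag-1 holds the
    cosine distance between frame t and frame t-lag. Positions without a
    valid past frame are padded with distance 1.0 (an assumed choice).
    """
    features = np.asarray(features, dtype=float)
    # Normalise each frame to unit length so a dot product is a cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)

    n_frames = unit.shape[0]
    sslm = np.ones((n_frames, max_lag))
    for lag in range(1, max_lag + 1):
        if lag >= n_frames:
            break
        # Compare every frame t >= lag with frame t - lag.
        similarity = np.sum(unit[lag:] * unit[:-lag], axis=1)
        sslm[lag:, lag - 1] = 1.0 - similarity
    return sslm


# Toy input: four mutually orthogonal frames repeated three times,
# i.e. a "piece" whose content repeats with a period of 4 frames.
pattern = np.tile(np.eye(4), (3, 1))
sslm = self_similarity_lag_matrix(pattern, max_lag=6)
```

In this toy input the column for lag 4 is close to zero from frame 4 onward, which is the kind of long-term repetition cue a spectrogram excerpt alone does not expose. Such a matrix could then be supplied to a CNN alongside spectrogram excerpts, as the abstract describes at a high level.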
doi:10.1109/eusipco.2015.7362593 dblp:conf/eusipco/GrillS15 fatcat:zqt7zxui2vexjgt4j7t2aldc4e