Visual Attention for Musical Instrument Recognition [article]

Karn Watcharasupat, Siddharth Gururani, Alexander Lerch
2020 arXiv pre-print
In the field of music information retrieval, simultaneously identifying the presence or absence of multiple musical instruments in a polyphonic recording remains a hard problem. Previous work has had some success in improving instrument classification by applying temporal attention in a multi-instance multi-label setting, while another line of work has suggested that pitch and timbre play a role in improving instrument recognition performance. In this project, we further ... the use of an attention mechanism in a timbral-temporal sense, à la visual attention, to improve the performance of musical instrument recognition using weakly-labeled data. Two approaches to this task have been explored. The first approach applies an attention mechanism to the sliding-window paradigm, where a prediction based on each timbral-temporal 'instance' is given an attention weight before aggregation to produce the final prediction. The second approach is based on a recurrent model of visual attention, where the network attends only to parts of the spectrogram and decides where to attend next, given a limited number of 'glimpses'.
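The aggregation step of the first approach can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each timbral-temporal instance yields one probability per instrument together with one unnormalized attention score per instrument, and that the scores are normalized with a softmax across instances. The function names `softmax` and `attention_pool` and the nested-list data layout are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instance_probs, attention_scores):
    """Aggregate per-instance predictions into one clip-level prediction.

    instance_probs[i][k]: predicted probability of instrument k in instance i.
    attention_scores[i][k]: unnormalized attention score for the same pair.
    Both layouts are assumptions made for this illustration.
    """
    n_instances = len(instance_probs)
    n_classes = len(instance_probs[0])
    clip_probs = []
    for k in range(n_classes):
        # Normalize the scores for instrument k across all instances.
        weights = softmax([attention_scores[i][k] for i in range(n_instances)])
        # Attention-weighted average of the instance-level predictions.
        clip_probs.append(
            sum(w * instance_probs[i][k] for i, w in enumerate(weights))
        )
    return clip_probs


# Hypothetical example: three instances, two instruments.
example_probs = [[0.9, 0.1], [0.2, 0.1], [0.1, 0.8]]
example_scores = [[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
example_clip = attention_pool(example_probs, example_scores)
```

Because the weights for each instrument sum to one, every clip-level value stays between the smallest and largest instance-level probability for that instrument, and equal scores reduce the aggregation to a plain average.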
arXiv:2006.09640v2 fatcat:cwtyxoszdjfungn3hx7cemnao4