Genre Classification of Music Using CNN and Nonstationary Gabor Transform
CNNと Nonstationary Gabor Transformを用いた音楽のジャンル分類

Toshiaki SEINO, Keiu HARADA
JSAI Technical Report, SIG-KBS  
In this study, we address music genre classification on the GTZAN mainset dataset [1] using a VGG-like CNN. The input to the CNN is the amplitude spectrum obtained by applying the Nonstationary Gabor Transform [2] to the discrete waveform. To find the optimal parameters, we shorten the 30-second discrete waveform data to lengths of 2, 3, 4, 5, and 10 seconds. A waveform length of 2 seconds achieved the highest recognition accuracy.
doi:10.11517/jsaikbs.112.0_03 fatcat:q6ywykdvkfcgnfbwsn433lsbca
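
The length variation described in the abstract can be illustrated with a minimal sketch. It assumes that each 30-second clip is cut into non-overlapping segments of equal length and that the audio is sampled at 22050 Hz; the abstract states neither, and the function name `split_waveform` and the silent placeholder clip are hypothetical. The Nonstationary Gabor Transform step and the CNN are not shown.

```python
import numpy as np

def split_waveform(waveform, sample_rate, segment_seconds):
    """Cut a 1-D waveform into non-overlapping segments of equal length.

    Samples left over at the end that do not fill a whole segment are dropped.
    """
    segment_len = int(round(segment_seconds * sample_rate))
    n_segments = len(waveform) // segment_len
    trimmed = waveform[: n_segments * segment_len]
    return trimmed.reshape(n_segments, segment_len)

# Assumption: 22050 Hz sample rate (not stated in the abstract).
SAMPLE_RATE = 22050

# Placeholder for one 30-second clip; real data would be loaded from audio files.
clip = np.zeros(30 * SAMPLE_RATE, dtype=np.float32)

# Segment lengths examined in the abstract.
for seconds in (2, 3, 4, 5, 10):
    segments = split_waveform(clip, SAMPLE_RATE, seconds)
    print(seconds, segments.shape)
```

Under these assumptions a shorter segment length yields more training examples per clip, although each example covers less musical context. Note that 4-second segments do not divide 30 seconds evenly, so this sketch discards the trailing 2 seconds of each clip.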