MTF-CRNN: Multiscale Time-Frequency Convolutional Recurrent Neural Network For Sound Event Detection

Keming Zhang, Yuanwen Cai, Yuan Ren, Ruida Ye, Liang He
IEEE Access, 2020
To reduce neural network parameter counts and improve sound event detection performance, we propose a multiscale time-frequency convolutional recurrent neural network (MTF-CRNN) for sound event detection. Our goal is to recognize target sound events of variable duration against different audio backgrounds while keeping the parameter count low. We exploit four groups of parallel and serial convolutional kernels to learn high-level shift-invariant features from the time and frequency domains of acoustic samples. A two-layer bidirectional gated recurrent unit captures the temporal context of the extracted high-level features. The proposed method is evaluated on two sound event datasets. Compared with the baseline and other methods, our single model achieves substantially better performance with a low parameter count and without pretraining. On the TUT Rare Sound Events 2017 evaluation dataset, our method achieved an error rate (ER) of 0.09±0.01, an 83% improvement over the baseline. On the TAU Spatial Sound Events 2019 evaluation dataset, our system achieved an ER of 0.11±0.01, a relative improvement over the baseline of 61%, with F1 and ER values better than those on the development dataset. Compared with state-of-the-art methods, our proposed network achieves competitive detection performance with only one-fifth the parameter count.

INDEX TERMS Pattern recognition, sound event detection, multiscale learning, time-frequency transform, convolutional recurrent neural network.
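The ER figures quoted above follow the segment-based error rate that is standard in sound event detection evaluation (as in the sed_eval toolbox): ER = (S + D + I) / N, where substitutions, deletions, and insertions are counted per time segment against the N reference events. A minimal illustrative sketch of that computation (not the paper's evaluation code; the function and label names are hypothetical):

```python
# Segment-based error rate for sound event detection:
# ER = (S + D + I) / N, counted per aligned time segment.
# Illustrative sketch only, assuming one set of event labels per segment.

def segment_error_rate(reference, estimated):
    """reference, estimated: lists of sets of active event labels,
    one set per time segment (e.g. per 1-second block)."""
    S = D = I = N = 0
    for ref, est in zip(reference, estimated):
        N += len(ref)
        fn = len(ref - est)      # events missed in this segment
        fp = len(est - ref)      # events falsely reported in this segment
        S += min(fn, fp)         # a miss paired with a false alarm = substitution
        D += max(0, fn - fp)     # unpaired misses = deletions
        I += max(0, fp - fn)     # unpaired false alarms = insertions
    return (S + D + I) / N if N else 0.0

# Toy example: four 1-second segments of ground truth vs. predictions.
ref = [{"siren"}, {"siren"}, set(), {"glass"}]
est = [{"siren"}, set(), {"glass"}, {"glass"}]
print(round(segment_error_rate(ref, est), 2))  # → 0.67 (one deletion, one insertion, N=3)
```

Note that ER can exceed 1.0 when insertions outnumber reference events, which is why a value as low as 0.09 indicates near-perfect segment-level detection.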
doi:10.1109/access.2020.3015047