Vocal Pitch Extraction in Polyphonic Music Using Convolutional Residual Network

Mingye Dong, Jie Wu, Jian Luan
Interspeech 2019
Pitch extraction, also known as fundamental frequency estimation, is a long-standing task in audio signal processing. Vocal pitch extraction in polyphonic music is especially challenging because of the accompaniment. So far, most deep learning approaches use the log mel spectrogram as input, which discards phase information. Shallow networks have also been applied directly to the waveform, but they may not handle contaminated vocal data well. In this paper, a deep convolutional residual network is proposed. It analyzes the waveform and extracts effective features automatically. Residual learning can reduce model degradation through skip connections and residual mapping. Compared with reported results, the proposed approach improves overall accuracy (OA) and raw pitch accuracy (RPA) by 5% and 4%, respectively.
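The skip connection and residual mapping mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the function names (`conv1d_same`, `relu`, `residual_block`), the two-convolution layout, and the odd-length kernels are all assumptions chosen for illustration. The block computes y = x + F(x), so when the learned mapping F is zero the block reduces to the identity, which is the usual explanation for why stacking such blocks does not degrade a deeper model.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1-D cross-correlation with zero padding so the output length equals len(x).

    Assumes an odd-length kernel (illustrative helper, not from the paper).
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # samples outside the signal count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def relu(x: List[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x: List[float], k1: List[float], k2: List[float]) -> List[float]:
    """Return x + F(x), where F(x) = conv(relu(conv(x, k1)), k2).

    The addition of x is the skip connection; F is the residual mapping.
    """
    fx = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return [xi + fi for xi, fi in zip(x, fx)]


# With all-zero kernels the residual mapping vanishes and the block is the identity.
frame = [0.5, -1.0, 2.0]
identity_out = residual_block(frame, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

In a real model the kernels would be learned and each block would have many channels; the sketch keeps a single channel so that the identity behaviour is easy to check by hand.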
doi:10.21437/interspeech.2019-2286 dblp:conf/interspeech/DongWL19 fatcat:jdda2ksquben3oyi4omqgf4f7m