Singing Voice Melody Estimation From Polyphonic Signals

Logan Stillings, Pritish Chandna
2021 Zenodo  
Singing voice melody estimation is the task of calculating the fundamental frequency (f0) of the predominant voice in a piece of music containing multiple instruments. In this work, I evaluate the performance of several popular f0 estimation algorithms using an annotated dataset (MedleyDB) of raw monophonic tracks, polyphonic mixes, and source-separated vocals. Many of the models were created to estimate the predominant melody and not necessarily the sung vocal melody; for example, they could … capable of estimating the melody in instrumental music. Of the models tested, CREPE performs well as a monophonic model, and Deep Salience and Encoder/Decoder perform well as polyphonic models. By implementing source separation as a preprocessing step, monophonic models such as CREPE, SPICE, and become viable options for the task of vocal melody estimation. These monophonic algorithms each perform significantly better in pitch accuracy on the source-separated vocal tracks than on the polyphonic mixes. Additionally, each of the polyphonic algorithms tested increased in overall accuracy when using the source-separated tracks instead of the polyphonic mixes. I suggest further research implementing source separation as a preprocessing step for vocal melody estimation, using other source-separation tools and different datasets. Potential datasets could include tracks with overlapping vocal harmonies as well as different musical styles such as metal, rap, or non-Western music.
doi:10.5281/zenodo.5554723
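The models above are compared by pitch accuracy and overall accuracy. As a minimal sketch, assuming the common MIREX-style frame definitions with a 50-cent tolerance, reference and estimate already sampled on the same time grid, and the convention that 0 Hz marks an unvoiced frame, the two metrics can be computed as below. The function name `melody_accuracy` is hypothetical and this is not the authors' evaluation code; toolkits such as mir_eval implement these metrics with extra details (for example, resampling to a common grid and scoring pitch guesses in frames the estimator marks as unvoiced).

```python
import numpy as np


def melody_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Hypothetical helper: frame-level raw pitch accuracy and overall accuracy.

    Assumes both arrays share one time grid and 0 Hz means "unvoiced".
    """
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    ref_voiced = ref_hz > 0
    est_voiced = est_hz > 0

    # A pitch can only be compared where both reference and estimate are voiced.
    both = ref_voiced & est_voiced
    pitch_ok = np.zeros(ref_hz.shape, dtype=bool)
    cents_err = 1200.0 * np.abs(np.log2(est_hz[both] / ref_hz[both]))
    pitch_ok[both] = cents_err <= tolerance_cents

    # Raw pitch accuracy: correct-pitch frames over reference-voiced frames.
    n_ref_voiced = int(ref_voiced.sum())
    rpa = pitch_ok.sum() / n_ref_voiced if n_ref_voiced else 0.0

    # Overall accuracy: correct-pitch voiced frames plus correctly unvoiced frames,
    # divided by the total number of frames.
    correct = pitch_ok | (~ref_voiced & ~est_voiced)
    oa = correct.mean()
    return float(rpa), float(oa)


if __name__ == "__main__":
    ref = [0.0, 220.0, 220.0, 440.0, 0.0]
    est = [0.0, 221.0, 233.0, 440.0, 110.0]
    print(melody_accuracy(ref, est))
```

In this toy example, 221 Hz is about 8 cents from 220 Hz and counts as correct, 233 Hz is about 99 cents off and does not, and the last frame is a false voicing detection, so raw pitch accuracy is 2/3 and overall accuracy is 3/5. Measuring error in cents rather than Hz keeps the tolerance equal across the vocal range.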