Singing Phoneme Class Detection In Polyphonic Music Recordings

Ourania Vagia, Perfecto Herrera
2008 Zenodo  
Automatic singing detection and singing phoneme recognition are two MIR research topics that have attracted considerable attention in recent years. The first approaches borrowed successful techniques widely used in Automatic Speech Recognition (ASR), since speech and singing are produced by the same apparatus and therefore share similar acoustical features. Moving from monophonic to polyphonic audio signals, the problem becomes more complex, as the background instrumental accompaniment is regarded as a noise source that has to be attenuated. This thesis presents research into the problem of singing phoneme detection in polyphonic audio with English lyrics. Specifically, we are interested in building statistical classification models that can automatically distinguish sung consonants and sung vowels from purely instrumental music in polyphonic music recordings. The approach begins with the creation of a database used for training, testing and evaluating the models. Several sets of extracted low-level features are used in the classification process. Different classifiers are compared (SVM, MLP and logistic regression), as well as different classification schemes (3-class classifiers, binary classifiers in series, and binary classifiers in parallel). The best classification model found reaches an overall accuracy of 78% in distinguishing between the 3 classes.
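As an illustration of the classification schemes named in the abstract, the following is a minimal sketch of how binary classifiers might be combined in series and in parallel to produce the three classes. It is not the thesis's implementation: the stage ordering (sung vs. instrumental first, then vowel vs. consonant), the one-vs-rest reading of "in parallel", the 0.5 threshold, the two toy features and every function name are assumptions made for illustration only.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
# A binary scorer maps a feature vector to a score in [0, 1].
Scorer = Callable[[Features], float]


def classify_in_series(x: Features, is_sung: Scorer, is_vowel: Scorer,
                       threshold: float = 0.5) -> str:
    """Two binary decisions in cascade (assumed stage order)."""
    # Stage 1: separate sung frames from purely instrumental ones.
    if is_sung(x) < threshold:
        return "instrumental"
    # Stage 2: only sung frames reach the vowel/consonant decision.
    return "vowel" if is_vowel(x) >= threshold else "consonant"


def classify_in_parallel(x: Features, scorers: Dict[str, Scorer]) -> str:
    """One binary scorer per class; the highest-scoring class wins."""
    return max(scorers, key=lambda label: scorers[label](x))


# Hypothetical toy scorers over x = (vocal_energy, voicing), both in [0, 1].
# Real scorers would be trained models (e.g. SVM, MLP, logistic regression).
toy_is_sung: Scorer = lambda x: x[0]
toy_is_vowel: Scorer = lambda x: x[1]
toy_parallel: Dict[str, Scorer] = {
    "instrumental": lambda x: 1.0 - x[0],
    "consonant": lambda x: x[0] * (1.0 - x[1]),
    "vowel": lambda x: x[0] * x[1],
}

if __name__ == "__main__":
    for frame in [(0.1, 0.9), (0.8, 0.9), (0.8, 0.2)]:
        print(frame,
              classify_in_series(frame, toy_is_sung, toy_is_vowel),
              classify_in_parallel(frame, toy_parallel))
```

In the series arrangement an error in the first stage cannot be corrected by the second, whereas the parallel arrangement lets every class compete on each frame; a direct 3-class classifier would replace both with a single model.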
doi:10.5281/zenodo.3744709 fatcat:bxl33cpakfevbeirbnq46xhtqe