Language and genre detection in audio content analysis

Vikramjit Mitra, Daniel Garcia-Romero, Carol Y. Espy-Wilson
Interspeech 2008
This paper presents an audio genre detection framework that can be used for a multi-language audio corpus. Cepstral coefficients are considered and analyzed as the feature set for both language-dependent and language-independent genre identification (GID) tasks. Incorporating language information is found to increase the overall detection accuracy by at least 2.6% on average over its language-independent counterpart. Mel-frequency cepstral coefficients (MFCCs) have been widely used for Music Information Retrieval (MIR); however, the present study shows that linear-frequency cepstral coefficients (LFCCs) with a higher number of frequency bands can improve detection accuracy. Two other GID architectures are also considered, but the results show that the log-energy amplitudes from triangular, linearly spaced filter banks and their deltas can offer average detection accuracy as high as 98.2% when language information is taken into account.
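The two feature types named in the abstract can be sketched in plain Python. The first is log energies from triangular filters spaced linearly in Hz, together with their deltas. The second is LFCCs, obtained by applying a DCT-II to those same log energies. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names, the Hamming window, the frame/hop/FFT sizes, the filter count, the delta width of 2, the log floor, and the naive DFT are all hypothetical choices; the abstract does not give the paper's settings or describe its classifiers.

```python
import math

# Hypothetical sketch: all parameter defaults are illustrative assumptions,
# not the settings used in the paper.

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame, n_fft):
    """Hamming-windowed power spectrum via a naive DFT (O(n_fft^2); fine for a sketch)."""
    n = len(frame)
    win = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    win += [0.0] * (n_fft - n)  # zero-pad up to n_fft
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(win[t] * math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = sum(win[t] * math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        spec.append(re * re + im * im)
    return spec


def linear_triangular_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges spaced LINEARLY in Hz from 0 to Nyquist
    (an MFCC front end would space them on the mel scale instead)."""
    nyquist = sample_rate / 2.0
    # Filter edges converted from Hz to (possibly fractional) FFT bin positions.
    edges = [nyquist * i / (n_filters + 1) * n_fft / sample_rate for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            if left <= k <= centre:
                weights.append((k - left) / (centre - left))
            elif centre < k <= right:
                weights.append((right - k) / (right - centre))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def log_filterbank_energies(frame, bank, n_fft, floor=1e-10):
    """Log energy in each triangular band (floored to avoid log(0))."""
    spec = power_spectrum(frame, n_fft)
    return [math.log(max(sum(w * p for w, p in zip(weights, spec)), floor)) for weights in bank]


def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II; applied to log energies it yields cepstral coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(n_coeffs)]


def deltas(frames, width=2):
    """Regression deltas over +/- width frames, replicating the edge frames."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    return [[sum(n * (frames[min(t + n, T - 1)][j] - frames[max(t - n, 0)][j])
                 for n in range(1, width + 1)) / denom
             for j in range(len(frames[t]))]
            for t in range(T)]


def linear_fbank_features(signal, sample_rate, n_filters=40, frame_len=400, hop=160, n_fft=512):
    """Per-frame vector: linear filter-bank log energies followed by their deltas."""
    bank = linear_triangular_filterbank(n_filters, n_fft, sample_rate)
    logs = [log_filterbank_energies(f, bank, n_fft) for f in frame_signal(signal, frame_len, hop)]
    return [e + d for e, d in zip(logs, deltas(logs))]


def lfcc(signal, sample_rate, n_filters=40, n_ceps=20, frame_len=400, hop=160, n_fft=512):
    """LFCCs: DCT-II of the linear filter-bank log energies, one vector per frame."""
    bank = linear_triangular_filterbank(n_filters, n_fft, sample_rate)
    return [dct_ii(log_filterbank_energies(f, bank, n_fft), n_ceps)
            for f in frame_signal(signal, frame_len, hop)]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(800)]  # 0.1 s, 1 kHz
    feats = linear_fbank_features(tone, sr, n_filters=15, frame_len=200, hop=80, n_fft=256)
    print(len(feats), "frames of", len(feats[0]), "features")
```

The only difference between the two front ends is the step applied after the log energies. Keeping the log energies directly gives the filter-bank representation, and applying the DCT gives LFCCs. Raising `n_filters` is the knob that corresponds to the abstract's "higher number of frequency bands". A practical version would use an FFT library instead of the naive DFT.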
doi:10.21437/interspeech.2008-621