Classification accuracy is not enough

Bob L. Sturm
2013 Journal of Intelligent Information Systems  
We argue that an evaluation of system behavior at the level of the music is required to usefully address the fundamental problems of music genre recognition (MGR), and indeed other tasks of music information retrieval, such as autotagging. A recent review of works in MGR since 1995 shows that most (82 %) measure the capacity of a system to recognize genre by its classification accuracy. After reviewing evaluation in MGR, we show that neither classification accuracy, nor recall and precision,
more » ... confusion tables, necessarily reflect the capacity of a system to recognize genre in musical signals. Hence, such figures of merit cannot be used to reliably rank, promote or discount the genre recognition performance of MGR systems if genre recognition (rather than identification by irrelevant confounding factors) is the objective. This motivates the development of a richer experimental toolbox for evaluating any system designed to intelligently extract information from music signals.
doi:10.1007/s10844-013-0250-y fatcat:kwoquuwuajhrnp7btbth2ke264