Data quality matters: Iterative corrections on a corpus of Mendelssohn string quartets and implications for MIR analysis

Jacob DeGroot-Maggetti, Timothy R De Reuse, Laurent Feisthauer, Samuel Howes, Yaolong Ju, Suzuka Kokubu, Sylvain Margot, Néstor Nápoles López, Finn Upham
2020 Zenodo  
In this paper, we describe a workflow of successive corrections on Optical Music Recognition (OMR)-generated MusicXML files and their respective outputs under Music Information Retrieval (MIR) tasks. The original OMR-generated files of six Mendelssohn String Quartets were initially corrected by individual members of this interdisciplinary group, then reviewed by others to further standardize the quality and music analysis priorities of the team. Four MIR tasks are applied to each round of corrections on this collection: cadence detection, chord labeling, key finding, and monophonic pattern discovery. We measure changes in the outputs of these four MIR tasks from one round of correction to the next in order to evaluate the impact of corrections. Results show that expert revision is more beneficial to some MIR tasks than to others. The resulting corpus of curated MusicXML files is available as an open-source repository under a Creative Commons Attribution 4.0 International License for further MIR research.
doi:10.5281/zenodo.4245459 fatcat:dkoe2moesfgdrgydcrwpig666a