A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching

Cyril Joder, Slim Essid, Gaël Richard
<span title="">2011</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/zcz4ey2iwffxtgaodtf5jtebmy" style="color: black;">IEEE Transactions on Audio, Speech, and Language Processing</a> </i> &nbsp;
In the present work, we introduce the use of Conditional Random Fields (CRFs) for the audio-to-score alignment task. This framework encompasses the statistical models which are used in the literature and allows for more flexible dependency structures. In particular, it allows observation functions to be computed from several analysis frames. Three different CRF models are proposed for our task, for different choices of tradeoff between accuracy and complexity. Three types of features are used, characterizing the local harmony, note attacks and tempo. We also propose a novel hierarchical approach, which takes advantage of the score structure for an approximate decoding of the statistical model. This strategy reduces the complexity, yielding a better overall efficiency than the classic beam search method used in HMM-based models. Experiments run on a large database of classical piano and popular music exhibit very accurate alignments. Indeed, with the best performing system, more than 95% of the note onsets are detected with a precision finer than 100 ms. We additionally show how the proposed framework can be modified in order to be robust to possible structural differences between the score and the musical performance.
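The accuracy figure quoted above (share of note onsets located with a precision finer than 100 ms) can be illustrated with a minimal sketch. It assumes that estimated and reference onsets are paired one-to-one and given in seconds; the function name `onset_alignment_rate` and the sample values are hypothetical and are not taken from the paper.

```python
def onset_alignment_rate(estimated, reference, tolerance=0.1):
    """Return the fraction of onsets whose estimated time deviates from
    the reference time by strictly less than `tolerance` seconds.

    Assumption: estimated[i] and reference[i] refer to the same score note.
    """
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have the same length")
    if not reference:
        raise ValueError("at least one onset is required")
    # Count onsets that fall inside the tolerance window.
    hits = sum(1 for e, r in zip(estimated, reference) if abs(e - r) < tolerance)
    return hits / len(reference)


# Hypothetical onset times in seconds; the third onset is off by 250 ms.
reference_onsets = [0.0, 0.5, 1.0, 1.5]
estimated_onsets = [0.02, 0.45, 1.25, 1.5]
rate = onset_alignment_rate(estimated_onsets, reference_onsets)
```

With these illustrative values, three of the four onsets fall within the 100 ms window, so `rate` is 0.75.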
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tasl.2011.2134092">doi:10.1109/tasl.2011.2134092</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2h5x5adnbzhahbc72so2adjf2m">fatcat:2h5x5adnbzhahbc72so2adjf2m</a> </span>
