Score Following as a Multi-Modal Reinforcement Learning Problem

Florian Henkel, Stefan Balke, Matthias Dorfer, Gerhard Widmer
2019 Transactions of the International Society for Music Information Retrieval  
Score following is the process of tracking a musical performance (audio) in a corresponding symbolic representation (score). While methods using computer-readable score representations as input are able to achieve reliable tracking results, there is little research on score following based on raw score images. In this paper, we build on previous work that formulates the score following task as a multi-modal Markov Decision Process (MDP). Given this formal definition, one can address the problem of score following with state-of-the-art deep reinforcement learning (RL) algorithms. In particular, we design end-to-end multi-modal RL agents that simultaneously learn to listen to music recordings, read the scores from images of sheet music, and follow the music along in the sheet. Using algorithms such as synchronous Advantage Actor Critic (A2C) and Proximal Policy Optimization (PPO), we reproduce and further improve existing results. We also present first experiments indicating that this approach can be extended to track real piano recordings of human performances. These audio recordings are made openly available to the research community, along with precise note-level alignment ground truth.

Audio (Spectrogram) 78 × 40 | Sheet-Image 80 × 256
Conv (3, stride-1)-16 | Conv (5, stride-(1, 2))-16
Conv (3, stride-1)-16 | Conv (3, stride-1)-16
Conv (3, stride-2)-32 | Conv (3, stride-2)-32
Conv (3, stride-1)-32 + DO (0.2) | Conv (3, stride-1)-32 + DO (0.2)
Conv (3, stride-2)-64 | Conv (3, stride-2)-32
Conv (3, stride-2)-96 | Conv (3, stride-2)-64 + DO (0.2)
Conv (1, stride-1)-96 + DO (0.2) | Conv (3, stride-2)-96
Dense (512) | Conv (1, stride-1)-96 + DO (0.2)
 | Dense (512)
Concatenation + Dense (512)
Dense (256) + DO (0.2) | Dense (256) + DO (0.2)
Dense (3) - Softmax | Dense (1) - Linear
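
The abstract names Proximal Policy Optimization (PPO) as one of the algorithms used to train the agents. As a rough illustration of what that algorithm optimizes, the following is a minimal sketch of the standard PPO clipped surrogate objective in plain Python. It is not the authors' implementation: the function names `clipped_surrogate` and `ppo_objective` are hypothetical, and the clipping value `clip_eps=0.2` is a commonly used default assumed here, not a setting taken from this record.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate term for a single (state, action) sample.

    logp_new / logp_old are log-probabilities of the taken action under
    the current and the data-collecting policy. clip_eps is an assumed
    default, not a value reported in the source.
    """
    # Probability ratio between the new and the old policy.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to the interval [1 - eps, 1 + eps].
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the pessimistic (smaller) of the two candidate terms.
    return min(ratio * advantage, clipped * advantage)


def ppo_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate over a batch (a quantity to be maximized)."""
    terms = [
        clipped_surrogate(n, o, a, clip_eps)
        for n, o, a in zip(logp_new, logp_old, advantages)
    ]
    return sum(terms) / len(terms)
```

With a positive advantage, the clipping caps the gain from raising an action's probability beyond the ratio 1 + clip_eps; with a negative advantage, the unclipped term is kept whenever it is the more pessimistic one, so large harmful updates are still penalized in full.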
doi:10.5334/tismir.31 fatcat:ue7vzfmjjbfzxgg3vixg3bjdu4