Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation

Alexey Ozerov, Cedric Fevotte, Raphael Blouet, Jean-Louis Durrieu
2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications.
Index Terms: Audio source separation, user-guided, nonnegative tensor factorization, generalized expectation maximization.
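The user-guided constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single-channel simplification with no convolutive mixing, and it swaps the generalized EM algorithm named in the index terms for standard Itakura-Saito multiplicative updates. All names (`masked_is_nmf`, `source_power_estimates`, `component_source`, `activity`) are hypothetical, and the input is random placeholder data rather than a real spectrogram. The point it shows is that time-activation annotations can be imposed as zeros in the activation matrix, and that multiplicative updates leave those zeros in place.

```python
import numpy as np

def masked_is_nmf(V, component_source, activity, n_iter=100, eps=1e-12, seed=0):
    """Factor a power spectrogram V (F x N) as W @ H with annotated silences.

    component_source[k] is the (assumed) source index that component k belongs to.
    activity[j, n] is True when the user marked source j as active in frame n.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    K = len(component_source)
    # Each component inherits the time-activity annotation of its source.
    mask = activity[component_source, :].astype(float)  # shape (K, N)
    W = rng.random((F, K)) + 0.1
    H = (rng.random((K, N)) + 0.1) * mask  # zeros where the source is silent
    for _ in range(n_iter):
        # Itakura-Saito multiplicative updates; a zero in H stays zero.
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
    return W, H

def source_power_estimates(V, W, H, component_source, n_sources, eps=1e-12):
    """Split V between sources in proportion to each source's modeled power."""
    V_hat = W @ H + eps
    return np.stack([
        V * (W[:, component_source == j] @ H[component_source == j, :]) / V_hat
        for j in range(n_sources)
    ])

# Toy usage with placeholder data: two sources, two components each.
rng = np.random.default_rng(1)
F, N = 16, 40
component_source = np.array([0, 0, 1, 1])
activity = np.ones((2, N), dtype=bool)
activity[0, 20:] = False  # source 0 annotated as silent from frame 20 on
activity[1, :10] = False  # source 1 annotated as silent in the first 10 frames
V = rng.random((F, N)) + 0.1  # stand-in for a mixture power spectrogram
W, H = masked_is_nmf(V, component_source, activity)
S = source_power_estimates(V, W, H, component_source, n_sources=2)
```

In this sketch the grouping of components by source plays the role of the source dimension of the tensor. A fuller treatment would also model the multichannel mixing, which is omitted here.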
doi:10.1109/icassp.2011.5946389 dblp:conf/icassp/OzerovFBD11 fatcat:2v7pkdybmvganhwy2k5apazupa