Semi-supervised non-negative tensor factorisation of modulation spectrograms for monaural speech separation

Tom Barker, Tuomas Virtanen
2014 International Joint Conference on Neural Networks (IJCNN)
This paper details the use of a semi-supervised approach to audio source separation. When a model is available for only one of the sources, the model for the unknown source must be estimated from the mixture. A mixture signal is separated through factorisation of a feature-tensor representation based on the modulation spectrogram. Harmonically related components tend to modulate in a similar fashion, and this redundancy of patterns can be isolated. This feature representation requires fewer parameters than spectrally based methods and so reduces overfitting. Following the tensor factorisation, the separated signals are reconstructed by learning appropriate Wiener-filter spectral parameters, which are constrained by the activation parameters learned in the first stage. Strong results were obtained for two-speaker mixtures, where source separation performance exceeded that of the benchmark methods. Specifically, the proposed semi-supervised method outperformed both semi-supervised non-negative matrix factorisation and blind non-negative modulation spectrum tensor factorisation.
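The feature tensor in the abstract is built from the modulation spectrogram. The following is a minimal sketch of one way to build a non-negative (channel × modulation frequency × frame) tensor, assuming ideal FFT band-pass channels, a Hilbert envelope per channel, and a windowed magnitude DFT of each envelope frame. The function name `modulation_spectrogram`, the band edges and the frame sizes are hypothetical illustrations, not the paper's actual settings.

```python
import numpy as np


def modulation_spectrogram(x, fs, band_edges, frame_len=2048, hop=512):
    """Return a non-negative tensor of shape (channels, mod_freqs, frames).

    Hypothetical pipeline: ideal band-pass channels -> Hilbert envelope ->
    magnitude spectrum of windowed envelope frames.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    window = np.hanning(frame_len)
    channels = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Analytic signal of the band: keep positive frequencies in [lo, hi), doubled.
        band = np.where((freqs >= lo) & (freqs < hi), 2.0 * spectrum, 0.0)
        envelope = np.abs(np.fft.ifft(band))  # Hilbert (amplitude) envelope
        # Overlapping envelope frames, shape (n_frames, frame_len); needs NumPy >= 1.20.
        frames = np.lib.stride_tricks.sliding_window_view(envelope, frame_len)[::hop]
        # Magnitude DFT of each frame is its modulation spectrum; transpose to (mod_freq, frame).
        channels.append(np.abs(np.fft.rfft(frames * window, axis=1)).T)
    return np.stack(channels)
```

For example, an 8 Hz amplitude-modulated 1 kHz tone puts almost all of its energy in the channel spanning 500 to 1500 Hz. A practical front end would more likely use a perceptually spaced filterbank and low-pass and decimate each envelope so that only low modulation frequencies are kept. These steps are omitted here to keep the sketch short.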
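The semi-supervised factorisation can be illustrated with a non-negative PARAFAC (CP) model, V[r, n, p] ≈ Σ_k G[r, k] A[n, k] S[p, k], fitted with generalised Kullback-Leibler multiplicative updates. This is a minimal sketch under several assumptions. The channel factors `G_known` and modulation factors `A_known` of the known source are assumed to be pre-trained on that speaker's isolated speech and are held fixed. Only the unknown source's channel and modulation factors are learned, along with the activations of all components. The cost function, update rules and function names (`cp_reconstruct`, `semi_supervised_ntf`) are hypothetical and may differ from the authors' formulation.

```python
import numpy as np


def cp_reconstruct(G, A, S):
    """Lambda[r, n, p] = sum_k G[r, k] * A[n, k] * S[p, k]."""
    return np.einsum('rk,nk,pk->rnp', G, A, S)


def kl_divergence(V, L, eps=1e-12):
    """Generalised Kullback-Leibler divergence D(V || L)."""
    return float(np.sum(V * np.log((V + eps) / (L + eps)) - V + L))


def semi_supervised_ntf(V, G_known, A_known, n_unknown, n_iter=200, eps=1e-12, seed=0):
    """Fit V ~ CP(G, A, S) while keeping the known source's G and A columns fixed."""
    rng = np.random.default_rng(seed)
    R, N, P = V.shape
    k_known = G_known.shape[1]
    free = slice(k_known, None)  # columns that belong to the unknown source
    G = np.hstack([G_known, rng.random((R, n_unknown)) + eps])
    A = np.hstack([A_known, rng.random((N, n_unknown)) + eps])
    S = rng.random((P, k_known + n_unknown)) + eps
    for _ in range(n_iter):
        # Activations of every component (known and unknown) are re-estimated.
        ratio = V / (cp_reconstruct(G, A, S) + eps)
        S *= np.einsum('rnp,rk,nk->pk', ratio, G, A) / (G.sum(0) * A.sum(0) + eps)
        # Channel factors: unknown-source columns only.
        ratio = V / (cp_reconstruct(G, A, S) + eps)
        G[:, free] *= (np.einsum('rnp,nk,pk->rk', ratio, A[:, free], S[:, free])
                       / (A[:, free].sum(0) * S[:, free].sum(0) + eps))
        # Modulation-frequency factors: unknown-source columns only.
        ratio = V / (cp_reconstruct(G, A, S) + eps)
        A[:, free] *= (np.einsum('rnp,rk,pk->nk', ratio, G[:, free], S[:, free])
                       / (G[:, free].sum(0) * S[:, free].sum(0) + eps))
    return G, A, S
```

After fitting, `cp_reconstruct(G[:, :k], A[:, :k], S[:, :k])`, where `k` is the number of known components, approximates the known source's share of the tensor, and the remaining columns model the unknown source. Holding the pre-trained factors fixed is what makes the method semi-supervised. Each multiplicative step is a majorise-minimise update, so the divergence does not increase from one iteration to the next.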
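The reconstruction stage, in which spectral parameters are learned under activations fixed by the first stage, can be sketched as NMF with fixed activations followed by soft masking. This sketch assumes that the STFT frames align with the tensor frames, so that the learned activations can be reused directly as `H` (components × frames). Only the spectral bases `W` are updated, with KL multiplicative updates. Each source's mask is then its share of the total model, W_j H_j / (W H). The function names and the exact masking rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def learn_spectral_bases(X, H, n_iter=200, eps=1e-12, seed=0):
    """KL-NMF updates for W in X ~ W @ H, with activations H held fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], H.shape[0])) + eps
    frame_totals = H.sum(axis=1)  # sum over frames of H[k, t], one value per component
    for _ in range(n_iter):
        W *= ((X / (W @ H + eps)) @ H.T) / (frame_totals + eps)
    return W


def wiener_separate(mix_stft, W, H, groups, eps=1e-12):
    """Apply Wiener-style soft masks; `groups` lists the component indices of each source."""
    total = W @ H + eps
    return [mix_stft * (W[:, g] @ H[g, :]) / total for g in groups]
```

If `groups` partitions the components, the masks sum to one at every time-frequency point, so the separated STFTs add back up to the mixture. The mixture phase is reused for each estimate before inverse STFT.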
doi:10.1109/ijcnn.2014.6889522 dblp:conf/ijcnn/BarkerV14 fatcat:dckn45sdh5d7thpjtdjlrzsdui