A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming
2004
Interspeech 2004
unpublished
Speaker segmentation is an important task in multi-party conversations. Overlapping speech poses a serious problem when segmenting audio into speaker turns. We propose an audio-visual speech separation system consisting of an eight-sensor microphone array and an omnidirectional color camera. Multiple concurrent speeches are segmented by fusing data from the two heterogeneous sensors. Each segmented speech signal is further enhanced by a linearly constrained minimum variance beamformer. Regardless of …
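The enhancement step named in the abstract, minimum variance beamforming with a distortionless constraint toward the localized speaker, can be sketched in a narrowband form. This is a minimal illustration, not the paper's implementation: the eight-sensor uniform linear array geometry, half-wavelength spacing, and the example source angles are all assumptions for the demo.

```python
import numpy as np

def mvdr_weights(R, d):
    """Minimum variance distortionless response (MVDR) weights.

    R : (M, M) spatial covariance of interference + noise
    d : (M,)   steering vector toward the desired speaker
    Minimizes output power w^H R w subject to w^H d = 1.
    """
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Hypothetical setup: 8-sensor uniform linear array, half-wavelength spacing
M = 8

def steering(theta_deg):
    # Narrowband far-field steering vector for the assumed array geometry
    phase = np.pi * np.sin(np.deg2rad(theta_deg)) * np.arange(M)
    return np.exp(1j * phase)

d_target = steering(0.0)    # desired speaker at broadside (assumed angle)
d_interf = steering(40.0)   # competing speaker (assumed angle)

# Covariance of the interfering speaker plus diffuse sensor noise
R = np.outer(d_interf, d_interf.conj()) + 0.1 * np.eye(M)

w = mvdr_weights(R, d_target)
# The beamformer passes the target undistorted (|w^H d_target| = 1)
# while strongly attenuating the interfering direction.
print(abs(w.conj() @ d_target))
print(abs(w.conj() @ d_interf))
```

In a full system such as the one described, the steering vector would come from the audio-visual speaker localization, and the weights would be computed per frequency bin rather than for a single narrowband snapshot.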
doi:10.21437/interspeech.2004-681