Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming

Changkyu Choi, Donggeon Kong, Hyoung-Ki Lee, Sang Min Yoon
2004 Interspeech 2004   unpublished
Speaker segmentation is an important task in multi-party conversations. Overlapping speech poses a serious problem in segmenting audio into speaker turns. We propose an audio-visual speech separation system consisting of a microphone array with eight sensors and an omnidirectional color camera. Multiple concurrent speeches are segmented by fusing the two heterogeneous sensors. Each segmented speech is further enhanced by a linearly constrained minimum variance beamformer. Regardless of [...]ing wide-band sound sources and pictures of humans in a reverberant environment, the proposed system effectively separates multiple target speeches.
doi:10.21437/interspeech.2004-681 fatcat:nhj5mp4njndyxkhpoxinjbomeq