On the improvement of multimodal voice activity detection
As mobile devices, intelligent displays, and home entertainment systems permeate digital markets, users increasingly expect to interact through spoken and visual modalities. Previous interactive systems limit voice activity detection (VAD) to the acoustic domain alone, but incorporating visual features has substantially improved detection accuracy. When employing both acoustic and visual (AV) information, the central recurring question becomes "how does one efficiently

doi:10.21437/interspeech.2013-194
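As an illustration of how acoustic and visual evidence might be combined for VAD, the following is a minimal, hypothetical sketch of score-level (late) fusion. The function name, weights, and threshold are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: score-level (late) fusion for audio-visual VAD.
# Weights and threshold are illustrative assumptions, not the paper's method.

def fuse_vad_scores(acoustic_score, visual_score,
                    w_audio=0.7, w_visual=0.3, threshold=0.5):
    """Combine per-frame speech probabilities from two modalities.

    Each score is a probability in [0, 1] that the current frame
    contains speech, as estimated by a modality-specific detector.
    """
    fused = w_audio * acoustic_score + w_visual * visual_score
    return fused >= threshold  # True -> frame labelled as speech


# Example: strong acoustic evidence, weak visual evidence.
# 0.7 * 0.9 + 0.3 * 0.2 = 0.69 >= 0.5, so the frame is labelled speech.
print(fuse_vad_scores(0.9, 0.2))
```

A weighted sum is only one fusion strategy; feature-level (early) fusion or a learned classifier over concatenated AV features are common alternatives.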