A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers
2019
IEEE Transactions on Pattern Analysis and Machine Intelligence
In this article, we address the problem of tracking multiple speakers via the fusion of visual and auditory information. We propose to exploit the complementary nature and roles of these two modalities in order to accurately estimate smooth trajectories of the tracked persons, to deal with the partial or total absence of one of the modalities over short periods of time, and to estimate the acoustic status (either speaking or silent) of each tracked person over time. We propose to cast the problem […]
doi:10.1109/tpami.2019.2953020
pmid:31751223
fatcat:vghfsjsecbahbfn3d3lps3f3rm