Voxel-based Viterbi Active Speaker Tracking (V-VAST) with best view selection for video lecture post-production
2011
2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
An automated system is presented for reducing a multi-view lecture recording to a single-view video containing a best-view summary of the active speakers. The system uses skin color detection and voxel-based analysis to locate likely speaker positions. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the active speaker that maximizes the observed speech activity. This novel
doi:10.1109/icassp.2011.5946941
dblp:conf/icassp/KellyKB11
fatcat:4hkechvwm5cfdl2xvn3crkohla
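
The abstract's tracking step, in which the Viterbi algorithm selects the active-speaker track that maximizes observed speech activity, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `viterbi_track`, the per-frame score table `activity`, and the constant `switch_penalty` transition cost are all assumptions introduced for illustration, since the record does not describe the transition model the authors actually use.

```python
from typing import List, Sequence


def viterbi_track(activity: Sequence[Sequence[float]],
                  switch_penalty: float = 1.0) -> List[int]:
    """Return the sequence of candidate-position indices with the highest
    total speech-activity score, minus a penalty for every change of position.

    activity[t][k] is a hypothetical speech-activity score for candidate
    speaker position k at frame t. The constant switch penalty is an
    assumed transition model, not one taken from the paper.
    """
    if not activity:
        return []
    n_pos = len(activity[0])
    score = list(activity[0])          # best cumulative score ending at each position
    backptr: List[List[int]] = []      # best predecessor per frame and position

    for frame in activity[1:]:
        new_score = []
        pointers = []
        for k in range(n_pos):
            # Staying at k is free; arriving from any other position costs the penalty.
            best_prev = max(
                range(n_pos),
                key=lambda j: score[j] - (0.0 if j == k else switch_penalty),
            )
            cost = 0.0 if best_prev == k else switch_penalty
            new_score.append(score[best_prev] - cost + frame[k])
            pointers.append(best_prev)
        score = new_score
        backptr.append(pointers)

    # Backtrack from the best final position.
    state = max(range(n_pos), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up scores for two candidate positions over six frames.
scores = [[1.0, 0.0], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.1, 1.0], [0.0, 1.0]]
track = viterbi_track(scores, switch_penalty=0.5)
```

With these made-up scores, the brief dip at the third frame is not enough to pay for two switches, so the track stays on position 0 until the sustained change at the fifth frame. With `switch_penalty=0.0` the result reduces to a per-frame argmax, which follows every momentary fluctuation.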