Audio-Visual Clustering for 3D Speaker Localization
[chapter]
Lecture Notes in Computer Science
We show that the identification and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. ...
A microphone array can provide an estimate of the 3D location of each audio source. ...
Research Group (Department of Computer Science, University of Sheffield) for helpful discussions and comments. ...
doi:10.1007/978-3-540-85853-9_8
fatcat:n4xa6watsve3zjg2yah23rq4dm
Vision-guided robot hearing
2014
The international journal of robotics research
In this context, the detection and localisation of speakers plays a key role since it is the pillar on which several tasks (e.g.: speech recognition and speaker tracking) rely. ...
Indeed, the deterministic component allows us to map the visual information into the auditory space. ...
The 3D visual features are mapped into the auditory space A through the audio-visual mapping (A ∘ V⁻¹). ...
doi:10.1177/0278364914548050
fatcat:onjyr7y2jzfhxiytcgjoaeicei
Multimodal Speaker Diarization Utilizing Face Clustering Information
[chapter]
2015
Lecture Notes in Computer Science
In this paper, we use visual information to aid speaker clustering. ...
Multimodal clustering/diarization tries to answer the question "who spoke when" by using audio and visual information. ...
The European Union is not liable for any use that may be made of the information contained therein. ...
doi:10.1007/978-3-319-21963-9_50
fatcat:hrl66z7gdncapa7qaqykvq52oa
Detection and localization of 3d audio-visual objects using unsupervised clustering
2008
Proceedings of the 10th international conference on Multimodal interfaces - ICMI '08
It is shown that the detection and localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. ...
This model maps the data into a common audio-visual 3D representation via a pair of mixture models. ...
We use this in particular to determine active speakers using the auditory observations assignments η_k's. For every person we can derive the speaking state by the number of associated observations. ...
doi:10.1145/1452392.1452438
dblp:conf/icmi/KhalidovFHAH08
fatcat:rkdyghmti5efpjf5ahclil5hge
Motion Features from Lip Movement for Person Authentication
2006
18th International Conference on Pattern Recognition (ICPR'06)
This paper describes a new motion based feature extraction technique for speaker identification using orientation estimation in 2D manifolds. ...
By projecting the 3D spatiotemporal data to 2-D planes we obtain projection coefficients which we use to evaluate the 3-D orientations of brightness patterns in TV like image sequences. ...
Speaker verification based on audio and visual images from lip-movement gives 98% correct classification, which is 3-4% better than audio-based speaker verification. ...
doi:10.1109/icpr.2006.814
dblp:conf/icpr/FarajB06
fatcat:ytjcun2rrbcsdpvrvwfkyn3s3q
Finding audio-visual events in informal social gatherings
2011
Proceedings of the 13th international conference on multimodal interfaces - ICMI '11
To this end, we fully exploit the geometric and physical properties of an audio-visual sensor based on binocular vision and binaural hearing. ...
We propose a new multimodal clustering algorithm based on a Gaussian mixture model, where one of the modalities (visual data) is used to supervise the clustering process. ...
visual scene, or a speaker is occluded by another speaker/sound source. ...
doi:10.1145/2070481.2070527
dblp:conf/icmi/Alameda-PinedaKHF11
fatcat:m746sm43mrbyzkkjalimhys2g4
Dialocalization
2010
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
The following article presents a novel audio-visual approach for unsupervised speaker localization in both time and space and systematically analyzes its unique properties. ...
The proposed system is able to exploit audio-visual integration to not only improve the accuracy of a state-of-the-art (audio-only) speaker diarization, but also adds visual speaker localization at little ...
ACKNOWLEDGMENTS We thank Adam Janin and Mary Knox for very helpful input on this article and Bao-Lan Huynh for the baseline experiments with the OpenCV face detector. ...
doi:10.1145/1865106.1865111
fatcat:tilcqsv3sjgcxolvngnhnhi6oa
Audio–visual person authentication using lip-motion from orientation maps
2007
Pattern Recognition Letters
The XM2VTS database was used for performance quantification as it is currently the largest publicly available database (≈300 persons) containing both lip-motion and speech. ...
Since the velocities are computed without extracting the speaker's lip-contours, more robust visual features can be obtained in comparison to motion features extracted from lip-contours. ...
Fig. 6. The suggested joint audio-visual speaker verification system. ...
doi:10.1016/j.patrec.2007.02.017
fatcat:bpuqxo57mzbqphqybzdbv3k4tu
Deep Audio-visual Learning: A Survey
2021
International Journal of Automation and Computing
We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual ...
In this paper, we provide a comprehensive survey of recent audio-visual learning development. ...
For the audio stream, the researchers applied a neural network model to detect speech for clustering and subsequently assigned a frame cluster to the given audio cluster according to the majority principle ...
doi:10.1007/s11633-021-1293-0
fatcat:an5lfyf4m5fh7mlngmdcbx7joy
Audio Segmentation and Speaker Localization in Meeting Videos
2006
18th International Conference on Pattern Recognition (ICPR'06)
We compare our results with audio based segmentation method and our localization technique with the commonly used mutual information. ...
In this effort, given a meeting room video, we attempt to segment individual person's speech and localize them in the video, based on data from a single audio and video source. ...
Localization Once clusters for individual speakers are obtained, the next step is to localize the speaker in the corresponding video frames. ...
doi:10.1109/icpr.2006.283
dblp:conf/icpr/VajariaISSK06
fatcat:ubjxzo2ao5ev3on47ltxtw7hty
Cyberspatial audio technology
1999
Journal of the Acoustical Society of Japan (E)
for such speaker array systems assume only rough speaker-placement guidelines. ...
The red translucent cones visualize localization errors used by a clustering algorithm 17) to decide which sources can be coalesced. ... ments like nuclear power plants, fires, toxic waste dumps, and deep mining ...
Besides the interest in spatial audio manifested by this paper, Cohen has research interests in telecommunication semiotics and hypermedia; Herder has interests in computer graphics, software engineering ...
doi:10.1250/ast.20.389
fatcat:37wpewb45jgl3a3xlr6tfn47ae
Speaker Detection and Applications to Cross-Modal Analysis of Planning Meetings
2009
2009 11th IEEE International Symposium on Multimedia
In this paper, we present an approach of speaker localization using combination of visual and audio information in multimodal meeting analysis. ...
By computing correlation of audio signals, mouth movements, and hand motion, we detect a talking person both spatially and temporally. Three kinds of features are extracted for speaker localization. ...
In this paper, we present our visual audio-based techniques to perform speaker localization in our meeting room. ...
doi:10.1109/ism.2009.66
dblp:conf/ism/FangXQ09
fatcat:dhk4spssorf2xcbdq7gbosxhzy
Blind Audiovisual Source Separation Based on Sparse Redundant Representations
2010
IEEE transactions on multimedia
Results show that the proposed method is able to successfully detect, localize, separate and reconstruct present audio-visual sources. ...
Based on this co-occurrence measure, audio-visual sources are counted and located in the image using a robust clustering algorithm that groups video structures exhibiting strong correlations with the audio ...
Video atoms synchronous with the audio track and that are spatially close are grouped together using a clustering algorithm that counts and localizes on the image plane audio-visual sources. ...
doi:10.1109/tmm.2010.2050650
fatcat:rusd73kyvjfbre6lgup4i366xe
Deep Audio-Visual Learning: A Survey
[article]
2020
arXiv
pre-print
We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual ...
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. ...
For the audio stream, the researchers applied a neural network model to detect speech for clustering and subsequently assigned a frame cluster to the given audio cluster according to the majority principle ...
arXiv:2001.04758v1
fatcat:p6ph5cujl5do3pzlpvcce35nvi
AVA-AVD: Audio-visual Speaker Diarization in the Wild
[article]
2021
arXiv
pre-print
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals. ...
To overcome it, we propose a novel Audio-Visual Relation Network (AVR-Net) which introduces an effective modality mask to capture discriminative information based on visibility. ...
Ava active speaker: An audio-visual dataset for active speaker detection. ...
modal speaker clustering in full length movies. Multimedia ...
arXiv:2111.14448v3
fatcat:b6ayj24h4jb4hn5t2h5tsghk4e