1,129 Hits in 4.6 sec

Multi Channel Sequence Processing [chapter]

Samy Bengio, Hervé Bourlard
2005 Lecture Notes in Computer Science  
As briefly reported here, the combination of these two approaches yielded successful results on several multi-channel tasks, ranging from audio-visual speech recognition to automatic meeting analysis. ... In this framework, we discuss two novel approaches that have recently been investigated with success in the context of large multimodal problems. ... In fact, many real-life applications involve several asynchronous streams; audio-visual speech recognition, for instance, usually exhibits asynchrony. ...
doi:10.1007/11559887_2 fatcat:rfcpfjq5u5cyblmgslzywxf3rm

An Asynchronous DBN for Audio-Visual Speech Recognition

Kate Saenko, Karen Livescu
2006 IEEE Spoken Language Technology Workshop
We investigate an asynchronous two-stream dynamic Bayesian network-based model for audio-visual speech recognition. ... This type of asynchrony has been previously used for pronunciation modeling [8, 9] and for lipreading [15]; however, this is its first application to audio-visual speech recognition. ...
doi:10.1109/slt.2006.326841 dblp:conf/slt/SaenkoL06 fatcat:773coclikvdltb2dduppsnznwa
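
Models of this kind typically couple two per-stream state chains while bounding how far they may drift apart. A minimal sketch of such an asynchrony constraint (the specific form is our assumption, not quoted from the paper): with audio sub-word state index a_t and visual state index v_t at frame t,

    P(a_t, v_t \mid \text{history}) = 0 \quad \text{whenever } |a_t - v_t| > k,

so the two streams may desynchronize by at most k sub-word units before the model forces them back into alignment.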

Speech selection and environmental adaptation for asynchronous speech recognition

Bo Ren, Longbiao Wang, Atsuhiko Kai, Zhaofeng Zhang
2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. ... Although applications using mobile terminals have attracted increasing attention, there are few studies that focus on distant-talking speech recognition with asynchronous mobile terminals. ... Fig. 1. System diagram of speech recognition with asynchronous speech recording. ...
doi:10.1109/apsipa.2015.7415485 dblp:conf/apsipa/RenWKZ15 fatcat:g42dwxr2dzgwhctpkfnpvbsh7e
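
The speech-selection step named in the title presumably amounts to choosing among the asynchronous recordings; a minimal sketch under that assumption, where the decoder and the channel dictionary are hypothetical placeholders rather than the authors' system:

    # Score every asynchronous recording independently and keep the best one.
    def select_best_channel(channels, decode):
        # channels: {name: waveform}; decode: waveform -> (hypothesis, log_score)
        best = None
        for name, waveform in channels.items():
            hyp, log_score = decode(waveform)
            if best is None or log_score > best[2]:
                best = (name, hyp, log_score)
        return best  # (channel_name, hypothesis, log_score)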

An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

Samy Bengio
2002 Neural Information Processing Systems  
This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same event. ... The model has been tested on an audio-visual speech recognition task using the M2VTS database and yielded robust performance under various noise conditions. ... The author would like to thank Stephane Dupont for providing the extracted visual features and the experimental protocol used in the paper. ...
dblp:conf/nips/Bengio02 fatcat:ws4mcnmvhvazbmyriz4wtvnt7i
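
For orientation, the flavor of the asynchronous HMM can be summarized by its forward recursion (a sketch of the standard formulation; the notation is ours, not quoted from the paper). With streams x_1^T and y_1^S (S <= T), hidden state q_t, and \epsilon_i the probability that state i emits on both streams at a given frame, the forward variable \alpha(i,t,s) = p(x_1^t, y_1^s, q_t = i) satisfies

    \alpha(i,t,s) = \epsilon_i \, p(x_t, y_s \mid q_t = i) \sum_j P(i \mid j)\, \alpha(j, t-1, s-1)
                  + (1 - \epsilon_i) \, p(x_t \mid q_t = i) \sum_j P(i \mid j)\, \alpha(j, t-1, s),

so decoding jointly infers the hidden state sequence and the alignment between the two asynchronous streams.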

State Synchronous Modeling on Phone Boundary for Audio Visual Speech Recognition and Application to Multi-View Face Images

Kenichi Kumatani, Rainer Stiefelhagen
2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
Index Terms: audio-visual automatic speech recognition, visual information, product HMM, multi-view. ... Visual speech cues are known to improve the performance of automatic speech recognition (ASR); however, most prior work has relied mainly on the speaker's frontal pose. ... We also showed that recognition accuracy was improved by retraining the product HMM with the audio-visual features [5]. ...
doi:10.1109/icassp.2007.366938 dblp:conf/icassp/KumataniS07 fatcat:6w2kld32djdkxh3upgoepq3tmq
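
As a rough illustration of the product-HMM idea referenced above, the composite model can be built from the per-stream models as a Cartesian product of states (a minimal sketch assuming one transition matrix and per-state frame likelihoods per stream; names and shapes are illustrative, not the authors' implementation):

    import numpy as np

    # Product HMM sketch: composite states are pairs (audio state, visual state),
    # transitions factorize as a Kronecker product, and the composite emission
    # score is the product of the per-stream emission scores.
    def product_hmm_step(trans_audio, trans_visual, emis_audio, emis_visual):
        # trans_audio: (Na, Na), trans_visual: (Nv, Nv) transition matrices
        # emis_audio:  (Na,) audio-frame likelihoods per audio state
        # emis_visual: (Nv,) video-frame likelihoods per visual state
        trans_prod = np.kron(trans_audio, trans_visual)        # (Na*Nv, Na*Nv)
        emis_prod = np.outer(emis_audio, emis_visual).ravel()  # (Na*Nv,)
        return trans_prod, emis_prod

In practice such models are typically resynchronized at phone or word boundaries and the composite states are retrained, as the excerpt above notes.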

Overcoming asynchrony in Audio-Visual Speech Recognition

Virginia Estellers, Jean-Philippe Thiran
2010 IEEE International Workshop on Multimedia Signal Processing
In this paper we propose two alternatives to overcome the natural asynchrony of modalities in Audio-Visual Speech Recognition. ... In both cases we report experiments with the CUAVE database, showing the improvements obtained with the proposed asynchronous model and feature processing technique compared to traditional systems. ... The statistical models commonly used in Audio-Visual Speech Recognition (AVSR) are multistream Hidden Markov Models, the natural extension of the Hidden Markov Models (HMM) used in audio speech recognition ...
doi:10.1109/mmsp.2010.5662066 dblp:conf/mmsp/EstellersT10 fatcat:xyssnbvl65fcnndgwon3kw4pli
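
For context, the multistream HMM mentioned in the last excerpt typically combines per-stream observation likelihoods with exponent weights (a standard formulation, not quoted from the paper):

    b_j(o_t) = \prod_{s \in \{A, V\}} b_{js}(o_{st})^{\lambda_s}, \qquad \lambda_A + \lambda_V = 1,

i.e., a weighted sum of per-stream log-likelihoods, where the weights reflect the relative reliability of the audio and visual streams.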

Audio Visual Speech Recognition and Segmentation Based on DBN Models [chapter]

Dongmei Jiang, Guoyun Lv, Ilse Ravyse, Xiaoyue Jiang, Yanning Zhang, Hichem Sahli, Rongchun Zhao
2007 Robust Speech Recognition and Understanding  
... states are assigned to one word to assemble the HMM of the word, was designed for small-vocabulary speech recognition. ... One can notice that the SDBN model, either with MFCC features or with PLP features, gives phone segmentation results very close to those of the triphone HMMs, the standard continuous speech recognition ...
doi:10.5772/4748 fatcat:kkpzxvhljbgpho5e72enkvpiiy

Modeling individual and group actions in meetings with layered HMMs

Dong Zhang, D. Gatica-Perez, S. Bengio, I. McCowan
2006 IEEE transactions on multimedia  
... the best two-layer method, i.e., AV asynchronous HMM with soft-decision. ... to G-HMM for group action recognition. ...
doi:10.1109/tmm.2006.870735 fatcat:xykq3hpnsvfjfp7eqo23jdx4ee

Multimodal speech processing using asynchronous Hidden Markov Models

Samy Bengio
2004 Information Fusion  
We present a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same sequence of events. ... The model was tested on two audio-visual speech processing tasks, namely speech recognition and text-dependent speaker verification, both using the M2VTS database. ... It is also very similar to the asynchronous version of Input/Output HMMs [9], which was proposed for speech recognition applications. ...
doi:10.1016/j.inffus.2003.04.001 fatcat:tzddmmp5s5d5pbmmghqy2xrzey

Adaptive Multimodal Fusion by Uncertainty Compensation with Application to Audio-Visual Speech Recognition [chapter]

George Papandreou, Athanassios Katsamanis, Vassilis Pitsikalis, Petros Maragos
2008 Multimodal Processing and Interaction  
Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures.  ...  We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models.  ...  Murphy for making his HMM toolkit publicly available, and J. N. Gowdy for providing the CUAVE database.  ... 
doi:10.1007/978-0-387-76316-3_4 fatcat:l2odqrkgxvaerkc4h2qnry7lyq
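
A common form of the uncertainty-compensation rule referenced above (a generic sketch; this may differ in detail from the authors' exact formulation): if a feature vector is observed as \hat{x} with estimated uncertainty covariance \Sigma_x, each Gaussian component m is evaluated with an inflated covariance,

    p(\hat{x} \mid m) \approx \mathcal{N}(\hat{x};\, \mu_m,\, \Sigma_m + \Sigma_x),

so noisier, less reliable features receive flatter likelihoods and therefore carry less weight in the fused audio-visual score.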

Integration strategies for audio-visual speech processing: applied to text-dependent speaker recognition

S. Lucey, T. Chen, S. Sridharan, V. Chandran
2005 IEEE transactions on multimedia  
In this paper an in-depth analysis is undertaken of effective strategies for integrating the audio-visual speech modalities with respect to two major questions. ... Our work is based on the well-known hidden Markov model (HMM) classifier framework for modelling speech. ... Acknowledgements: The authors would like to thank the M2VTS Project for use of their database. Part of our work is supported by an Intel research grant. ...
doi:10.1109/tmm.2005.846777 fatcat:j56km2zh2bfj5lbmmnggezboge
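
As a rough illustration of the two integration strategies such studies compare, feature-level (early) and decision-level (late) fusion can be sketched as follows (generic code; the function names and the weight are illustrative, not the authors' system):

    import numpy as np

    # Early (feature-level) fusion: concatenate frame-synchronous audio and
    # visual feature vectors and model them jointly.
    def early_fusion(audio_feats, visual_feats):
        # audio_feats: (T, Da), visual_feats: (T, Dv)
        return np.concatenate([audio_feats, visual_feats], axis=1)  # (T, Da+Dv)

    # Late (decision-level) fusion: combine per-modality log-likelihoods with a
    # reliability weight lam in [0, 1].
    def late_fusion(loglik_audio, loglik_visual, lam=0.7):
        return lam * loglik_audio + (1.0 - lam) * loglik_visual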

Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Longbiao Wang, Bo Ren, Yuma Ueda, Atsuhiko Kai, Shunta Teraoka, Taku Fukushima
2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. ... Although applications using mobile terminals have attracted increasing attention, there are few studies that focus on distant-talking speech recognition with asynchronous mobile terminals. ... Fig. 1. System diagram of speech recognition with asynchronous speech recording. Fig. 2. Topology of stacked denoising autoencoder for cepstral-domain dereverberation. ...
doi:10.1109/apsipa.2014.7041548 dblp:conf/apsipa/WangRUKTF14 fatcat:intfbj62k5dcnkuicthw33gkhm
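
For reference, the denoising-autoencoder objective behind such a dereverberation front end can be sketched as follows (a generic formulation based on the excerpt, not a quoted equation). Given reverberant cepstral features \tilde{x} paired with clean targets x, each layer learns an encoder f and decoder g by minimizing

    \min_{f, g} \; \sum_{(x, \tilde{x})} \bigl\| x - g(f(\tilde{x})) \bigr\|^2,

and the stacked encoders are then used to map distorted features toward clean ones before recognition.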

Product-HMMs for automatic sign language recognition

Stavros Theodorakis, Athanassios Katsamanis, Petros Maragos
2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Alternative approaches are investigated and the application of Product-HMMs (PHMM) is proposed. ... Fusing movement and shape information with the PHMMs has increased sign classification performance by 1.2% in comparison to the Parallel HMM fusion model. ... Fotinea from the Institute for Language and Speech Processing for providing the Greek Sign Language database. ...
doi:10.1109/icassp.2009.4959905 dblp:conf/icassp/TheodorakisKM09 fatcat:hhrvvb2cq5dfnlcrbrxuhushfq

A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams

Martin Wöllmer, Marc Al-Hames, Florian Eyben, Björn Schuller, Gerhard Rigoll
2009 Neurocomputing  
To overcome the computational complexity of the asynchronous hidden Markov model (AHMM), we present a novel multidimensional dynamic time warping (DTW) algorithm for hybrid fusion of asynchronous data.  ...  Optimally exploiting mutual information during decoding even if the input streams are not synchronous, our algorithm outperforms late and early fusion techniques in a challenging bimodal speech and gesture  ...  Acknowledgments The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 211486 (SEMAINE).  ... 
doi:10.1016/j.neucom.2009.08.005 fatcat:zwlxz67dzfdqfjmnikvud2bstm
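
For orientation, the classic two-sequence dynamic time warping recursion that the multidimensional variant above builds on can be sketched as follows (a generic implementation, not the authors' algorithm):

    import numpy as np

    # Classic DTW between two feature sequences a (T, D) and b (S, D);
    # returns the cumulative cost of the best monotonic alignment.
    def dtw_cost(a, b):
        T, S = len(a), len(b)
        D = np.full((T + 1, S + 1), np.inf)
        D[0, 0] = 0.0
        for t in range(1, T + 1):
            for s in range(1, S + 1):
                local = np.linalg.norm(a[t - 1] - b[s - 1])  # frame distance
                # standard step patterns: insertion, deletion, match
                D[t, s] = local + min(D[t - 1, s], D[t, s - 1], D[t - 1, s - 1])
        return D[T, S]

The paper's multidimensional algorithm generalizes this idea to the fusion of more than two asynchronous data streams.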

Acoustic adaptation to dynamic background conditions with asynchronous transformations

Oscar Saz, Thomas Hain
2017 Computer Speech and Language  
This paper proposes a framework for performing adaptation to complex and non-stationary background conditions in Automatic Speech Recognition (ASR) by means of asynchronous Constrained Maximum Likelihood ... The implementation is done with a new Hidden Markov Model (HMM) topology that expands the usual left-to-right HMM into parallel branches adapted to different background conditions and permits transitions ... Acknowledgement: This work was supported by the EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). ...
doi:10.1016/j.csl.2016.06.008 fatcat:e5zn4umrgzhylelkwtpjtyykge
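
For reference, a constrained MLLR (CMLLR) transform of the kind this framework applies can be written as follows (standard formulation, not quoted from the paper):

    \hat{o}_t = A o_t + b, \qquad p(o_t \mid m) = |A| \; \mathcal{N}(A o_t + b;\; \mu_m, \Sigma_m),

a single affine transform of the observations shared across Gaussians and estimated to maximize the likelihood of the adaptation data; here separate transforms would be tied to the different background conditions represented by the parallel HMM branches and switched asynchronously during decoding.
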
Showing results 1 — 15 out of 1,129 results