235 Hits in 5.9 sec

Learning state-dependent stream weights for multi-codebook HMM speech recognition systems

I. Rogina, A. Waibel
Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing  
FUTURE WORK So far we have only performed experiments with streams.  ...  G. Wilpon, "Discriminative feature selection for speech recognition", Computer Speech and Language, Vol. 7, pp. 229-246 (1993) [Hua92] Huang, X., All  ... 
doi:10.1109/icassp.1994.389316 dblp:conf/icassp/RoginaW94 fatcat:iuhyuhwdlbhijpykzcuwdjelzu

Training combination strategy of multi-stream fused hidden Markov model for audio-visual affect recognition

Zhihong Zeng, Yuxiao Hu, Ming Liu, Yun Fu, Thomas S. Huang
2006 Proceedings of the 14th annual ACM international conference on Multimedia - MULTIMEDIA '06  
Different from the weighting combination scheme, our approach is able to use a variety of learning methods to obtain a robust multi-stream fusion result.  ...  To simulate the human ability to assess affects, an automatic affect recognition system should make use of multi-sensor information.  ...  Performance comparison in clean audio condition among uni-stream HMM and multi-stream HMM, IHMM and MFHMM, and weighting and training combination schemes.  ... 
doi:10.1145/1180639.1180661 dblp:conf/mm/ZengHLFH06 fatcat:2gk5hgkjszht7eknl3twfx4kuy

Multi-stream Confidence Analysis for Audio-Visual Affect Recognition [chapter]

Zhihong Zeng, Jilin Tu, Ming Liu, Thomas S. Huang
2005 Lecture Notes in Computer Science  
affect recognition.  ...  We investigate the use of individual modality confidence measures as a means of estimating weights when combining likelihoods in the audio-visual decision fusion.  ...  Lawrence Chen for collecting the valuable data in this paper for audio-visual affect recognition.  ... 
doi:10.1007/11573548_123 fatcat:se3zhvwbqrcbbega2qhudnk55q
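The decision-fusion idea in this entry — estimating per-modality weights and using them when combining stream likelihoods — follows the standard log-linear multi-stream HMM formulation, where each stream's observation likelihood is raised to a stream-weight exponent. A minimal sketch of that combination rule (the function name, weight values, and example numbers below are illustrative assumptions, not taken from the paper):

```python
def fuse_stream_loglikelihoods(stream_logliks, weights):
    """Log-linear fusion of per-stream HMM observation log-likelihoods.

    stream_logliks: log b_s(o_s) for each stream (e.g. audio, visual).
    weights: non-negative stream weights, typically summing to 1;
             state- or confidence-dependent weighting plugs in here.

    Implements log prod_s b_s(o_s)^w_s = sum_s w_s * log b_s(o_s).
    """
    if len(stream_logliks) != len(weights):
        raise ValueError("one weight per stream required")
    return sum(w * ll for ll, w in zip(stream_logliks, weights))

# Example: a noisy audio stream is down-weighted relative to video.
audio_ll, visual_ll = -12.3, -8.7
fused = fuse_stream_loglikelihoods([audio_ll, visual_ll], [0.3, 0.7])
```

Confidence-based schemes like the one surveyed here differ mainly in how `weights` is estimated per frame or per utterance, not in this combination step itself.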

Audio–Visual Affective Expression Recognition Through Multistream Fused HMM

Zhihong Zeng, Jilin Tu, Brian M. Pianfetti, Thomas S. Huang
2008 IEEE transactions on multimedia  
Information processed by computer systems is limited to either face images or speech signals.  ...  Using our Multi-stream Fused Hidden Markov Model (MFHMM), we analyzed coupled audio and visual streams to detect 4 cognitive states (interest, boredom, frustration and puzzlement) and 7 prototypical emotions  ...  Multi-stream Fused HMM (MFHMM) For integrating coupled audio and visual features, we propose multi-stream fused HMM (MFHMM) which constructs a new structure linking the multiple component HMMs which is  ... 
doi:10.1109/tmm.2008.921737 fatcat:5dvcmrvcinfhxcc4yfnkzakoti

Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework

Konstantin Markov, Jianwu Dang, Satoshi Nakamura
2006 Speech Communication  
Most of the current state-of-the-art speech recognition systems are based on speech signal parametrizations that crudely model the behavior of the human auditory system.  ...  In all experiments involving both speakerdependent and multi-speaker acoustic models, the HMM/BN system outperformed the baseline HMM system trained on acoustic data only.  ...  systems using mobile terminals''.  ... 
doi:10.1016/j.specom.2005.07.003 fatcat:fu4vrlyzmrfpxhr7aizqngi3ey

Discriminative speaker adaptation using articulatory features

Florian Metze
2007 Speech Communication  
The author would like to thank two anonymous reviewers for their input on an earlier draft of this paper.  ...  states of the HMM to be used for the multi-stream system.  ...  The problem of combining information from two (synchronous) sources using multi-stream HMMs has been studied in the context of audio-visual speech 9 recognition and multi-band speech recognition, mostly  ... 
doi:10.1016/j.specom.2007.02.009 fatcat:3tcd3bragja7jl4awlzloilpo4

Hybrid NN/HMM-Based Speech Recognition with a Discriminant Neural Feature Extraction

Daniel Willett, Gerhard Rigoll
1997 Neural Information Processing Systems  
In this paper, we present a novel hybrid architecture for continuous speech recognition systems.  ...  Experimental results show a relative error reduction of about 10% that we achieved on a remarkably good recognition system based on continuous HMMs for the Resource Management 1000-word continuous speech  ...  MULTI STREAM SYSTEMS In HMM-based recognition systems the extracted features are often divided into streams that are modeled independently.  ... 
dblp:conf/nips/WillettR97 fatcat:kt6othuiyzf6jlm4d73c3pa4ku

Hybrid NN/HMM acoustic modeling techniques for distributed speech recognition

Jan Stadermann, Gerhard Rigoll
2006 Speech Communication  
Distributed speech recognition (DSR) where the recognizer is split up into two parts and connected via a transmission channel offers new perspectives for improving the speech recognition performance in  ...  Word-based HMMs and phoneme-based HMMs are trained for distributed and non-distributed recognition using either MFCC or RASTA-PLP features.  ...  General system architecture for distributed speech recognition As already mentioned, the need for distributed speech recognition (DSR) is evident.  ... 
doi:10.1016/j.specom.2006.01.007 fatcat:2n2rj7qc4bgntec4447kk6asy4

Statistical parametric speech synthesis with a novel codebook-based excitation model

Tamás Gábor Csapó, Géza Németh, Klára Vicsi, Anna Esposito
2014 International Journal of Intelligent Decision Technologies  
During the synthesis stage the codebook is searched for a suitable element in each voiced frame and these are concatenated to create the excitation signal, from which the final synthesized speech is created  ...  The decomposition is implemented by speech coders. We apply a novel codebook-based speech coding method to model the excitation of speech.  ...  Acknowledgements We would like to thank the listeners for participating in the subjective test. We thank the two anonymous reviewers for the helpful comments and suggestions.  ... 
doi:10.3233/idt-140197 fatcat:rr7tm5etobhhbldsvqwrnffvia

Voice Conversion [chapter]

Jani Nurminen, Hanna Silén, Victor Popa, Elina Helander, Moncef Gabbouj
2012 Speech Enhancement, Modeling and Recognition- Algorithms and Applications  
HMM modeling of speech HMM-based speech synthesis provides a flexible framework for speech synthesis, where all speech features can be modeled simultaneously within the same multi-stream HMM.  ...  Linguistic information has not traditionally been considered in the existing VC systems but is of high interest for example in the field of speech recognition.  ...  The chapters covers important fields in speech processing such as speech enhancement, noise cancellation, multi resolution spectral analysis, voice conversion, speech recognition and emotion recognition  ... 
doi:10.5772/37334 fatcat:2hgxvblj4rccvasfudopppuiau

Audiovisual Information Fusion in Human–Computer Interfaces and Intelligent Environments: A Survey

Shankar T. Shivappa, Mohan Manubhai Trivedi, Bhaskar D. Rao
2010 Proceedings of the IEEE  
In this paper we describe the fusion strategies and the corresponding models used in audiovisual tasks such as speech recognition, tracking, biometrics, affective state recognition and meeting scene analysis  ...  intelligent systems.  ...  We sincerely thank the reviewers for their valuable advice which has helped us enhance the content as well as the presentation of the paper.  ... 
doi:10.1109/jproc.2010.2057231 fatcat:lfzgfmn2hjdq7h6o5txva3oapq

Deep sparse auto-encoder features learning for Arabic text recognition

Najoua Rahal, Maroua Tounsi, Amir Hussain, Adel M. Alimi
2021 IEEE Access  
We propose a novel hybrid network, combining a Bag-of-Feature (BoF) framework for feature extraction based on a deep Sparse Auto-Encoder (SAE), and Hidden Markov Models (HMMs), for sequence recognition  ...  In this work, we introduce a new deep learning based system that recognizes Arabic text contained in images.  ...  The last step has been the recognition during which the HMM models were simultaneously decoded according to the multi-stream formalism.  ... 
doi:10.1109/access.2021.3053618 fatcat:p7jhbokjsjbunceuq4lu7xnmci

Characteristics of the use of coupled hidden Markov models for audio-visual polish speech recognition

M. Kubanek, J. Bobulski, L. Adrjanowicz
2012 Bulletin of the Polish Academy of Sciences: Technical Sciences  
This paper focuses on combining audio-visual signals for Polish speech recognition in conditions of the highly disturbed audio speech signal.  ...  A significant increase of recognition effectiveness and processing speed were noted during tests - for properly selected CHMM parameters and an adequate codebook size, besides the use of the appropriate  ...  The following steps describe the Viterbi algorithm for the two stream coupled HMM used in our audio-visual system.  ... 
doi:10.2478/v10175-012-0041-6 fatcat:xk45sxtkq5dppdbdifs3pavz4q

An analysis-by-synthesis approach to vocal tract modeling for robust speech recognition

Ziad Al Bawab
2012 Qatar Foundation Annual Research Forum Proceedings  
I enjoyed learning from his wisdom and experience in life as much as I enjoyed learning and deeply understanding the basic issues related to signal processing and speech recognition from him.  ...  Together they are an encyclopedia on ideas related to speech recognition and have contributed to this field for more than a decade now.  ...  A Knowledge-Based Approach to the Speech Recognition Problem State-of-the-art speech recognition systems use Hidden Markov Models (HMMs) which are composed of states and observations.  ... 
doi:10.5339/qfarf.2012.aesnp6 fatcat:awqcncfewvaytnkfstmvehwcvm

A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams

Martin Wöllmer, Marc Al-Hames, Florian Eyben, Björn Schuller, Gerhard Rigoll
2009 Neurocomputing  
Optimally exploiting mutual information during decoding even if the input streams are not synchronous, our algorithm outperforms late and early fusion techniques in a challenging bimodal speech and gesture  ...  To overcome the computational complexity of the asynchronous hidden Markov model (AHMM), we present a novel multidimensional dynamic time warping (DTW) algorithm for hybrid fusion of asynchronous data.  ...  Examples for multimodal systems causing higher robustness are the combination of speech and gestures or the fusion of speech recognition and lip-reading: by using both modalities the speech recognition  ... 
doi:10.1016/j.neucom.2009.08.005 fatcat:zwlxz67dzfdqfjmnikvud2bstm
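The multidimensional DTW algorithm in this entry generalizes the classic two-sequence dynamic-programming recursion, in which the cumulative cost of each cell is the local distance plus the minimum over the three predecessor cells. For orientation, a sketch of that baseline recursion only (the authors' multidimensional, asynchronous variant is more involved; the function below is an illustrative assumption, not their algorithm):

```python
def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Classic dynamic-programming DTW between two 1-D sequences.

    D[i][j] = dist(x[i-1], y[j-1]) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    """
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A sequence aligns to a time-stretched copy of itself at zero cost.
print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # → 0.0
```

Extending this recursion to more than two streams, as the paper does, trades the quadratic table for a higher-dimensional one, which is what motivates their efficiency-focused formulation.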
Showing results 1 — 15 out of 235 results