Hidden Conditional Random Fields for Meeting Segmentation

Stephan Reiter, Bjorn Schuller, Gerhard Rigoll
2007 Multimedia and Expo, 2007 IEEE International Conference on  
These Hidden Conditional Random Fields have been proven to be efficient in low level pattern recognition tasks.  ...  We therefore strive to improve on this task by applying conditional random fields augmented by hidden states.  ...  Therefore a generalized CRF with hidden state sequences is used, the so-called Hidden Conditional Random Fields (HCRF).  ... 
doi:10.1109/icme.2007.4284731 dblp:conf/icmcs/ReiterSR07 fatcat:hcil7rvfwfakfoska3chumfrmm

Latent-Dynamic Discriminative Models for Continuous Gesture Recognition

Louis-Philippe Morency, Ariadna Quattoni, Trevor Darrell
2007 2007 IEEE Conference on Computer Vision and Pattern Recognition  
Our results demonstrate that our model for visual gesture recognition outperforms models based on Support Vector Machines, Hidden Markov Models, and Conditional Random Fields.  ...  Many problems in vision involve the prediction of a class label for each frame in an unsegmented sequence.  ...  In the speech and natural language processing community, Conditional Random Field (CRF) models have been used for tasks such as word recognition, part-of-speech tagging, text segmentation and information  ... 
doi:10.1109/cvpr.2007.383299 dblp:conf/cvpr/MorencyQD07 fatcat:yx75dutjcvbfnbfaffmdhq4g3m

Sequence Labeling using Conditional Random Fields

Romansha Chopra, Nivedita Singh, Yang Zhenning, N.Ch.S.N. Iyengar
2017 International Journal of u- and e- Service, Science and Technology  
Conditional random fields (CRFs) are a scheme for building probabilistic models to divide and tag sequence data.  ...  A machine learning technique termed Conditional Random Fields, which is designed for sequence labeling, will be used in order to take advantage of the surrounding context.  ...  Comparison of Conditional Random Fields with Hidden Markov Model: Here we compare our method, Conditional Random Fields, with the traditional Hidden Markov Model (HMM  ... 
doi:10.14257/ijunesst.2017.10.9.10 fatcat:pofo2sdqazgfjgt3bvnsn5ibmi

A NOVEL TASK-ORIENTED APPROACH TOWARD AUTOMATED LIP-READING SYSTEM IMPLEMENTATION

D. Ivanko, D. Ryumin
2021 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Its main purpose is to be some kind of a roadmap for researchers who need to build a reliable visual speech recognition system for their task.  ...  Speech recognition using visual information is called lip-reading.  ...  ACKNOWLEDGEMENTS This research is financially supported by the Russian Foundation for Basic .  ... 
doi:10.5194/isprs-archives-xliv-2-w1-2021-85-2021 fatcat:fbs5odcon5e75boyhnldrquf2q

Phonetic recognition by recurrent neural networks working on audio and visual information

P. Cosi, M. Dugatto, F. Ferrero, E.Magno Caldognetto, K. Vagges
1996 Speech Communication  
Some results will be given for various speaker dependent and independent phonetic recognition experiments regarding the Italian plosive consonants.  ...  The speech signal is processed by an auditory model producing spectral-like parameters, while the visual signal is processed by specialised hardware, called ELITE, computing lip and jaw kinematics parameters  ...  In the noisy case the speech signal was corrupted by white noise with 0dB S/N ratio, which is a very hard condition for plosive recognition, even for a human listener.  ... 
doi:10.1016/0167-6393(96)00034-9 fatcat:twjse5vl3jf77iefvqre3rsike

Multi-Stream Asynchrony Modeling for Audio Visual Speech Recognition [chapter]

Guoyun Lv, Yangyu Fan, Dongmei Jiang, Rongchun Zhao
2008 Speech Recognition  
Its basic recognition units are phones, and it can be used for large vocabulary audio-visual speech recognition.  ...  Finally, we make simple comparisons for these audio-visual speech recognition models.  ...  Hou Yunshu and Sun Ali for providing some help for the audio-visual database and visual feature data.  ... 
doi:10.5772/6373 fatcat:vypqoaeh4nfxvgz5pe6h65prfa

Multi-stream Asynchrony Modeling for Audio-Visual Speech Recognition

Guoyun Lv, Dongmei Jiang, Rongchun Zhao, Yunshu Hou
2007 Ninth IEEE International Symposium on Multimedia (ISM 2007)  
Its basic recognition units are phones, and it can be used for large vocabulary audio-visual speech recognition.  ...  The number of training parameters is very large, especially for the task of large vocabulary speech recognition.  ...  Hou Yunshu and Sun Ali for providing some help for the audio-visual database and visual feature data.  ... 
doi:10.1109/ism.2007.4412354 dblp:conf/ism/LvJZH07 fatcat:gfzs2q2rarhedbgpxqvvpnq4du

Sequential emotion recognition using Latent-Dynamic Conditional Neural Fields

Julien-Charles Levesque, Louis-Philippe Morency, Christian Gagne
2013 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)  
We evaluate the performance of this model on an audiovisual dataset of emotion recognition and compare it against other popular methods for sequence labeling.  ...  A new regularization term is proposed for the training of this model, encouraging diversity between hidden-states.  ...  [17] also used an extension of HCRFs, hidden conditional ordinal random fields (H-CORF), for expression recognition in learned manifolds.  ... 
doi:10.1109/fg.2013.6553784 dblp:conf/fgr/LevesqueMG13 fatcat:gs7joc4kujetxhm6h2dfmevssa

Research on Modeling of Vocal State Duration Based on Spectrogram Analysis

Xiaoyan Zhang, M. Anpo, F. Song
2021 E3S Web of Conferences  
for the development of the application of vocal spectrum analysis technology in vocal music teaching.  ...  The experimental simulations show that the CNN fusion-based speaker recognition system achieves very good results in terms of recognition rate.  ...  Each RBM is a Markov random field with a two-layer structure, i.e., a visible layer and a hidden layer.  ... 
doi:10.1051/e3sconf/202123604043 fatcat:jp3cmhf6lzeddhywyhkdzmz3um

An improved gaussian mixture hidden conditional random fields model for audio-based emotions classification

Muhammad Hameed Siddiqi
2020 Egyptian Informatics Journal  
Therefore, this study investigates the improved version of a classifier that is based on the hidden conditional random fields (HCRFs) model to classify emotional speech.  ...  The proposed model has been validated and evaluated on two publicly available datasets: the Berlin Database of Emotional Speech (Emo-DB) and the eNTERFACE'05 Audio-Visual Emotion dataset.  ...  To address the label bias in MEMM, conditional random field [33] and hidden conditional random field (HCRF) [23, 25] were proposed.  ... 
doi:10.1016/j.eij.2020.03.001 fatcat:z2stdh5ptng5jajgzxoh6qiwg4

Prediction of Visual Backchannels in the Absence of Visual Context Using Mutual Influence [chapter]

Derya Ozkan, Louis-Philippe Morency
2013 Lecture Notes in Computer Science  
Based on the phenomena of mutual influence between participants of a face-to-face conversation, we propose a context-based prediction approach for modeling visual backchannels.  ...  In our proposed approach, we first anticipate the speaker behaviors, and then use this anticipated visual context to obtain more accurate listener backchannel moments.  ...  Fig. 3. Baseline Models: a) Conditional Random Fields (CRF), b) Latent Dynamic Conditional Random Fields (LDCRF), c) CRF Mixture of Experts (no latent variable). Table 1. Test performances of the  ... 
doi:10.1007/978-3-642-40415-3_17 fatcat:id3fle64grcbbkyoradpf44zs4

Video Based Person Authentication via Audio/Visual Association

Ming Liu, Thomas Huang
2006 2006 IEEE International Conference on Multimedia and Expo  
In this paper, Audio/Visual association, a lower level fusion, is proposed to fuse the information between lip movement and speech signal.  ...  However, there are detailed structures between facial movement and speech signal.  ...  In audio/visual speech recognition, the model is trained to capture the correlation between lip movement and speech signal for all speakers.  ... 
doi:10.1109/icme.2006.262448 dblp:conf/icmcs/LiuH06 fatcat:uys4ewy7wffnjjxzbzc63czupi

Survey on Neural Network Architectures with Deep Learning

Smys S., Joy Iong Zong Chen, Subarna Shakya
2020 Journal of Soft Computing Paradigm  
As deep learning has significant performance and advancements it is widely used in various applications like image classification, face recognition, visual recognition, language processing, speech recognition  ...  Model or Conditional random fields.  ...  Recently deep structures with conditional random fields are evolved, in which the output of each lower layer of random field is stacked with original input data which is on higher layer.  ... 
doi:10.36548/jscp.2020.3.007 fatcat:saxneqmonvdy7iv6qzkpkoii3y

A Review on Deep Learning Algorithms for Speech and Facial Emotion Recognition

Charlyn Pushpa Latha, Mohana Priya
2016 APTIKOM Journal on Computer Science and Information Technologies  
Deep Learning technique has obtained remarkable success in the field of face recognition with 97.5% accuracy. Facial Electromyogram (FEMG) signals are used to detect the different emotions of humans.  ...  This paper focuses on the review of some of the deep learning techniques used by various researchers which paved the way to improve the classification accuracy of the FEMG signals as well as the speech  ...  Table 2 i) Models relationships using a conditional random field (CRF), a powerful graphical model that is trained to predict the conditional probability for a sequence of labels. ii) The application  ... 
doi:10.11591/aptikom.j.csit.118 fatcat:gerpzx54qrgrtf3pqnnzgjywim

A Review on Deep Learning Algorithms for Speech and Facial Emotion Recognition

Charlyn Pushpa Latha, Mohana Priya
2020 APTIKOM Journal on Computer Science and Information Technologies  
Deep Learning technique has obtained remarkable success in the field of face recognition with 97.5% accuracy. Facial Electromyogram (FEMG) signals are used to detect the different emotions of humans.  ...  This paper focuses on the review of some of the deep learning techniques used by various researchers which paved the way to improve the classification accuracy of the FEMG signals as well as the speech signals  ...  Table 2 i) Models relationships using a conditional random field (CRF), a powerful graphical model that is trained to predict the conditional probability for a sequence of labels. ii) The application  ... 
doi:10.34306/csit.v1i3.55 fatcat:l2tska7j5ferna4wupt3f2jcp4