3,870 Hits in 5.7 sec

Large-Vocabulary Continuous Sign Language Recognition Based on Transition-Movement Models

Gaolin Fang, Wen Gao, Debin Zhao
2007 IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Index Terms: Chinese sign language (CSL), dynamic time warping (DTW), hidden Markov model (HMM), sign language recognition (SLR), temporal clustering algorithm.  ...  To tackle the mass of transition movements arising from a large vocabulary, a temporal clustering algorithm, adapted from k-means by using dynamic time warping as its distance measure, is proposed to dynamically  ...  SLR deals with recognizing temporal patterns across multiple data streams. From the point of view of handling a time-series signal, SLR is very similar to speech recognition.  ...
doi:10.1109/tsmca.2006.886347 fatcat:2riou3fzonetdlnigj6asvv6o4
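
The temporal clustering idea described above (k-means adapted to use dynamic time warping as its distance measure) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses 1-D sequences, and it substitutes a medoid update for the mean update, since the mean of variable-length sequences is not well defined.

    import random

    def dtw(a, b):
        """Dynamic-time-warping distance between two 1-D sequences."""
        n, m = len(a), len(b)
        INF = float("inf")
        D = [[INF] * (m + 1) for _ in range(n + 1)]
        D[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
        return D[n][m]

    def cluster_transitions(seqs, k, iters=10):
        """k-means-style clustering of variable-length sequences under DTW.
        Medoids stand in for cluster means (a simplification)."""
        medoids = random.sample(seqs, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for s in seqs:
                c = min(range(k), key=lambda i: dtw(s, medoids[i]))
                clusters[c].append(s)
            for c, members in enumerate(clusters):
                if members:  # keep the old medoid for an empty cluster
                    medoids[c] = min(members,
                                     key=lambda u: sum(dtw(u, t) for t in members))
        return medoids, clusters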

Activity in regions sensitive to auditory speech is modified during speech production: fMRI evidence for an efference copy

Zhuo Zheng, Ingrid Johnsrude, Kevin Munhall
2007 Journal of the Acoustical Society of America  
Activity in the superior temporal gyrus bilaterally was significantly greater for clear than masked speech during the listen-only trials (F(1,20)≥12.84, p<0.002), and significantly higher for masked than  ...  for clear speech in the production trials (F(1,20)≥6.68, p<0.02).  ...  copy of vocal motor commands during speech production.  ... 
doi:10.1121/1.4781746 fatcat:pn3sykl7hfhcjchi272tqykiga

Automatic Detection Technique for Speech Recognition based on Neural Networks

Mohamad A., Abdusamad Al-Marghilani, Akram Aref
2018 International Journal of Advanced Computer Science and Applications  
The latest acoustic modeling methods use deep neural networks for speech recognition.  ...  Automatic speech recognition allows the machine to understand and process information provided orally by a human user.  ...  We worked on the task of speaker segmentation in the audio streams of television programs collected for the LNE REPERE broadcast-recognition evaluation campaign.  ...
doi:10.14569/ijacsa.2018.090326 fatcat:uqxbzpitingxrneflg7i2bvfjq
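
The snippet above mentions speaker segmentation of broadcast audio streams. Below is a minimal sketch of one naive approach: flag a speaker change wherever adjacent windows of acoustic feature vectors differ strongly. The window size, threshold, and cosine distance are illustrative assumptions; real systems typically use BIC criteria or neural speaker embeddings.

    import math

    def speaker_change_points(features, threshold=0.5, win=50):
        """Flag a speaker change wherever the cosine distance between
        adjacent windows of feature vectors exceeds a threshold."""
        def centroid(chunk):
            dim = len(chunk[0])
            return [sum(v[i] for v in chunk) / len(chunk) for i in range(dim)]
        def cos_dist(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
            return 1.0 - dot / norm if norm else 1.0
        changes = []
        for t in range(win, len(features) - win + 1, win):
            if cos_dist(centroid(features[t - win:t]),
                        centroid(features[t:t + win])) > threshold:
                changes.append(t)  # frame index of a hypothesized change
        return changes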

Speech recognition by indexing and sequencing

Simone Franzini, Jezekiel Ben-Arie
2010 International Conference of Soft Computing and Pattern Recognition
This can be performed on isolated events or on a continuous data stream. When applied to speech, this translates to isolated-word speech recognition and continuous speech recognition.  ...  In Chapter 2 we explain how DTW and HMMs work, both as general temporal pattern recognition methods and as applied to speech recognition.  ... 
doi:10.1109/socpar.2010.5686409 dblp:conf/socpar/FranziniB10 fatcat:zqeoozazrnbozj3a5tgj7lk2qe
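
The abstract above contrasts DTW- and HMM-based temporal pattern recognition. A minimal sketch of the HMM side for isolated-word recognition: each vocabulary word gets a discrete HMM, and the word whose model assigns the highest forward-algorithm likelihood to the observation sequence wins. Parameter training (e.g., Baum-Welch) and vector quantization of the audio into symbol indices are assumed to happen upstream; all names here are illustrative.

    import math

    def forward_log_likelihood(obs, pi, A, B):
        """Forward algorithm for a discrete HMM: log P(obs | model).
        obs: symbol indices; pi[i]: initial state probabilities;
        A[i][j]: state transitions; B[i][o]: emission probabilities.
        (Unscaled, so suitable only for short sequences.)"""
        n = len(pi)
        alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
        for o in obs[1:]:
            alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                     for j in range(n)]
        return math.log(sum(alpha))

    def recognize_word(obs, word_models):
        """Isolated-word recognition: the word whose HMM scores the
        observation sequence highest wins.
        word_models maps each word to its trained (pi, A, B)."""
        return max(word_models,
                   key=lambda w: forward_log_likelihood(obs, *word_models[w]))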

Accelerating the Development of Multimodal, Integrative-AI Systems with Platform for Situated Intelligence [article]

Sean Andrist, Dan Bohus
2020 arXiv pre-print
We describe Platform for Situated Intelligence, an open-source framework for multimodal, integrative-AI systems.  ...  The framework provides infrastructure, tools, and components that enable and accelerate the development of applications that process multimodal streams of data and in which timing is critical.  ...  We would also like to thank Eric Horvitz for his contributions and support, as well as our early adopters for their feedback and suggestions.  ... 
arXiv:2010.06084v1 fatcat:fx2vrel7lbf5xeivj5gzqh3iqi
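
Platform for Situated Intelligence is a .NET framework, and the sketch below is not its API. It only illustrates, in Python, the kind of timestamp-aware pairing of multimodal streams that such frameworks automate; the framework itself additionally handles buffering, latency, and delivery policies. All names and the tolerance parameter are assumptions.

    from bisect import bisect_left

    def nearest_join(stream_a, stream_b, tolerance):
        """Pair each (timestamp, value) message in stream_a with the message
        in stream_b closest in time, if within `tolerance` seconds.
        Assumes both streams are sorted by timestamp."""
        times_b = [t for t, _ in stream_b]
        pairs = []
        for t, v in stream_a:
            i = bisect_left(times_b, t)
            best = None
            for j in (i - 1, i):
                if 0 <= j < len(times_b):
                    if best is None or abs(times_b[j] - t) < abs(times_b[best] - t):
                        best = j
            if best is not None and abs(times_b[best] - t) <= tolerance:
                pairs.append((t, v, stream_b[best][1]))
        return pairs

    # Hypothetical audio and video messages: (timestamp_seconds, payload).
    audio = [(0.00, "a0"), (0.10, "a1"), (0.20, "a2")]
    video = [(0.03, "v0"), (0.21, "v1")]
    print(nearest_join(audio, video, tolerance=0.05))
    # -> [(0.0, 'a0', 'v0'), (0.2, 'a2', 'v1')]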

A survey of affect recognition methods

Zhihong Zeng, Maja Pantic, Glenn I. Roisman, Thomas S. Huang
2007 Proceedings of the Ninth International Conference on Multimodal Interfaces (ICMI '07)
Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multicue visual fusion  ...  Next, we examine available approaches for solving the problem of machine understanding of human affective behavior and discuss important issues like the collection and availability of training and test  ...  ACKNOWLEDGMENTS The authors would like to thank Qiang Ji and the anonymous reviewers for encouragement and valuable comments. This paper is a collaborative work.  ... 
doi:10.1145/1322192.1322216 dblp:conf/icmi/ZengPRH07 fatcat:byahqj5l4zhzhg5mzlgqu3edqy
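
Among the fusion strategies this survey covers, decision-level (late) audiovisual fusion is the simplest to illustrate: independent audio and visual classifiers each emit per-class posteriors, which are combined by a weighted average. The class labels, weights, and scores below are hypothetical.

    def late_fusion(audio_probs, visual_probs, w_audio=0.5):
        """Decision-level audiovisual fusion: a weighted average of the
        per-class posteriors from independent audio and visual classifiers."""
        fused = {
            label: w_audio * audio_probs[label] + (1 - w_audio) * visual_probs[label]
            for label in audio_probs
        }
        return max(fused, key=fused.get)

    # Hypothetical classifier outputs over three affect classes.
    audio = {"neutral": 0.2, "happy": 0.5, "angry": 0.3}
    visual = {"neutral": 0.1, "happy": 0.7, "angry": 0.2}
    print(late_fusion(audio, visual))  # -> "happy"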

A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions

Zhihong Zeng, M. Pantic, G.I. Roisman, T.S. Huang
2009 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multicue visual fusion  ...  Next, we examine available approaches for solving the problem of machine understanding of human affective behavior and discuss important issues like the collection and availability of training and test  ...  ACKNOWLEDGMENTS The authors would like to thank Qiang Ji and the anonymous reviewers for encouragement and valuable comments. This paper is a collaborative work.  ... 
doi:10.1109/tpami.2008.52 pmid:19029545 fatcat:xntouxtsxze3bn3dstdrv5eibi

Simulating a Smartboard by Real-Time Gesture Detection in Lecture Videos

Feng Wang, Chong-Wah Ngo, Ting-Chuen Pong
2008 IEEE Transactions on Multimedia
Gesture plays an important role in recognizing lecture activities in video content analysis.  ...  In contrast to conventional "complete gesture" recognition, we emphasize detection by prediction from an "incomplete gesture".  ...  Because gesture and speech are not always temporally aligned, we also consider speech shortly before and after the gesture is predicted.  ...
doi:10.1109/tmm.2008.922871 fatcat:nbg4w2gw7zcsznyvplc75oayxa
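
The idea of detecting a gesture by prediction from an "incomplete gesture" can be caricatured as prefix matching: score the partial trajectory observed so far against the same-length prefix of each gesture template. The sketch below uses 1-D trajectories and plain Euclidean distance; the paper's actual prediction procedure differs.

    def predict_from_prefix(partial, templates):
        """Early detection: score the observed partial trajectory against
        the same-length prefix of each complete gesture template."""
        n = len(partial)
        scores = {}
        for name, tmpl in templates.items():
            if len(tmpl) >= n:  # template must be at least as long as the prefix
                scores[name] = sum((p - q) ** 2 for p, q in zip(partial, tmpl[:n]))
        return min(scores, key=scores.get) if scores else None

    # Hypothetical 1-D templates (e.g., x-coordinates of the hand over time).
    templates = {"point": [0, 1, 2, 3, 4, 5], "wave": [0, 1, 0, 1, 0, 1]}
    print(predict_from_prefix([0, 1, 2], templates))  # -> "point"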

DECtalk Software: Text-to-Speech Technology and Implementation

William I. Hallahan
1995 Digital Technical Journal of Digital Equipment Corporation
Bernie Rozmovits, our engineering project leader, was the visionary for this entire effort.  ...  He contributed most of our sample applications on Windows NT, and he also wrote the text-to-speech DDE server.  ...  Text-to-Speech Synthesis Techniques Early attempts at text-to-speech synthesis assembled clauses by concatenating recorded words. This technique produces extremely unnatural-sounding speech.  ... 
dblp:journals/dtj/Hallahan95 fatcat:rlilnchvxnetbjjqdh237kajaq
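
The early concatenative technique the article mentions, splicing one recording per word, can be sketched in a few lines with Python's standard wave module. The clip directory and filenames are assumptions, and all clips must share a sample format; the result exhibits exactly the unnatural prosody the article notes.

    import wave

    def concat_words(words, out_path, clip_dir="clips"):
        """Naive concatenative synthesis: splice one recording per word."""
        frames, params = [], None
        for w in words:
            with wave.open(f"{clip_dir}/{w}.wav", "rb") as f:
                if params is None:
                    params = f.getparams()  # all clips assumed same format
                frames.append(f.readframes(f.getnframes()))
        with wave.open(out_path, "wb") as out:
            out.setparams(params)
            for chunk in frames:
                out.writeframes(chunk)

    # Assumes clips/hello.wav and clips/world.wav exist (same sample format).
    concat_words(["hello", "world"], "sentence.wav")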

Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments [article]

Mario Esparza
2022 arXiv pre-print
When trained on 41 English phonemes, a phoneme error rate (PER) of 0.099 is achieved on Speech Commands.  ...  Dynamic speech recognition between Spanish and English is required in these environments.  ...  Glossary: AOLME (Advancing Out-of-School Learning in Mathematics), ASR (Automatic Speech Recognition), CNN (Convolutional Neural Network), CTC (Connectionist Temporal Classification)  ...
arXiv:2202.10536v1 fatcat:ewip5ygbpjccpc4dpj7ylgjsbm
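
The PER figure quoted above is the standard phoneme error rate: edit distance between the reference and hypothesized phoneme sequences, normalized by reference length. A self-contained computation (the phoneme strings are made-up examples):

    def edit_distance(ref, hyp):
        """Levenshtein distance between two phoneme sequences."""
        d = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev, d[0] = d[0], i
            for j, h in enumerate(hyp, 1):
                prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                       d[j - 1] + 1,    # insertion
                                       prev + (r != h)) # substitution
        return d[-1]

    def phoneme_error_rate(ref, hyp):
        """PER = (substitutions + insertions + deletions) / reference length."""
        return edit_distance(ref, hyp) / len(ref)

    print(phoneme_error_rate(["k", "ae", "t"], ["k", "ah", "t"]))  # -> 0.333...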

Mediated voice communication via mobile IP

Chris Schmandt, Jang Kim, Kwan Lee, Gerardo Vallejo, Mark Ackerman
2002 Proceedings of the 15th Annual ACM Symposium on User Interface Software and Technology (UIST '02)
Motivated by our concern for mobility, Impromptu does not use the screen for display or input; speech recognition and synthesis are provided as services in the network.  ...  assigns a port for the application to connect to for bi-directional audio streaming.  ... 
doi:10.1145/571985.572005 dblp:conf/uist/SchmandtKLVA02 fatcat:4r4cinxpajbkrho4o7l6g4z7ii
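
The snippet above mentions a network service that assigns a port for bi-directional audio streaming. A minimal sketch of that pattern (Impromptu's actual protocol and components are not shown; binding port 0 so the OS picks a free ephemeral port is the standard idiom):

    import socket

    # Bind port 0 so the OS assigns a free ephemeral port, then hand that
    # port number to the client for the bi-directional audio connection.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", 0))
    server.listen(1)
    assigned_port = server.getsockname()[1]
    print(f"audio service listening on port {assigned_port}")
    # client, addr = server.accept()  # audio frames would then flow both ways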

MARVEL - D3.1: Multimodal and privacy-aware audio-visual intelligence – initial version

Alexandros Iosifidis
2022 Zenodo  
as methodologies for improving the training and efficiency of AI models under supervised, unsupervised, and cross-modal contrastive learning settings.  ...  These include methods for Sound Event Detection, Sound Event Localisation and Detection, Automated Audio Captioning, Visual Anomaly Detection, Visual Crowd Counting, Audio-Visual Crowd Counting, as well  ...  Koizumi for their input on previously reported results, and to acknowledge CSC-IT Center for Science, Finland, for computational resources.  ...
doi:10.5281/zenodo.6821317 fatcat:eia7rkk5lfbg7khs3qcat5qd3m

Spoken language understanding

Ye-Yi Wang, Li Deng, A. Acero
2005 IEEE Signal Processing Magazine  
Besides dictation, there are many other practical applications for speech recognition, including command and control, spoken dialog systems [1], [2], speech-to-speech translation [3], and multimodal  ...  In contrast to automatic speech recognition (ASR), which converts a speaker's spoken utterance into a text string, spoken language understanding (SLU) aims to interpret the user's intentions from their  ...  speech recognition.  ...
doi:10.1109/msp.2005.1511821 fatcat:z3ubnpeocrf7nmpn2ftxtfmism
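
The ASR-versus-SLU contrast drawn above (text out versus intent out) can be illustrated with a toy rule-based semantic parser over recognized text. The intents, patterns, and slot names are invented for illustration; the article surveys statistical approaches rather than hand-written rules like these.

    import re

    # A toy semantic frame extractor: ASR output in, (intent, slots) out.
    RULES = [
        ("set_alarm", re.compile(
            r"\b(?:wake me|set an? alarm) (?:up )?at (?P<time>\d{1,2}(?::\d{2})?)")),
        ("play_music", re.compile(r"\bplay (?:some )?(?P<artist>[a-z ]+)")),
    ]

    def understand(utterance):
        """Map a recognized utterance to (intent, slots); None if no rule fires."""
        text = utterance.lower()
        for intent, pattern in RULES:
            m = pattern.search(text)
            if m:
                return intent, m.groupdict()
        return None

    print(understand("Wake me up at 7:30"))     # -> ('set_alarm', {'time': '7:30'})
    print(understand("Play some miles davis"))  # -> ('play_music', {'artist': 'miles davis'})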

Why Don't You See What I Mean? Prospects and Limitations of Current Automatic Sign Recognition Research

Gineke ten Holt, Petra Hendriks, Tjeerd Andringa
2006 Sign Language Studies  
Some of these problems are shared with automatic speech recognition, while others seem to be particular to automatic sign recognition.  ...  This article presents an overview of current automatic sign recognition research.  ...  Acknowledgements The authors would like to thank an anonymous reviewer for his/her valuable comments and suggestions. Notes  ... 
doi:10.1353/sls.2006.0024 fatcat:un3cwxe4ajckjhmtz4slw56egu
Showing results 1 to 15 of 3,870