248 Hits in 2.7 sec

Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading [article]

Chunlin Tian, Weijun Ji
2017 arXiv   pre-print
I would also thank my laboratory for providing computational resources.  ...  Acknowledgments I would like to acknowledge my friends Yi Sun, Wenqiang Yang and Peng Xie for helpful support. I would like to thank the developers of Torch [6] and Tensorflow [1] .  ...  of DNN can be exploited for varieties of tasks, especially the CNN for image feature extraction.  ... 
arXiv:1701.04224v2 fatcat:qezsua73qrewtnweanyojrzv7y

Learning from Videos with Deep Convolutional LSTM Networks [article]

Logan Courtney, Ramavarapu Sreenivas
2019 arXiv   pre-print
We describe our experiments involving convolution LSTMs for lipreading that demonstrate the model is capable of selectively choosing which spatiotemporal scales are most relevant for a particular dataset  ...  spatiotemporal features existent within the problem.  ...  Spatiotemporal Features Sensitivity Analysis The internal cell states and the hidden states are reset to 0 before processing a new sequence.  ... 
arXiv:1904.04817v1 fatcat:ck5bx6dexvhw5ir7xr4ys5auo4

What accounts for individual differences in susceptibility to the McGurk effect?

Violet A Brown, Maryam Hedayati, Annie Zanger, Sasha Mayn, Lucia Ray, Naseem Dillman-Hasso, Julia F Strand
2018 PLoS ONE  
These results provide support for the claim that a small amount of the variability in susceptibility to the McGurk effect is attributable to lipreading skill.  ...  ., a more fine-grained analysis of lipreading ability), but not to scores on tasks measuring attentional control, processing speed, working memory capacity, or auditory perceptual gradiency.  ...  Acknowledgments Authors' note: We are grateful to Eun Jong Kong and Jan Edwards for providing stimuli for the Visual Analogue Scale task, to Hunter Brown for feedback on an early draft of the paper, and  ... 
doi:10.1371/journal.pone.0207160 pmid:30418995 pmcid:PMC6231656 fatcat:tfcnvc76xvc6biwize7mfunrmq

About Face: Seeing the Talker Improves Spoken Word Recognition but Increases Listening Effort

Violet A. Brown, Julia F. Strand
2019 Journal of Cognition  
Acknowledgements The authors would like to thank Jonathan Peelle for helpful feedback on an earlier draft of this paper and the undergraduate research assistants at Carleton College for helpful conversations  ...  One possible explanation for this finding is that the effort required to process AV speech slows response times to the secondary task, and this slowing allows participants to have more time to accurately  ...  For all mixed effects models reported in this paper, we attempted to utilize the maximal random effects structure justified by the design (Barr, Levy, Scheepers, & Tily, 2013) .  ... 
doi:10.5334/joc.89 pmid:31807726 pmcid:PMC6873894 fatcat:agvmhhy3n5fo7opc42mlkv6y2m

Page 2712 of Psychological Abstracts Vol. 71, Issue 10 [page]

1984 Psychological Abstracts  
An analysis of probable environmental input and of the features' utility in separating already-counted from to-be-counted objects is proposed to account for the relative probabilities that Ss knew that  ...  It is concluded that movement of stimulus features need not account for the extensive recency advantage in remembering lipread  ... 

Comparison of Spatiotemporal Networks for Learning Video Related Tasks [article]

Logan Courtney, Ramavarapu Sreenivas
2020 arXiv   pre-print
Many methods for learning from video sequences involve temporally processing 2D CNN features from the individual frames or directly utilizing 3D convolutions within high-performing 2D CNN architectures  ...  spatiotemporal features.  ...  This issue becomes more apparent after viewing recent techniques for lipreading.  ... 
arXiv:2009.07338v1 fatcat:bgc4ixqc6faybfascyj236dae4

Page 1440 of Psychological Abstracts Vol. 42, Issue 9 [page]

1968 Psychological Abstracts  
—The performance of 110 aphasic Ss on 25 subtests of a battery of tests for spoken language were subjected to a dimensional or factorial analysis. 2 factors were found which accounted for 95.9% of the  ...  Slow Learning Child: The Australian Journal on the Education of Backward Children, 1967, 14(2), 117-122.  ... 

Tactual display of consonant voicing as a supplement to lipreading

Hanfeng Yuan, Charlotte M. Reed, Nathaniel I. Durlach
2005 Journal of the Acoustical Society of America  
This research is concerned with the development of tactual displays to supplement the information available through lipreading.  ...  A special thank to Andy Brughera for the funny times working together, and to Lorraine Delhorne for her help in speech segmentation.  ...  Chapter 3 describes the method of lipreading, and motivation for pursuing study of the feature of voicing as a supplement to lipreading.  ... 
doi:10.1121/1.1945787 pmid:16158656 fatcat:vd4vbwiyzrb37imlpfa4ptwc6m

Point-Light Facial Displays Enhance Comprehension of Speech in Noise

Lawrence D. Rosenblum, Jennifer A. Johnson, Helena M. Saldaña
1996 Journal of Speech, Language and Hearing Research  
These results have implications for uncovering salient visual speech information as well as the development of telecommunication systems for listeners who are hearing-impaired.  ...  There is little known about which characteristics of the face are useful for enhancing the degraded signal.  ...  For example, additional points on relatively slow-moving articulators might help provide better references against which movement of the more animated points are seen.  ... 
doi:10.1044/jshr.3906.1159 fatcat:64ux7zmnbjad7acee23doqmdu4

A comparison of bound and unbound audio–visual information processing in the human cerebral cortex

Ingrid R Olson, J.Christopher Gatenby, John C Gore
2002 Cognitive Brain Research  
A region-of-interest analysis of the STS and parietal areas found no difference between audio-visual conditions.  ...  However, this analysis found that synchronized audio-visual stimuli led to a higher signal change in the claustrum region.  ...  Frost, and Ainer Mencel for help in stimulus preparation.  ... 
doi:10.1016/s0926-6410(02)00067-8 pmid:12063136 fatcat:gszmefbg6vhnbm3tfzfroiwqoa

A Chinese Lip-Reading System Based on Convolutional Block Attention Module

Yuanyao Lu, Qi Xiao, Haiyang Jiang, Jude Hemanth
2021 Mathematical Problems in Engineering  
We also add the time attention mechanism to the GRU neural network, which helps to extract the features among consecutive lip motion images.  ...  Considering the effects of the moments before and after on the current moment in the lip-reading process, we assign more weights to the key frames, which makes the features more representative.  ...  Experiment Evaluation and Analysis (1) Experiment Evaluation.  ... 
doi:10.1155/2021/6250879 fatcat:qbbvesio6vgdxnu4chuikhalsa

Local spatiotemporal descriptors for visual recognition of spoken phrases

Guoying Zhao, Matti Pietikäinen, Abdenour Hadid
2007 Proceedings of the international workshop on Human-centered multimedia - HCM '07  
Spatiotemporal local binary patterns extracted from these regions are used for describing phrase sequences.  ...  Positions of the eyes determined by a robust face and eye detector are used for localizing the mouth regions in face images.  ...  method which uses a non-linear scale-space analysis to form features directly from the pixel intensity.  ... 
doi:10.1145/1290128.1290138 dblp:conf/mm/ZhaoPH07 fatcat:uooot3dn35a7faffo2e34betii

Cued Speech and the Reception of Spoken Language

Gaye H. Nicholls, Daniel Ling
1982 Journal of Speech, Language and Hearing Research  
from adding an acoustic signal to lipreading result from the subjects' utilization of time-intensity cues alone.  ...  An analysis of variance comparing the results for syllables, and high and low predictability sentences was carried out.  ...  Speech envelope cues as an acoustic aid to lipreading for profoundly deaf children. J. Acoust. Soc. Amer., 51, Hannah, E.P. Speechreading: some linguistic factors.  ... 
doi:10.1044/jshr.2502.262 fatcat:gfvrpitbn5falbb3ieyudst3ny

Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features

Xuejie Zhang, Yan Xu, Andrew K. Abel, Leslie S. Smith, Roger Watt, Amir Hussain, Chengxiang Gao
2020 Entropy  
One key difference between using these Gabor-based features and using other features such as traditional DCT, or the current fashion for CNN features is that these are human-centric features that can be  ...  features (produced using Gabor-based image patches), can successfully be used for speech recognition with LSTM-based machine learning.  ...  Acknowledgments: The authors would like to thank Erick Purwanto for his contributions, and Cynthia Marquez for her vital assistance.  ... 
doi:10.3390/e22121367 pmid:33279914 fatcat:sqmegoznnjdqdgndddfh4vqelu

Fast Feature Extraction Approach for Multi-Dimension Feature Space Problems

A. Sagheer, N. Tsuruta, R. Taniguchi, D. Arita, S. Maeda
2006 18th International Conference on Pattern Recognition (ICPR'06)  
Recently, we proposed a fast feature extraction approach denoted FSOM utilizes Self Organizing Map (SOM). FSOM [1] overcomes the slowness of traditional SOM search algorithm.  ...  Again, we show here how is FSOM reduces the feature extraction time of traditional SOM drastically while preserving same SOM's qualities.  ...  Figure 1 shows samples for the utilized images.  ... 
doi:10.1109/icpr.2006.545 dblp:conf/icpr/SagheerTTAM06 fatcat:7c34tqmdzncxphab3eazriloxy