
End-to-End Multi-View Lipreading [article]

Stavros Petridis, Yujiang Wang, Zuwei Li, Maja Pantic
2017 arXiv   pre-print
In this work, we present an end-to-end multi-view lipreading system based on Bidirectional Long-Short Memory (BLSTM) networks.  ...  Non-frontal lip views contain useful information which can be used to enhance the performance of frontal view lipreading.  ...  A.4 5-view results: The combination of all views is shown in Table A6. It outperforms the frontal view performance but not the best 3-view model (0° + 45° + 90°).  ... 
arXiv:1709.00443v1 fatcat:pi4fs6xda5f2jeaszjlfmryy34

End-to-End Sentence-Level Multi-View Lipreading Architecture with Spatial Attention Module Integrated Multiple CNNs and Cascaded Local Self-Attention-CTC

Sanghun Jeon, Mun Sang Kim
2022 Sensors  
To address this issue, we propose an end-to-end sentence-level multi-view VSR architecture for faces captured from four different perspectives (frontal, 30°, 45°, and 60°).  ...  Thus, the suggested design enhances the performance of multi-view VSR and boosts its usefulness in real-world applications.  ...  For details, please refer to [27] . Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/s22093597 pmid:35591284 pmcid:PMC9099765 fatcat:gdasdywuijhm5hhnoz7l6vqcnm

A Survey of Research on Lipreading Technology

Mingfeng Hao, Mutallip Mamut, Nurbiya Yadikar, Alimjan Aysa, Kurban Ubul
2020 IEEE Access  
The OuluVS2 database is the most widely used multi-view database. Lee et al.  ...  [19] proposed an end-to-end sentence-level lipreading architecture (LipNet).  ... 
doi:10.1109/access.2020.3036865 fatcat:dqxhtoenf5fjro6yfjmfywa3fa

Multi-Angle Lipreading with Angle Classification-Based Feature Extraction and Its Application to Audio-Visual Speech Recognition

Shinnosuke Isobe, Satoshi Tamura, Satoru Hayamizu, Yuuto Gotoh, Masaki Nose
2021 Future Internet  
In this paper, we propose a novel VSR method that is applicable to faces taken at any angle. Firstly, view classification is carried out to estimate face angles.  ...  Next, lipreading is carried out using the features. We also developed audio-visual speech recognition (AVSR) using the VSR in addition to conventional ASR.  ...  In [11] , Petridis et al. proposed an end-to-end multi-view lipreading system based on bidirectional LSTM networks.  ... 
doi:10.3390/fi13070182 fatcat:zp5zlqnbqnfurduc2ecz6t7n54

Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

Yaman Kumar, Mayank Aggarwal, Pratham Nawal, Shin'ichi Satoh, Rajiv Ratn Shah, Roger Zimmermann
2018 ACM Multimedia Conference - MM '18  
To this end, this paper presents the world's first ever multi-view speech reading and reconstruction system.  ...  Speechreading or lipreading is the technique of understanding and getting phonetic features from a speaker's visual features such as movement of lips, face, teeth and tongue.  ...  This problem can be solved using multi-view lipreading.  ... 
doi:10.1145/3240508.3241911 dblp:conf/mm/KumarANSSZ18 fatcat:7x4tmy2ozjgbrpnzejjy27rqcq

Multi-pose lipreading and audio-visual speech recognition

Virginia Estellers, Jean-Philippe Thiran
2012 EURASIP Journal on Advances in Signal Processing  
exploit the visual modality in the multi-modal system.  ...  It is then necessary to develop an effective framework for pose invariant lipreading.  ...  This step, which is not usually part of the lipreading but the face recognition problem, is critical for the rest of the system and induced the term front-end effect to refer to the effects of the ROI  ... 
doi:10.1186/1687-6180-2012-51 fatcat:4qhc5wlak5gfdecyh272uo67n4

Towards Practical Lipreading with Distilled and Efficient Models [article]

Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic
2021 arXiv   pre-print
Lipreading has witnessed a lot of progress due to the resurgence of neural networks.  ...  However, there is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.  ...  Pantic, "End-to-end multi-view lipreading," in BMVC, 2017. [23] J. S. Chung and A.  ... 
arXiv:2007.06504v3 fatcat:7xixuerqorbxlhimvmw3nlcupe

On the Importance of Video Action Recognition for Visual Lipreading [article]

Xinshuo Weng
2019 arXiv   pre-print
Recently, many state-of-the-art visual lipreading methods explore the end-to-end trainable deep models, involving the use of 2D convolutional networks (e.g., ResNet) as the front-end visual feature extractor  ...  We focus on the word-level visual lipreading, which requires to decode the word from the speaker's video.  ...  Lipreading using Convolutional Neural Network. INTERSPEECH, 2014. 1, 3 [24] A. Pass, J. Zhang, and D. Stewart. An Investigation into Features for Multi-View Lipreading.  ... 
arXiv:1903.09616v2 fatcat:27vffftd6rfbfi7gcu5lhipqdy

Combining Residual Networks with LSTMs for Lipreading [article]

Themos Stafylakis, Georgios Tzimiropoulos
2017 arXiv   pre-print
We propose an end-to-end deep learning architecture for word-level visual speech recognition.  ...  We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database of 500-size target-words consisting of 1.28sec video excerpts from BBC TV broadcasts.  ...  The views expressed in this paper are those of the authors and do not engage any official position of the funding agencies. References  ... 
arXiv:1703.04105v4 fatcat:3nrrs4ndfjbzlfvliijg5isoya

End-to-End Audiovisual Fusion with LSTMs

Stavros Petridis, Yujiang Wang, Zuwei Li, Maja Pantic
2017 The 14th International Conference on Auditory-Visual Speech Processing  
In this work, we present an end-to-end audiovisual model based on Bidirectional Long-Short Memory (BLSTM) networks.  ...  We also perform audiovisual speech recognition experiments on the OuluVS2 database using different views of the mouth, frontal to profile.  ...  End-to-end Audiovisual Fusion The proposed deep learning system for multi-view lipreading is shown in Fig. 1 .  ... 
doi:10.21437/avsp.2017-8 dblp:conf/avsp/PetridisWLP17 fatcat:kmmowoi725dnfm67yrom5rkshe

Lipreading Using Profile Versus Frontal Views

Patrick Lucey, Gerasimos Potamianos
2006 2006 IEEE Workshop on Multimedia Signal Processing  
In this paper, we particularly describe our visual front end approach, and report experiments on a multi-subject, small-vocabulary, bimodal, multisensory database that contains synchronously captured audio  ...  In contrast, this paper investigates extracting visual speech information from the speaker's profile view, and, to our knowledge, constitutes the first real attempt to attack this problem.  ...  Notice that these fusion mechanisms will also be used in our experiments to combine the profile- and frontal-view visual-only ASR (lipreading) systems into a "multi-view" lipreading system, as discussed  ... 
doi:10.1109/mmsp.2006.285261 dblp:conf/mmsp/LuceyP06 fatcat:ucbtzt6jnrdnpjwyr2klxl2lku

Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Tao Zhang, Lun He, Xudong Li, Guoqing Feng
2021 Applied Sciences  
To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and  ...  Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress.  ...  Acknowledgments: The authors would like to acknowledge the high-performance graphics card provided by the DSP Laboratory of School of Electrical and Information Engineering, Tianjin University and the  ... 
doi:10.3390/app11156975 fatcat:yzkbfayiczedfl3tedtufx2tfq

A Pipeline to Data Preprocessing for Lipreading and Audio-Visual Speech Recognition

Hea Choon Ngo
2020 International Journal of Advanced Trends in Computer Science and Engineering  
Studies show that only about 30 to 45 percent of the English language can be understood by lipreading alone.  ...  Even the most talented lip readers are unable to collect a complete message based on lipreading only, although they are often very good at interpreting facial features, body language, and context to find  ...  LipNet is the first end-to-end lipreading model at the sentence level. LipNet is consistently trained to predict sentences.  ... 
doi:10.30534/ijatcse/2020/58942020 fatcat:ywutp7wngzc6xjkdmwklncbjfe

"Notic My Speech" – Blending Speech Patterns With Multimedia [article]

Dhruva Sahrawat, Yaman Kumar, Shashwat Aggarwal, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann
2020 arXiv   pre-print
To bridge this gap, we propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.  ...  Moreover, we show that there is a strong correlation between our model's understanding of multi-view speech and the human perception.  ...  Figure 1 illustrates the architecture overview of our proposed end-to-end multi-view visual speech recognition system, which consists of three major components: namely the video encoder, the view-temporal  ... 
arXiv:2006.08599v1 fatcat:npntochngve25iuppcroucaf4i
Showing results 1 — 15 out of 452 results