End-to-End Multi-View Lipreading
[article]
2017
arXiv
pre-print
In this work, we present an end-to-end multi-view lipreading system based on Bidirectional Long Short-Term Memory (BLSTM) networks. ...
Non-frontal lip views contain useful information which can be used to enhance the performance of frontal view lipreading. ...
A.4 5-view results: The combination of all views is shown in Table A6. It outperforms the frontal view performance but not the best 3-view model (0° + 45° + 90°). ...
arXiv:1709.00443v1
fatcat:pi4fs6xda5f2jeaszjlfmryy34
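The entry above describes per-view BLSTM encoders whose outputs are fused and classified end-to-end. Below is a minimal PyTorch sketch of that general pattern, assuming pre-extracted per-frame mouth-ROI features; the feature dimension, hidden size, class count, and fusion-by-concatenation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a multi-view BLSTM lipreading model (not the authors' code).
# Assumes each view is already encoded as a sequence of per-frame feature vectors.
import torch
import torch.nn as nn

class MultiViewBLSTM(nn.Module):
    def __init__(self, feat_dim=2400, hidden=256, num_classes=10, num_views=3):
        super().__init__()
        # One bidirectional LSTM encoder per camera view.
        self.view_encoders = nn.ModuleList(
            nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
            for _ in range(num_views)
        )
        # Fusion BLSTM over the concatenated per-view encodings.
        self.fusion = nn.LSTM(2 * hidden * num_views, hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, views):
        # views: list of tensors, each (batch, time, feat_dim), one per view.
        encoded = [enc(v)[0] for enc, v in zip(self.view_encoders, views)]
        fused, _ = self.fusion(torch.cat(encoded, dim=-1))
        # Classify from the last time step (word/phrase-level recognition).
        return self.classifier(fused[:, -1])

# Example: 3 views, batch of 2, 29 frames, 2400-dim mouth-ROI features.
model = MultiViewBLSTM()
x = [torch.randn(2, 29, 2400) for _ in range(3)]
print(model(x).shape)  # torch.Size([2, 10])
```

Swapping the last-time-step readout for a per-frame output layer would adapt the same skeleton to sequence-level decoding.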
End-to-End Sentence-Level Multi-View Lipreading Architecture with Spatial Attention Module Integrated Multiple CNNs and Cascaded Local Self-Attention-CTC
2022
Sensors
To address this issue, we propose an end-to-end sentence-level multi-view VSR architecture for faces captured from four different perspectives (frontal, 30°, 45°, and 60°). ...
Thus, the suggested design enhances the performance of multi-view VSR and boosts its usefulness in real-world applications. ...
For details, please refer to [27].
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/s22093597
pmid:35591284
pmcid:PMC9099765
fatcat:gdasdywuijhm5hhnoz7l6vqcnm
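Since this entry combines per-view CNN features, an attention module, and CTC training, here is a hedged sketch of attention-weighted view fusion feeding torch's CTC loss; the four-view input, feature sizes, and GRU back-end are assumptions for illustration rather than the paper's actual architecture.

```python
# Sketch: attention-weighted fusion of per-view features, trained with CTC.
# Shapes and dimensions are illustrative; not the paper's implementation.
import torch
import torch.nn as nn

class ViewAttentionFusion(nn.Module):
    def __init__(self, feat_dim=512, num_classes=40):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)          # scores each view's features
        self.rnn = nn.GRU(feat_dim, 256, batch_first=True, bidirectional=True)
        self.head = nn.Linear(512, num_classes)      # includes the CTC blank

    def forward(self, x):
        # x: (batch, views, time, feat_dim) per-view CNN features.
        weights = torch.softmax(self.score(x), dim=1)   # attention over views
        fused = (weights * x).sum(dim=1)                # (batch, time, feat_dim)
        out, _ = self.rnn(fused)
        return self.head(out).log_softmax(-1)           # (batch, time, classes)

model = ViewAttentionFusion()
feats = torch.randn(2, 4, 75, 512)                      # 4 views, 75 frames
log_probs = model(feats).permute(1, 0, 2)               # CTC expects (T, N, C)
targets = torch.randint(1, 40, (2, 20))                 # character targets
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), 75),
                           target_lengths=torch.full((2,), 20))
print(loss.item())
```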
A Survey of Research on Lipreading Technology
2020
IEEE Access
The OuluVS2 database is the most widely used multi-view database. Lee et al. ...
[19] proposed an end-to-end sentence-level lipreading architecture (LipNet). ...
doi:10.1109/access.2020.3036865
fatcat:dqxhtoenf5fjro6yfjmfywa3fa
Multi-Angle Lipreading with Angle Classification-Based Feature Extraction and Its Application to Audio-Visual Speech Recognition
2021
Future Internet
In this paper, we propose a novel VSR method that is applicable to faces taken at any angle. Firstly, view classification is carried out to estimate face angles. ...
Next, lipreading is carried out using the features. We also developed audio-visual speech recognition (AVSR) using the VSR in addition to conventional ASR. ...
In [11], Petridis et al. proposed an end-to-end multi-view lipreading system based on bidirectional LSTM networks. ...
doi:10.3390/fi13070182
fatcat:zp5zlqnbqnfurduc2ecz6t7n54
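This entry's two-stage idea is to classify the face angle first and then apply an angle-specific feature extractor. A small sketch of that routing step follows; the angle set, the linear extractors, and all dimensions are assumptions.

```python
# Sketch of angle-classification-based routing: pick a view-specific feature
# extractor according to a predicted face angle. Dimensions are assumptions.
import torch
import torch.nn as nn

ANGLES = [0, 30, 45, 60, 90]  # candidate face angles (degrees), assumed set

class AngleRoutedExtractor(nn.Module):
    def __init__(self, in_dim=4096, out_dim=256):
        super().__init__()
        self.angle_classifier = nn.Linear(in_dim, len(ANGLES))
        # One feature extractor per candidate angle.
        self.extractors = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in ANGLES
        )

    def forward(self, frame_feat):
        # frame_feat: (batch, in_dim) raw per-frame features.
        angle_idx = self.angle_classifier(frame_feat).argmax(dim=-1)
        out = torch.stack([
            self.extractors[i](f) for i, f in zip(angle_idx.tolist(), frame_feat)
        ])
        return out, [ANGLES[i] for i in angle_idx.tolist()]

model = AngleRoutedExtractor()
feats, angles = model(torch.randn(4, 4096))
print(feats.shape, angles)  # torch.Size([4, 256]) and the predicted angles
```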
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
2018
2018 ACM Multimedia Conference on Multimedia Conference - MM '18
To this end, this paper presents the world's first ever multi-view speech reading and reconstruction system. ...
Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues such as the movement of the lips, face, teeth, and tongue. ...
This problem can be solved using multi-view lipreading. ...
doi:10.1145/3240508.3241911
dblp:conf/mm/KumarANSSZ18
fatcat:7x4tmy2ozjgbrpnzejjy27rqcq
Multi-pose lipreading and audio-visual speech recognition
2012
EURASIP Journal on Advances in Signal Processing
exploit the visual modality in the multi-modal system. ...
It is then necessary to develop an effective framework for pose invariant lipreading. ...
This step, which is usually part of the face recognition problem rather than the lipreading problem, is critical for the rest of the system and induced the term front-end effect to refer to the effects of the ROI ...
doi:10.1186/1687-6180-2012-51
fatcat:4qhc5wlak5gfdecyh272uo67n4
Towards Practical Lipreading with Distilled and Efficient Models
[article]
2021
arXiv
pre-print
Lipreading has witnessed a lot of progress due to the resurgence of neural networks. ...
However, there is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios. ...
Pantic, "End-to-end multi-view lipreading," in BMVC, 2017. [23] J. S. Chung and A. ...
arXiv:2007.06504v3
fatcat:7xixuerqorbxlhimvmw3nlcupe
On the Importance of Video Action Recognition for Visual Lipreading
[article]
2019
arXiv
pre-print
Recently, many state-of-the-art visual lipreading methods explore end-to-end trainable deep models, involving the use of 2D convolutional networks (e.g., ResNet) as the front-end visual feature extractor ...
We focus on word-level visual lipreading, which requires decoding the word from the speaker's video. ...
Lipreading using Convolutional Neural Network. INTERSPEECH, 2014. ...
[24] A. Pass, J. Zhang, and D. Stewart. An Investigation into Features for Multi-View Lipreading. ...
arXiv:1903.09616v2
fatcat:27vffftd6rfbfi7gcu5lhipqdy
Combining Residual Networks with LSTMs for Lipreading
[article]
2017
arXiv
pre-print
We propose an end-to-end deep learning architecture for word-level visual speech recognition. ...
We train and evaluate it on the Lipreading In-The-Wild benchmark, a challenging database of 500 target words consisting of 1.28-second video excerpts from BBC TV broadcasts. ...
The views expressed in this paper are those of the authors and do not represent any official position of the funding agencies.
References ...
arXiv:1703.04105v4
fatcat:3nrrs4ndfjbzlfvliijg5isoya
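For this ResNet-plus-LSTM word-level model, the sketch below shows the generic front-end/back-end split using torchvision's stock ResNet-18 and a two-layer BLSTM with a 500-word output, matching the benchmark size mentioned above; everything else (input size, temporal pooling) is an assumption, not the authors' code.

```python
# Rough sketch of a ResNet front-end + BLSTM back-end for word-level lipreading.
# Uses torchvision's stock ResNet-18 (torchvision >= 0.13 API); illustrative only.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetBLSTM(nn.Module):
    def __init__(self, num_words=500, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()          # keep the 512-dim pooled features
        self.frontend = backbone
        self.backend = nn.LSTM(512, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_words)

    def forward(self, clip):
        # clip: (batch, time, 3, H, W) RGB mouth crops.
        b, t = clip.shape[:2]
        feats = self.frontend(clip.flatten(0, 1)).view(b, t, 512)
        out, _ = self.backend(feats)
        return self.classifier(out.mean(dim=1))  # average over time, then classify

model = ResNetBLSTM()
logits = model(torch.randn(2, 29, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 500])
```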
End-to-End Audiovisual Fusion with LSTMs
2017
The 14th International Conference on Auditory-Visual Speech Processing
In this work, we present an end-to-end audiovisual model based on Bidirectional Long Short-Term Memory (BLSTM) networks. ...
We also perform audiovisual speech recognition experiments on the OuluVS2 database using different views of the mouth, frontal to profile. ...
End-to-end Audiovisual Fusion: The proposed deep learning system for multi-view lipreading is shown in Fig. 1. ...
doi:10.21437/avsp.2017-8
dblp:conf/avsp/PetridisWLP17
fatcat:kmmowoi725dnfm67yrom5rkshe
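As a companion to the multi-view sketch above, this is one plausible shape for the audiovisual fusion this entry describes: encode the audio and visual streams separately with BLSTMs, concatenate, and classify. The feature dimensions and frame-aligned inputs are assumptions, not the paper's configuration.

```python
# Sketch of end-to-end audiovisual fusion with BLSTMs: encode each modality,
# concatenate, and classify. Feature sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AudioVisualBLSTM(nn.Module):
    def __init__(self, audio_dim=40, video_dim=2400, hidden=256, num_classes=10):
        super().__init__()
        self.audio_enc = nn.LSTM(audio_dim, hidden, batch_first=True,
                                 bidirectional=True)
        self.video_enc = nn.LSTM(video_dim, hidden, batch_first=True,
                                 bidirectional=True)
        self.fusion = nn.LSTM(4 * hidden, hidden, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, audio, video):
        # audio: (batch, time, audio_dim) acoustic frames, aligned to video rate.
        # video: (batch, time, video_dim) mouth-ROI features.
        a, _ = self.audio_enc(audio)
        v, _ = self.video_enc(video)
        fused, _ = self.fusion(torch.cat([a, v], dim=-1))
        return self.classifier(fused[:, -1])

model = AudioVisualBLSTM()
out = model(torch.randn(2, 29, 40), torch.randn(2, 29, 2400))
print(out.shape)  # torch.Size([2, 10])
```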
Lipreading Using Profile Versus Frontal Views
2006
2006 IEEE Workshop on Multimedia Signal Processing
In this paper, we particularly describe our visual front end approach, and report experiments on a multi-subject, small-vocabulary, bimodal, multisensory database that contains synchronously captured audio ...
In contrast, this paper investigates extracting visual speech information from the speaker's profile view, and, to our knowledge, constitutes the first real attempt to attack this problem. ...
Notice that these fusion mechanisms will also be used in our experiments to combine the profile- and frontal-view visual-only ASR (lipreading) systems into a "multi-view" lipreading system, as discussed ...
doi:10.1109/mmsp.2006.285261
dblp:conf/mmsp/LuceyP06
fatcat:ucbtzt6jnrdnpjwyr2klxl2lku
Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks
2021
Applied Sciences
To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, a Temporal Convolutional Network (TCN), and ...
Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress. ...
Acknowledgments: The authors would like to acknowledge the high-performance graphics card provided by the DSP Laboratory of the School of Electrical and Information Engineering, Tianjin University, and the ...
doi:10.3390/app11156975
fatcat:yzkbfayiczedfl3tedtufx2tfq
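Because this entry's encoder relies on a Temporal Convolutional Network, here is a sketch of a single dilated temporal-convolution residual block, the building unit such a back-end stacks; the channel count, kernel width, and dilation are illustrative assumptions.

```python
# Sketch of a single dilated temporal-convolution residual block, the building
# unit of a TCN back-end. Channel sizes and dilation are assumptions.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, channels=256, kernel=3, dilation=2):
        super().__init__()
        pad = (kernel - 1) * dilation // 2      # keep the sequence length
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel, padding=pad, dilation=dilation),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel, padding=pad, dilation=dilation),
            nn.BatchNorm1d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, time) frame features from the visual front-end.
        return self.relu(self.net(x) + x)       # residual connection

block = TCNBlock()
y = block(torch.randn(2, 256, 75))
print(y.shape)  # torch.Size([2, 256, 75])
```

Stacking such blocks with increasing dilation widens the temporal receptive field without recurrence, which is the efficiency argument the entry alludes to.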
A Pipeline to Data Preprocessing for Lipreading and Audio-Visual Speech Recognition
2020
International Journal of Advanced Trends in Computer Science and Engineering
Studies show that only about 30 to 45 percent of the English language can be understood by lipreading alone. ...
Even the most talented lip readers are unable to recover a complete message from lipreading alone, although they are often very good at interpreting facial features, body language, and context to find ...
LipNet is the first end-to-end lipreading model at the sentence level. LipNet is consistently trained to predict sentences. ...
doi:10.30534/ijatcse/2020/58942020
fatcat:ywutp7wngzc6xjkdmwklncbjfe
"Notic My Speech" – Blending Speech Patterns With Multimedia
[article]
2020
arXiv
pre-print
To bridge this gap, we propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding. ...
Moreover, we show that there is a strong correlation between our model's understanding of multi-view speech and human perception. ...
Figure 1 illustrates the architecture overview of our proposed end-to-end multi-view visual speech recognition system, which consists of three major components: namely the video encoder, the view-temporal ...
arXiv:2006.08599v1
fatcat:npntochngve25iuppcroucaf4i
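The view-temporal attention this entry proposes weights both camera views and time steps. The sketch below shows one plausible formulation that scores every (view, time) position and pools the features; it is illustrative only and not the paper's design.

```python
# Sketch of a view-temporal attention layer: compute a weight per (view, time)
# position and pool the features. Purely illustrative; not the paper's design.
import torch
import torch.nn as nn

class ViewTemporalAttention(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, x):
        # x: (batch, views, time, feat_dim) per-view, per-frame features.
        b, v, t, d = x.shape
        scores = self.score(x).view(b, v * t)            # one score per position
        weights = torch.softmax(scores, dim=-1).view(b, v, t, 1)
        pooled = (weights * x).sum(dim=(1, 2))           # (batch, feat_dim)
        return pooled, weights                           # weights are inspectable

attn = ViewTemporalAttention()
pooled, w = attn(torch.randn(2, 3, 75, 512))
print(pooled.shape, w.shape)  # torch.Size([2, 512]) torch.Size([2, 3, 75, 1])
```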
Showing results 1 — 15 out of 452 results