56 Hits in 4.9 sec

Arabic speech recognition using end‐to‐end deep learning

Hamzah A. Alsayadi, Abdelaziz A. Abdelhamid, Islam Hegazy, Zaki T. Fayed
2021 IET Signal Processing  
To fill this gap, this work presents a new CTC-based ASR, CNN-LSTM, and an attention-based end-to-end approach for improving diacritised Arabic ASR.  ...  In this work, the application of state-of-the-art end-to-end deep learning approaches is investigated to build a robust diacritised Arabic ASR.  ...  For this purpose, this work proposes a new method, which combines CTC-based ASR with attention-based end-to-end ASR in the system.  ... 
doi:10.1049/sil2.12057 fatcat:jqzkk4f6xzch7gorjhv35dodwu

End-to-End Speech Recognition: A review for the French Language [article]

Florian Boyer, Jean-Luc Rouas
2019 arXiv   pre-print
Recently, end-to-end ASR based either on sequence-to-sequence networks or on the CTC objective function gained a lot of interest from the community, achieving competitive results over traditional systems  ...  In this paper we propose a review of the existing end-to-end ASR approaches for the French language.  ...  END-TO-END SYSTEMS FOR SPEECH RECOGNITION Connectionist Temporal Classification The CTC [6] can be seen as a direct translation of conventional HMM-DNN ASR systems into lexicon-free systems.  ... 
arXiv:1910.08502v2 fatcat:yndio25lzvcw3ndeosrtbwnha4
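Several of the entries above build on the Connectionist Temporal Classification (CTC) objective. As a minimal sketch (not taken from any of the listed papers; the tensor shapes and vocabulary size are illustrative assumptions), PyTorch's built-in `CTCLoss` can be applied to frame-level log-probabilities like so:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# CTC loss over random frame-level log-probabilities.
# blank=0 reserves index 0 for the CTC blank symbol.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

T, N, C, S = 50, 2, 30, 20  # frames, batch, vocab (incl. blank), target length
log_probs = torch.randn(T, N, C).log_softmax(dim=2)       # (T, N, C)
targets = torch.randint(1, C, (N, S), dtype=torch.long)   # labels, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
```

The loss marginalises over all alignments of the label sequence to the input frames, which is what lets these systems train without frame-level alignments or a pronunciation lexicon.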

Exploiting Pre-Trained ASR Models for Alzheimer's Disease Recognition Through Spontaneous Speech [article]

Ying Qin, Wei Liu, Zhiyuan Peng, Si-Ioi Ng, Jingyu Li, Haibo Hu, Tan Lee
2021 arXiv   pre-print
The resulting model is light-weight and can be fine-tuned in an end-to-end manner for AD recognition.  ...  Most recent works concentrate on the use of advanced BERT-like classifiers for AD detection. Inputs to these classifiers are speech transcripts produced by automatic speech recognition (ASR) models.  ...  This makes it convenient to be jointly fine-tuned in an end-to-end fashion.  ... 
arXiv:2110.01493v1 fatcat:epkxhdupifdehfwzvky6bli2k4

End-to-end Audio-visual Speech Recognition with Conformers [article]

Pingchuan Ma, Stavros Petridis, Maja Pantic
2021 arXiv   pre-print
In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and Convolution-augmented transformer (Conformer), that can be trained in an end-to-end manner.  ...  We present results on the largest publicly available datasets for sentence-level speech recognition, Lip Reading Sentences 2 (LRS2) and Lip Reading Sentences 3 (LRS3), respectively.  ...  We would like to thank Dr. Jie Shen for his help with face tracking. The work of Pingchuan Ma has been partially supported by Honda and "AWS Cloud Credits for Research".  ... 
arXiv:2102.06657v1 fatcat:pr5od7w73vfr5gs7wyjl4tvhvq
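The hybrid CTC/attention objective named in this and later entries is typically a weighted sum of the CTC loss and the attention decoder's cross-entropy loss. A hypothetical sketch (the interpolation weight `lam` and all tensor shapes are assumptions, not values from the papers):

```python
import torch
import torch.nn as nn

def hybrid_loss(ctc_logp, dec_logits, targets, in_lens, tgt_lens, lam=0.3):
    """Weighted sum of CTC loss and attention-decoder cross-entropy."""
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)(
        ctc_logp, targets, in_lens, tgt_lens)
    ce = nn.CrossEntropyLoss()(
        dec_logits.reshape(-1, dec_logits.size(-1)), targets.reshape(-1))
    return lam * ctc + (1.0 - lam) * ce

torch.manual_seed(0)
T, N, S, C = 40, 2, 10, 30  # frames, batch, target length, vocab size
ctc_logp = torch.randn(T, N, C).log_softmax(dim=-1)  # encoder CTC branch
dec_logits = torch.randn(N, S, C)                    # attention decoder branch
targets = torch.randint(1, C, (N, S))
in_lens = torch.full((N,), T, dtype=torch.long)
tgt_lens = torch.full((N,), S, dtype=torch.long)

loss = hybrid_loss(ctc_logp, dec_logits, targets, in_lens, tgt_lens)
```

The CTC branch enforces monotonic alignment between audio and labels, while the attention branch models label dependencies; the weight trades off the two.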

End-to-end acoustic modelling for phone recognition of young readers [article]

Lucile Gelin, Morgane Daniel, Julien Pinquier, Thomas Pellegrini
2021 arXiv   pre-print
We find that transfer learning techniques are highly efficient on end-to-end architectures for adult-to-child adaptation with a small amount of child speech data.  ...  DNN-HMM model by 6.6% relative, as well as other end-to-end architectures by more than 8.5% relative.  ...  End-to-end architectures have proved their ability to outperform hybrid DNN-HMM approaches for ASR.  ... 
arXiv:2103.02899v1 fatcat:4psgwaexjradzf3q6y6xz64fd4

Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners [article]

Zehai Tu, Ning Ma, Jon Barker
2022 arXiv   pre-print
This work leverages the hidden representations from DNN-based ASR as features for speech intelligibility prediction in hearing-impaired listeners.  ...  An accurate objective speech intelligibility prediction algorithm is of great interest for many applications such as speech enhancement for hearing aids.  ...  In addition, performances of hidden representations at different levels within the ASR are investigated.  ... 
arXiv:2204.04287v2 fatcat:tzny4sjz7zeipfsr4fwydxebqe
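Extracting hidden representations from an ASR model, as this entry describes, is commonly done with forward hooks. A minimal illustration (the two-layer encoder is a stand-in, not the actual model from the paper):

```python
import torch
import torch.nn as nn

# Stand-in encoder: 80 mel bins in, 256-dim hidden representations out.
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 256))

captured = {}

def save_hidden(module, inputs, output):
    # Store the intermediate activations without tracking gradients.
    captured["hidden"] = output.detach()

# Tap the first layer's output via a forward hook.
encoder[0].register_forward_hook(save_hidden)

feats = torch.randn(1, 100, 80)  # (batch, frames, mel bins)
_ = encoder(feats)
hidden = captured["hidden"]      # (1, 100, 256) intermediate features
```

Hooks leave the model untouched, so the same pre-trained ASR can serve both recognition and downstream feature extraction.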

Visual Speech Recognition for Multiple Languages in the Wild [article]

Pingchuan Ma, Stavros Petridis, Maja Pantic
2022 arXiv   pre-print
We show that such a model works for different languages and outperforms all previous methods trained on publicly available datasets by a large margin.  ...  It even outperforms models that were trained on non-publicly available datasets containing up to 21 times more data.  ...  The model is trained end-to-end using a combination of the Connectionist Temporal Classification (CTC) loss with an attention mechanism.  ... 
arXiv:2202.13084v1 fatcat:2fp3fcy2lrdl7pcqwq5jl2urvm

Multi-Spectral Widefield Microscopy of the Beating Heart Through Post-Acquisition Synchronization and Unmixing

Christian Jaques, Linda Bapst-Wicht, Daniel F. Schorderet, Michael Liebling
2019 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)  
The goal of this thesis is to improve current state-of-the-art acoustic modeling techniques in general for ASR, with a particular focus on multilingual ASR and cross-lingual adaptation.  ...  In order to minimize the negative effects of data impurity arising from language mismatch, we investigated language adaptive training approaches which help further improve the multilingual ASR performance  ...  Connectionist Temporal Classification (CTC) [Graves et al., 2006] was the first attempt towards end-to-end ASR.  ... 
doi:10.1109/isbi.2019.8759472 dblp:conf/isbi/JaquesBSL19 fatcat:flypznnglbfrzm3ayf6tsfof34

Confidence Measure for Speech-to-Concept End-to-End Spoken Language Understanding

Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin
2020 Interspeech 2020  
We investigate the use of the hidden representations of our CTC-based SLU system to train an external simple classifier.  ...  Recent studies have led to the introduction of Speech-to-Concept End-to-End (E2E) neural architectures for Spoken Language Understanding (SLU) that reach state of the art performance.  ...  To our knowledge, this paper is the first study that investigates confidence measures in the framework of speech-to-concept end-to-end neural architecture for SLU.  ... 
doi:10.21437/interspeech.2020-2298 dblp:conf/interspeech/CaubriereELM20 fatcat:ttxgoysff5h2lp3bgmev5qmh4e

Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction [article]

Zehai Tu, Ning Ma, Jon Barker
2022 arXiv   pre-print
Our experiments demonstrate that the uncertainty from state-of-the-art end-to-end automatic speech recognition (ASR) models is highly correlated with speech intelligibility.  ...  Non-intrusive intelligibility prediction is important for its application in realistic scenarios, where a clean reference signal is difficult to access.  ...  A mechanism combining the Connectionist Temporal Classification (CTC) and attention-based sequence to sequence (seq2seq) is used for the optimisation [29] .  ... 
arXiv:2204.04288v2 fatcat:ypxku5m4vff6fms3ebijbssj34

Deep Lip Reading: a comparison of models and an online application [article]

Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
2018 arXiv   pre-print
The recurrent and fully convolutional models are trained with a Connectionist Temporal Classification loss and use an explicit language model for decoding, the transformer is a sequence-to-sequence model  ...  The goal of this paper is to develop state-of-the-art models for lip reading -- visual speech recognition.  ...  Funding for this research is provided by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems, the Oxford-Google DeepMind Graduate Scholarship, and by the EPSRC Programme Grant Seebibyte EP/  ... 
arXiv:1806.06053v1 fatcat:3zqae7cbvngehas32yel3kxxom

Lahjoita puhetta – a large-scale corpus of spoken Finnish with some benchmarks [article]

Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Tamás Grósz, Krister Lindén, Mikko Kurimo
2022 arXiv   pre-print
One further use case is to verify the metadata and transcripts given in this corpus itself, and to suggest artificial metadata and transcripts for the part of the corpus where it is missing.  ...  We provide benchmarks for the use cases, as well as downloadable, trained baseline systems with open-source code for reproducibility.  ...  Statements and Declarations The authors have no relevant financial or non-financial interests to disclose.  ... 
arXiv:2203.12906v1 fatcat:rez44sm52neerjide5eypeslla

LiRA: Learning Visual Speech Representations from Audio through Self-supervision [article]

Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic
2021 arXiv   pre-print
We find that this pre-trained model can be leveraged towards word-level and sentence-level lip-reading through feature extraction and fine-tuning experiments.  ...  However, comparatively little attention has been given to leveraging one modality as a training objective to learn from the other.  ...  We use the same conformer encoder architecture as in the pre-training phase, followed by the transformer decoder for sequence-to-sequence training [39] .  ... 
arXiv:2106.09171v1 fatcat:ffq2cwzh4jhpfpjzs6hvybx7ma

Multimodal Machine Translation through Visuals and Speech

Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
2019 Zenodo  
This survey reviews the major data resources for these tasks, the evaluation campaigns concentrated around them, the state of the art in end-to-end and pipeline approaches, and also the challenges in performance  ...  The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality  ...  We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation.  ... 
doi:10.5281/zenodo.3690791 fatcat:otdy5i33fzfsnnbb3xgb6zph6q

Multimodal machine translation through visuals and speech

Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
2020 Machine Translation  
This survey reviews the major data resources for these tasks, the evaluation campaigns concentrated around them, the state of the art in end-to-end and pipeline approaches, and also the challenges in performance  ...  The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality  ...  We would also like to thank Maarit Koponen for her valuable feedback and her help in establishing our discussions of machine translation evaluation.  ... 
doi:10.1007/s10590-020-09250-0 fatcat:jod3ghcsnnbipotcqp6sme4lna
Showing results 1 — 15 out of 56 results