Improving bottleneck features for Vietnamese large vocabulary continuous speech recognition system using deep neural networks

Bao Quoc Nguyen, Thang Tat Vu, Mai Chi Luong
2016 Journal of Computer Science and Cybernetics  
In this paper, the pre-training method based on denoising auto-encoder is investigated and proved to be a good model for initializing bottleneck networks of Vietnamese speech recognition system that result  ...  The results show that the DBNF extraction for Vietnamese recognition decreases word error rate by a relative 14% and 39% compared to the base bottleneck features and MFCC baseline, respectively.  ...  In this study, deep neural networks are also applied to improve bottleneck features for Vietnamese speech recognition which were reported previously [7].  ...
doi:10.15625/1813-9663/31/4/5944 fatcat:ervfbpkgavbctbhvm3cldkgu4y

Multilingual shifting deep bottleneck features for low-resource ASR

Quoc Bao Nguyen, Jonas Gehring, Markus Muller, Sebastian Stuker, Alex Waibel
2014 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this work, we propose a deep bottleneck feature architecture that is able to leverage data from multiple languages. We also show that tonal features are helpful for non-tonal languages.  ...  Vietnamese.  ...  Modern keyword search systems often make use of the result of a large vocabulary continuous speech recognition (LVCSR) system for performing the task.  ... 
doi:10.1109/icassp.2014.6854676 dblp:conf/icassp/NguyenGMSW14 fatcat:ufeirivtrffprayqkk6x7cvxhe

The Effect of Tone Modeling in Vietnamese LVCSR System

Quoc Bao Nguyen, Tat Thang Vu, Chi Mai Luong
2016 Procedia Computer Science  
In this work, the tone modeling approaches are used to manifest the tonal structure of Vietnamese, and tonal features are also used to build acoustic models.  ...  The results on LVCSR using deep bottleneck features (DBNFs) and different types of pronouncing dictionary are also presented.  ...  Therefore, the combination of MFCC and pitch features was used to improve the accuracy of a Vietnamese large vocabulary continuous speech recognition (LVCSR) system.  ...
doi:10.1016/j.procs.2016.04.046 fatcat:rth5rx74eravhmcxm5gwjys2lq

Investigation of multilingual deep neural networks for spoken term detection

K. M. Knill, M. J. F. Gales, S. P. Rath, P. C. Woodland, C. Zhang, S.-X. Zhang
2013 2013 IEEE Workshop on Automatic Speech Recognition and Understanding  
A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems.  ...  The development of high-performance speech processing systems for low-resource languages is a challenging area.  ...  ACKNOWLEDGEMENTS The authors are grateful to IBM Research's Lorelei Babel team for the KWS system.  ...
doi:10.1109/asru.2013.6707719 dblp:conf/asru/KnillGRWZZ13 fatcat:a7xop43vjnhbhbhfbck4tcoyj4

Advanced recurrent network-based hybrid acoustic models for low resource speech recognition

Jian Kang, Wei-Qiang Zhang, Wei-Wei Liu, Jia Liu, Michael T. Johnson
2018 EURASIP Journal on Audio, Speech, and Music Processing  
The proposed models achieve 3 to 10% relative improvements over their corresponding DNN or LSTM baselines across seven language collections.  ...  In addition, the new models accelerate learning speed by a factor of more than 1.6 compared to conventional BLSTM models. By using these approaches, we achieve good results in the IARPA Babel Program.  ...  Availability of data and materials The datasets used or analysed during this paper are available from Babel program. Competing interests The authors declare that they have no competing interests.  ... 
doi:10.1186/s13636-018-0128-6 fatcat:7h46dysw5varjks2daaatfqj2u

A Russian Keyword Spotting System Based on Large Vocabulary Continuous Speech Recognition and Linguistic Knowledge

Valentin Smirnov, Dmitry Ignatov, Michael Gusev, Mais Farkhadov, Natalia Rumyantseva, Mukhabbat Farkhadova
2016 Journal of Electrical and Computer Engineering  
The paper describes the key concepts of a word spotting system for Russian based on large vocabulary continuous speech recognition.  ...  The system is based on CMU Sphinx open-source speech recognition platform and on the linguistic models and algorithms developed by Speech Drive LLC.  ...  Acknowledgments The authors would like to thank SpRecord LLC authorities for providing real-world telephone-quality data used in training and testing of the keyword spotting system described in this paper  ... 
doi:10.1155/2016/4062786 fatcat:7jhohy6kerbuln7drrwcqfizcq

Deep maxout networks for low-resource speech recognition

Yajie Miao, Florian Metze, Shourabh Rawat
2013 2013 IEEE Workshop on Automatic Speech Recognition and Understanding  
This paper investigates the application of deep maxout networks (DMNs) to large vocabulary continuous speech recognition (LVCSR) tasks.  ...  We extend DMNs to hybrid and bottleneck feature systems, and explore optimal network structures (number of maxout layers, pooling strategy, etc) for both setups.  ...  In Section 4.5, we experimentally show that these sparse outputs pose a useful representation for the raw acoustic features and can improve the performance of hybrid systems.  ... 
doi:10.1109/asru.2013.6707763 dblp:conf/asru/MiaoMR13 fatcat:ssjcwoutfrd7xjrwq2zztvbnnu

Computational intelligence in processing of speech acoustics: a survey

Amitoj Singh, Navkiran Kaur, Vinay Kukreja, Virender Kadyan, Munish Kumar
2022 Complex & Intelligent Systems  
This paper presents a comprehensive survey on the speech recognition techniques for non-Indian and Indian languages, and compiled some of the computational models used for processing speech acoustics.  ...  However, a limited number of automatic speech recognition systems are available for commercial use.  ...  , large vocabulary continuous speech recognition, and ASR systems.  ... 
doi:10.1007/s40747-022-00665-1 fatcat:6pu2xccbq5as7bn2y2tav2fdwa

Acoustic Modeling Based on Deep Learning for Low-Resource Speech Recognition: An Overview

Chongchong Yu, Meng Kang, Yunbing Chen, Jiajia Wu, Xia Zhao
2020 IEEE Access  
Based on the complex network structures and huge model parameters, deep learning has become a powerful science in the process of speech recognition, which has a broad and far-reaching significance for  ...  Therefore, speech recognition for low-resource scenarios has become a hot topic in the field of speech.  ...  For example, a deep acoustic model with five layers of LSTM proposed by Google has achieved impressive improvements for large vocabulary speech recognition tasks [14].  ...
doi:10.1109/access.2020.3020421 fatcat:uiws6fazpnghzj5lmkkmy7ol3y

Automatic speech recognition for under-resourced languages: A survey

Laurent Besacier, Etienne Barnard, Alexey Karpov, Tanja Schultz
2014 Speech Communication  
We propose, in this paper, a survey that focuses on automatic speech recognition (ASR) for these languages.  ...  It should be clear, however, that many of the issues and approaches presented here, apply to speech technology in general (text-to-speech synthesis for instance).  ...  Artificial Neural Networks (ANN) including single hidden layer NN and multiple hidden layers NN (Deep Neural Networks DNN or Deep Belief Networks DBN) are also used for ASR subtasks such as acoustic modeling  ... 
doi:10.1016/j.specom.2013.07.008 fatcat:d63jjvobkfd7nk56jwgc6gq3ja

Chasing the metric: Smoothing learning algorithms for keyword detection

Oriol Vinyals, Steven Wegmann
2014 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In particular, we were able to automatically set the detection threshold while improving ATWV by more than 1% using a computationally cheap method based on a smoothed ATWV on both single systems and for  ...  ., for languages for which we do not have enough data), and found it useful to optimize the Actual Term Weighted Value (ATWV) directly.  ...  We describe them in the following two sections: The recognition system The Kaldi speech recognition toolkit [3], along with the TNet toolkit, were used for recognition and lattice generation.  ...
doi:10.1109/icassp.2014.6854211 dblp:conf/icassp/VinyalsW14 fatcat:2typ56zfnzghjknbwbj5dp6cma

Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages

Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz
2020 Interspeech 2020  
In this paper, we present the cross-lingual and multilingual experiments we have conducted using existing resources of other languages for the development of speech recognition system for less-resourced  ...  We have developed multilingual (ML) Automatic Speech Recognition (ASR) systems and decoded speech of the four Ethiopian languages.  ...  Acknowledgements We would like to express our gratitude to the Alexander von Humboldt Foundation for funding the research stay at the Cognitive Systems Lab (CSL) of the University of Bremen.  ... 
doi:10.21437/interspeech.2020-2827 dblp:conf/interspeech/TachbelieAS20 fatcat:tvhxzqdc4vgf7mykwy7obj6rqe

Multimodal Emotion Recognition for AVEC 2016 Challenge

Filip Povolny, Pavel Matejka, Michal Hradis, Anna Popková, Lubomir Otrusina, Pavel Smrz, Ian Wood, Cecile Robin, Lori Lamel
2016 Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC '16  
The original audio features were complemented with bottleneck features and also text-based emotion recognition which is based on transcribing audio by an automatic speech recognition system and applying  ...  This paper describes a system for emotion recognition and its application on the dataset from the AV+EC 2016 Emotion Recognition Challenge.  ...  The training speech was force-aligned using our BABEL ASR system [15]. Bottleneck features derived from this system are denoted as BN-Multi.  ...
doi:10.1145/2988257.2988268 dblp:conf/mm/PovolnyMHPOSWRL16 fatcat:vx36lkmagrf45dtb2jddntwqfi

Transfer learning of language-independent end-to-end ASR with language model fusion [article]

Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe
2019 arXiv pre-print
We also investigate various seed models for transfer learning.  ...  We first build a language-independent ASR system in a unified sequence-to-sequence (S2S) architecture with a shared vocabulary among all languages.  ...  INTRODUCTION Fast system development for low-resourced new languages is one of the challenges in automatic speech recognition (ASR).  ... 
arXiv:1811.02134v2 fatcat:kif25nfijzbqfexzrlr7up4lpi

The Multi-Domain International Search on Speech 2020 ALBAYZIN Evaluation: Overview, Systems, Results, Discussion and Post-Evaluation Analyses

Javier Tejedor, Doroteo T. Toledano, Jose M. Ramirez, Ana R. Montalvo, Juan Ignacio Alvarez-Trejos
2021 Applied Sciences  
The most novel features of the submitted systems are a data augmentation technique for the STD task and an end-to-end system for the QbE STD task.  ...  The large amount of information stored in audio and video repositories makes search on speech (SoS) a challenging area that is continuously receiving much interest.  ...  Acknowledgments: The authors also thank Alicia Lozano Díez for her support with the English editing. Conflicts of Interest: The authors declare no conflict of interest.  ...
doi:10.3390/app11188519 fatcat:rut6jfdlrbb2bcycey42dsll5e