AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale [article]

Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu
2018 arXiv   pre-print
It was released with a baseline system containing solid training and testing pipelines for Mandarin ASR.  ...  For research community, we hope that AISHELL-2 corpus can be a solid resource for topics like transfer learning and robust ASR.  ...  The authors would like to thank all other members of AISHELL foundation who contributed to this project and Emotech Labs who provided computational resources for producing most recent system performance  ... 
arXiv:1808.10583v2 fatcat:dgtbh5ezlzatjmuhlo3qbvjupu

Real-Time Robot Audition System That Recognizes Simultaneous Speech in The Real World

Shun'ichi Yamamoto, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean-marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi Okuno
2006 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems  
We have previously reported Missing Feature Theory (MFT) based integration of Sound Source Separation (SSS) and Automatic Speech Recognition (ASR) for building robust robot audition.  ...  We demonstrated that an MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously.  ...  It improves recognition performance by using missing feature masks (MFM) which cover unreliable acoustic features used in ASR [3]. MFT is a popular approach for noise-robust ASR.  ... 
doi:10.1109/iros.2006.282037 dblp:conf/iros/YamamotoNNTVKOO06 fatcat:imh6cuppvzh35eggpzp3lenexi

Personalized Keyphrase Detection using Speaker and Environment Information [article]

Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw
2021 arXiv   pre-print
The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model.  ...  In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary.  ...  The text-independent speaker verification system successfully verifies the target enrolled user.  ... 
arXiv:2104.13970v2 fatcat:zzrlu4td4bdk7jjeqtu7623puy

Multi-microphone speech recognition in everyday environments

Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe
2017 Computer Speech and Language  
Several papers describe complete robust ASR systems that exploit both front-end enhancement and back-end adaptation.  ...  Cho et al. build a feature-domain enhancement system by bringing together independent vector analysis and a model of reverberation to estimate the log-power spectrum of clean speech. bank features, also  ... 
doi:10.1016/j.csl.2017.02.007 fatcat:mt6fj5ga5zbjtgefev5zgqhcc4

Is ASR ready for wireless primetime: Measuring the core technology for selected applications

Harry M Chang
2000 Speech Communication  
available software-based ASR systems that represent the best core ASR technology on the market.  ...  Current industry trends clearly show that incorporating ASR technology into existing or new wireless services as a replacement for touch-tone input is a natural progression in user interface.  ...  Scope and focus To reflect emerging industry trends towards speaker independent ASR applications over both landline and wireless networks, all benchmark tests described in this paper were based on speaker  ... 
doi:10.1016/s0167-6393(99)00063-1 fatcat:n4vhya3hvje6rit4dly6ncceci

Speech-based Interaction

Cosmin Munteanu, Gerald Penn
2016 Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems - CHI EA '16  
Factors affecting ASR quality • Word Error Rate (WER) increases by a factor of 1.5 for each unfavourable condition -Accented speaker (if ASR is speaker-independent) -Temporary medical conditions  ...  factors -Acoustics (e.g. noise on the street) -CPU power (client-server vs. on-device ASR) • When designing a spoken interactive system: -Know what is against you (environment, channel, etc.)  ...  A handyman's guide to building speech interfaces • (ASR-related) Critical factors • Digitization constraints also affect ASR: -Sampling (analog-to-digital conversion) • Ideally -use a good sample  ... 
doi:10.1145/2851581.2856689 dblp:conf/chi/MunteanuP16 fatcat:c2qfv4eukjhyti7brvvzuyv6je

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein
2020 Interspeech 2020  
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system.  ...  Besides, this model must be tiny, fast, and perform inference in a streaming fashion, in order to have minimal impact on CPU, memory, battery and latency.  ...  Such ASR models already have great robustness against noise.  ... 
doi:10.21437/interspeech.2020-1193 dblp:conf/interspeech/WangLSWCLHLPNG20 fatcat:7bi4ldkrujg4pekpqllu4x6fpi

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition [article]

Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein
2020 arXiv   pre-print
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system.  ...  Besides, this model must be tiny, fast, and perform inference in a streaming fashion, in order to have minimal impact on CPU, memory, battery and latency.  ...  Such ASR models already have great robustness against noise.  ... 
arXiv:2009.04323v1 fatcat:prqvmgsek5dopm7vtkby5y2zru

Robust Speech Recognition System for Communication Robots in Real Environments

Carlos Ishi, Shigeki Matsuda, Takayuki Kanda, Takatoshi Jitsuhiro, Hiroshi Ishiguro, Satoshi Nakamura, Norihiro Hagita
2006 2006 6th IEEE-RAS International Conference on Humanoid Robots  
The application range of communication robots could be widely expanded by the use of an automatic speech recognition (ASR) system with improved robustness for noise and for speakers of different ages.  ...  Speech activity periods are detected using GMM-based end-point detection (GMM-EPD). Our ASR system has two decoders for adults' and children's speech.  ...  GWPP-based word confidence and rejection So far, several techniques were described for improving the robustness of the ASR system to noise and to speakers of different ages.  ... 
doi:10.1109/ichr.2006.321294 dblp:conf/humanoids/IshiMKJINH06 fatcat:ztbqed7zirefbk7xnjsx3lzxkq

Accent Recognition with Hybrid Phonetic Features

Zhan Zhang, Yuehai Wang, Jianyi Yang
2021 Sensors  
To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years.  ...  Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust.  ...  On the one hand, the text transcription is independent of the speaker information, and ASR MTL is suitable for this task.  ... 
doi:10.3390/s21186258 pmid:34577464 fatcat:x7wd3s4bqnay3pyyuooiepfrie

QCRI Live Speech Translation System

Fahim Dalvi, Yifan Zhang, Sameer Khurana, Nadir Durrani, Hassan Sajjad, Ahmed Abdelali, Hamdy Mubarak, Ahmed Ali, Stephan Vogel
2017 Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics  
Our Kaldi-based ASR system uses the Time Delay Neural Network architecture, while our Machine Translation (MT) system uses both phrase-based and neural frameworks.  ...  Although our neural MT system is slower than the phrase-based system, it produces significantly better translations and is memory efficient.  ...  The broadcast page is meant for the primary speaker. This page is responsible for recording the audio data and collecting the transcriptions and translations from the ASR and MT systems respectively.  ... 
doi:10.18653/v1/e17-3016 dblp:conf/eacl/DalviZKDSAMAV17 fatcat:wogpjwtvk5bq5ihcw7a2fgpnp4

Updated MINDS report on speech recognition and understanding, Part 2 [DSP Education]

J. Baker, Li Deng, S. Khudanpur, Chin-Hui Lee, J. Glass, N. Morgan, D. O'Shaughnessy
2009 IEEE Signal Processing Magazine  
The primary method currently used for making ASR systems more robust to variations in speaker characteristics is to include a wide range of speakers in the training.  ...  Current ASR systems assume a pronunciation lexicon that models native speakers of a language. Furthermore, they train on large amounts of speech data from various native speakers of the language.  ... 
doi:10.1109/msp.2009.932707 fatcat:obmhco466rdmnjcohfjv3th22q

Optimal Selection Of Bitstream Features For Compressed-Domain Automatic Speaker Recognition

Juan Carlos De Martin, Matteo Petracca
2006 Zenodo  
Single feature recognition performance A better criterion for feature selection is based upon the obvious fact that the goal of a speaker recognition system is to classify an unknown speaker correctly.  ...  This system has also been shown robust to packet losses in IP networks with a degradation in the recognition rate of less than 1% for a maximum frame error rate of 20%.  ... 
doi:10.5281/zenodo.39983 fatcat:mqjl7nsfj5hwroovtaxq6chfq4

KU-ISPL Speaker Recognition Systems under Language mismatch condition for NIST 2016 Speaker Recognition Evaluation [article]

Suwon Shon, Hanseok Ko
2017 arXiv   pre-print
(KU-ISPL) developed speaker recognition system for SRE16 fixed training condition.  ...  Total CPU Execution time for 1 trials by systems System Total CPU time for a single trials (sec.) Task Execution time (sec.)  ...  It became a mandatory process of i-vector based speaker recognition system back-end and, moreover, recent study validated its effectiveness on domain adaptation by calculating whitening transform matrix  ... 
arXiv:1702.00956v2 fatcat:mdvqawtiqbh4pg45wvkhqwerue

Optimal Selection Of Bitstream Features For Compressed-Domain Automatic Speaker Recognition

Juan Carlos De Martin, Matteo Petracca
2006 Zenodo  
Single feature recognition performance A better criterion for feature selection is based upon the obvious fact that the goal of a speaker recognition system is to classify an unknown speaker correctly.  ...  This system has also been shown robust to packet losses in IP networks with a degradation in the recognition rate of less than 1% for a maximum frame error rate of 20%.  ... 
doi:10.5281/zenodo.52834 fatcat:qthaja2yijbttjyxupex3nkw6a