Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
In this paper, we propose a novel approach which incorporates flow-based density estimation for the robust front-end using non-parallel clean and noisy speech.  ...  For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor.  ...  the end-to-end training.  ... 
doi:10.24963/ijcai.2020/514 dblp:conf/ijcai/ChowdhuryK020 fatcat:efhlairzynaarj5fdamfx7xe7q

Infobox-to-text Generation with Tree-like Planning based Attention Network

Yang Bai, Ziran Li, Ning Ding, Ying Shen, Hai-Tao Zheng
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Representing the input infobox as a sequence, previous neural methods using end-to-end models without order-planning suffer from the problems of incoherence and inadaptability to disordered input.  ...  A novel tree-like tuning encoder is designed to dynamically tune the static order-plan for better planning by merging the most relevant attributes together layer by layer.  ...  Conclusion In this work, we presented the novel method which employs flow-based density estimation for robust multi-channel ASR.  ... 
doi:10.24963/ijcai.2020/518 dblp:conf/ijcai/KimLKKK20 fatcat:nahfro4i3vaspcfv3ted5thgdi

Exploiting Single-Channel Speech For Multi-channel End-to-end Speech Recognition [article]

Keyu An, Zhijian Ou
2021 arXiv   pre-print
Recently, the end-to-end training approach for neural beamformer-supported multi-channel ASR has shown its effectiveness in multi-channel speech recognition.  ...  This paper explores the usage of single-channel data to improve the multi-channel end-to-end speech recognition system.  ...  MULTI-CHANNEL END-TO-END SPEECH RECOGNITION We adopt a unified architecture for multi-channel end-to-end speech recognition, and apply joint optimization for front-end and back-end.  ... 
arXiv:2107.02670v1 fatcat:bkhhunasnnbgjcgv3524qoo4hm

Far-Field Automatic Speech Recognition [article]

Reinhold Haeb-Umbach
2020 arXiv   pre-print
A signal enhancement front-end for dereverberation, source separation and acoustic beamforming is employed to clean up the speech, and the back-end ASR engine is robustified by multi-condition training  ...  We will also describe the so-called end-to-end approach to ASR, which is a new promising architecture that has recently been extended to the far-field scenario.  ...  The gradient can flow from the AM to the speech enhancement front-end, which enables optimization of the front-end for ASR.  ... 
arXiv:2009.09395v1 fatcat:7de7w2i5jfhehhtfflu72k35mi

Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario

Hangting Chen, Pengyuan Zhang, Qian Shi, Zuozhen Liu
2020 Interspeech 2020  
Meanwhile, we propose a data augmentation technique via random channel selection and deep convolutional neural network-based multi-channel acoustic models for back-end modeling.  ...  In the paper, we make an effort to integrate an improved GSS with a strong automatic speech recognition (ASR) backend, which bridges the WER gap and achieves substantial ASR performance improvement.  ...  Multi-channel separation with an improved GSS The GSS conducts mask estimation and beamforming with the provided time annotation or the ASR alignment.  ... 
doi:10.21437/interspeech.2020-1606 dblp:conf/interspeech/ChenZSL20 fatcat:n5sxggkry5bu5c23ymfae2mw4q

Articulatory Information for Noise Robust Speech Recognition

Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein
2011 IEEE Transactions on Audio, Speech, and Language Processing  
In this paper, we first estimated articulatory information in the form of vocal tract constriction variables (abbreviated as TVs) from the Aurora-2 speech corpus using a neural network based speech-inversion  ...  Word recognition tasks were then performed for both noisy and clean speech using articulatory information in conjunction with traditional acoustic features.  ...  Noise Robustness in ASR Several approaches have been proposed to incorporate noise robustness into ASR systems, which can be broadly grouped into three categories: 1) the front-end based approach; 2) the  ... 
doi:10.1109/tasl.2010.2103058 fatcat:3kxf7ul4nrgbxomgpvahldmfem

Automatic Speech Recognition: Systematic Literature Review

Sadeen Alharbi, Muna Alrazgan, Alanoud Alrashed, Turkiah AlNomasi, Raghad Almojel, Rimah Alharbi, Saja Alharbi, Sahar Alturki, Fatimah Alshehri, Maha Almojil
2021 IEEE Access  
ACKNOWLEDGMENT The authors thank the Deanship of Scientific Research and RSSU at King Saud University for their technical support.  ...  In [70], the researchers proposed a robust technique that parameterized ambient noise and pitch variations at the front end of speech.  ...  The researchers in [29] proposed a front-end speech parameterization technique that is robust with respect to both noise and pitch variations.  ... 
doi:10.1109/access.2021.3112535 fatcat:uhyhmyd6b5d2lldkhf6tihnxky

A Front-End Technique for Automatic Noisy Speech Recognition

Hay Mar Soe Naing, Risanuri Hidayat, Rudy Hartanto, Yoshikazu Miyanaga
2020 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)  
Experiments are carried out on the Aurora-2 database, and frame-level cross entropy-based deep neural network (DNN-HMM) training is used to build an acoustic model.  ...  Moreover, the Gammatone frequency integration is presented to warp the energy spectrum which can provide gradually decaying the weights and compensate for the loss of spectral correlation.  ...  for SCOPE Program (185001003).  ... 
doi:10.1109/o-cocosda50338.2020.9295006 fatcat:n4f5kuurond2tb4iyiitgijmdq

Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation

Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
The estimation of an accurate noise model is a crucial problem for model-based noise suppression including a vector Taylor series (VTS)-based approach.  ...  Although VTS-based approach involves nonlinear transformation, the MMSE estimates make it possible to flexibly estimate accurate parameters for the joint processing without the influences of non-linear  ...  As the front-end processing of ASR, robust feature extraction [1] and noise suppression [2]-[6] reduce the influence of interfering noise from observed noisy speech signals.  ... 
doi:10.1109/icassp.2012.6288971 dblp:conf/icassp/FujimotoWN12 fatcat:7amb5jzk6zdcvgis6lujbuvpwu

Multichannel End-to-end Speech Recognition [article]

Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
2017 arXiv   pre-print
In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network.  ...  Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components.  ...  with 5-th channel as a fixed reference microphone, which is located on the center front of the tablet device. and MASK NET (ATT) validates the use of the attention-based mechanism for reference selection  ... 
arXiv:1703.04783v1 fatcat:zjwcmk4d35ddtpo7nqutyczdse

A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities

Muhammad Mohsin Kabir, M. F. Mridha, Jungpil Shin, Israt Jahan, Abu Quwsar Ohi
2021 IEEE Access  
implemented in the front-end of a multi-stage speaker recognition system (from [10]).  ...  A stage-wise ASR algorithm is commonly composed of a front-end for feature extraction and a back-end for computing speaker-feature similarity.  ... 
doi:10.1109/access.2021.3084299 fatcat:6eavwhxg6jfwngu7bnwzjc4w3q

Security Analysis of Camera-LiDAR Fusion Against Black-Box Attacks on Autonomous Vehicles [article]

R. Spencer Hallyburton, Yupei Liu, Yulong Cao, Z. Morley Mao, Miroslav Pajic
2022 arXiv   pre-print
Sensor fusion with multi-frame tracking is becoming increasingly popular for detecting 3D objects.  ...  Finally, we show that the frustum attack can be exercised consistently over time to form stealthy longitudinal attack sequences, compromising the tracking module and creating adverse outcomes on end-to-end  ...  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection.  ... 
arXiv:2106.07098v4 fatcat:onalm73kcjdunoeqbzv4sscj7m

Performance vs. hardware requirements in state-of-the-art automatic speech recognition

Alexandru-Lucian Georgescu, Alessandro Pappalardo, Horia Cucu, Michaela Blott
2021 EURASIP Journal on Audio, Speech, and Music Processing  
ASR systems evolved from pipeline-based systems, that modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, that translate the  ...  raw waveform directly into words using one deep neural network (DNN).  ...  Authors' contributions ALG was responsible for summarizing the methods and writing the manuscript. AP provided guidance for ALG in order to perform the experiments and the subsequent analyzes.  ... 
doi:10.1186/s13636-021-00217-4 fatcat:7yfquu7irrci3ewug6ijoseduq

Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting

Martin Wöllmer, Erik Marchi, Stefano Squartini, Björn Schuller
2011 Cognitive Neurodynamics  
In this article, we propose to combine histogram equalization and multi-condition training for robust keyword detection in noisy speech.  ...  To better cope with conversational speaking styles, we show how contextual information can be effectively exploited in a multi-stream ASR framework that dynamically models context-sensitive phoneme estimates  ...  reached when HEQ is included in the front-end of the recognition system.  ... 
doi:10.1007/s11571-011-9166-9 pmid:22942915 pmcid:PMC3179540 fatcat:lfqo5jvmavgfdasr3my2j4xjvi

Automatic speech recognition and speech variability: A review

M. Benzeghiba, R. De Mori, O. Deroo, S. Dupont, T. Erbes, D. Jouvet, L. Fissore, P. Laface, A. Mertins, C. Ris, R. Rose, V. Tyagi (+1 others)
2007 Speech Communication  
For instance, the lack of robustness to foreign accents precludes the use by specific populations.  ...  Major progress is being recorded regularly on both the technology and exploitation of Automatic Speech Recognition (ASR) and spoken language systems.  ...  The Community is not liable for any use that may be made of the information contained therein.  ... 
doi:10.1016/j.specom.2007.02.006 fatcat:3zfqokxrpretxa4ght2yfudnwi