5,931 Hits in 6.7 sec

Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition [article]

Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Lujun Li, Tobias Watzel, Gerhard Rigoll
2020 arXiv pre-print
We then demonstrate the application of this algorithm for adversarial training to obtain a more robust ASR model. ... Recent advances in Automatic Speech Recognition (ASR) demonstrated how end-to-end systems are able to achieve state-of-the-art performance. ... In the speech recognition domain, working Audio Adversarial Examples (AAEs) have already been demonstrated for CTC-based [5] as well as attention-based ASR systems [26]. ...
arXiv:2007.10723v1 fatcat:prou6j5d2famxkpr3nv6qk374e

MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition [article]

Iustina Andronic and Ludwig Kürzinger and Edgar Ricardo Chavez Rosas and Gerhard Rigoll and Bernhard U. Seeber
2020 arXiv pre-print
To this end, we generated AAEs with the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attention ASR system.  ...  Audio Adversarial Examples (AAE) represent specially created inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification.  ...  FGSM was already applied in the context of end-to-end ASR to DeepSpeech [13, 7] , a CTC-based speech recognition system, as well as for the attention-based system called Listen-Attend-Spell (LAS) [8,  ... 
arXiv:2007.12892v1 fatcat:r3igokgsivhqvoecbiswxklphe
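The Fast Gradient Sign Method named in the entry above perturbs an input by a single step of size ε along the sign of the loss gradient, x_adv = x + ε·sign(∇ₓL(x, y)). The sketch below is a minimal illustration under stated assumptions, not the authors' setup. It attacks a toy logistic-regression classifier over raw waveform samples in [-1, 1]. The paper instead attacks a hybrid CTC/attention ASR model, whose input gradient would come from automatic differentiation. The function name `fgsm_logistic` and the clipping range are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_logistic(x, y, w, b, eps):
    """One-step FGSM against a toy logistic-regression classifier (illustrative only).

    x: waveform samples assumed to lie in [-1, 1]; y: true label in {0, 1};
    w, b: model weights and bias; eps: maximum per-sample perturbation.
    """
    p = sigmoid(np.dot(w, x) + b)
    # Gradient of the binary cross-entropy loss with respect to the input x.
    grad_x = (p - y) * w
    # Step in the direction that increases the loss ...
    x_adv = x + eps * np.sign(grad_x)
    # ... and clip back to the (assumed) valid audio range.
    return np.clip(x_adv, -1.0, 1.0)


# Usage: perturb a random "waveform" whose true label is 1.
rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=1000)
w = rng.normal(size=1000)
x_adv = fgsm_logistic(x, 1, w, 0.0, eps=0.01)
```

Against a real ASR model, `grad_x` would be obtained by backpropagating the recognition loss to the input signal; the sign step and the clipping stay the same.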

Adversarial Regularization for End-to-End Robust Speaker Verification

Qing Wang, Pengcheng Guo, Sining Sun, Lei Xie, John H.L. Hansen
2019 Interspeech 2019  
Next, we propose to train an end-to-end robust SV model using the two proposed adversarial examples for model regularization. ... It has been shown in image as well as speech applications that deep neural networks are vulnerable to adversarial examples. ... Adversarial examples can be used not only for attacking but also for improving the robustness of speech recognition systems. ...
doi:10.21437/interspeech.2019-2983 dblp:conf/interspeech/WangGSXH19 fatcat:nanpmqzqujhodouehgtoa4ytme

Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition [article]

Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko
2022 arXiv pre-print
In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.  ...  to the current model enhancement methods against the adversarial speech examples.  ...  The authors thank Ariya Rastrow, Mat Hans and Björn Hoffmeister from Alexa AI for their valuable comments and discussion.  ... 
arXiv:2202.08532v1 fatcat:c373lqa6gfghbb6a2ktjcalgdi

AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition [article]

Yi-Chen Chen, Zhaojun Yang, Ching-Feng Yeh, Mahaveer Jain, Michael L. Seltzer
2019 arXiv pre-print
In this paper, our goal is to build a unified end-to-end speech recognition system that generalizes well across accents.  ...  We further fine-tune AIPNet by connecting the accent-invariant module with an attention-based encoder-decoder model for multi-accent speech recognition.  ...  In the fine-tuning stage, we adopt an attention-based encoder-decoder model for sequence-to-sequence speech recognition.  ... 
arXiv:1911.11935v1 fatcat:do2blvazhfdntoshwwwo4dran4

Introduction to the Special Issue "Speaker and Language Characterization and Recognition: Voice Modeling, Conversion, Synthesis and Ethical Aspects"

Jean-François Bonastre, Tomi Kinnunen, Anthony Larcher, Junichi Yamagishi
2019 Computer Speech and Language  
In their article End-to-end DNN Based Text-Independent Speaker Recognition for Long and Short Utterances, Rohdin et al. proposed to mimic an i-vector/PLDA system using an end-to-end neural network to address  ...  In their article Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition, Novotný et al. report the results of a detailed analysis of speaker verification noise robustness.  ... 
doi:10.1016/j.csl.2019.101021 fatcat:mpw674uefrbuxmrfmbvvcyphwi

2019 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 27

2019 IEEE/ACM Transactions on Audio Speech and Language Processing  
., +, TASLP March 2019 496-506 Linguistics Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition.  ...  ., +, TASLP Sept. 2019 1455-1468 Gradient methods Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition.  ... 
doi:10.1109/taslp.2020.2971902 fatcat:j66uwjyqlfbmtgda6zhzlswpva

Robust Speech Recognition Using Generative Adversarial Networks [article]

Anuroop Sriram, Heewoo Jun, Yashesh Gaur, Sanjeev Satheesh
2017 arXiv pre-print
This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition.  ...  We show the new approach improves simulated far-field speech recognition of vanilla sequence-to-sequence models without specialized front-ends or preprocessing.  ...  For speech, [18] proposes a GAN based speech enhancement method called SEGAN but without the end goal of speech recognition.  ... 
arXiv:1711.01567v1 fatcat:6poefrkl65gp5fjryydnaa2fi4

End-to-end Domain-Adversarial Voice Activity Detection [article]

Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola Garcia-Perera
2020 arXiv pre-print
To that end, a domain classification branch is added to the network and trained in an adversarial manner.  ...  In the in-domain scenario where the training and test sets cover the exact same domains, we show that the domain-adversarial approach does not degrade performance of the proposed end-to-end model.  ...  We show that end-to-end voice activity detection leads to a significant improvement compared to models based on handcrafted features.  ... 
arXiv:1910.10655v2 fatcat:utt3bgphhzdfdbtmzred7nvjqy

Speaker Adaptation for Attention-Based End-to-End Speech Recognition

Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong
2019 Interspeech 2019  
We propose three regularization-based speaker adaptation approaches to adapt the attention-based encoder-decoder (AED) model with very limited adaptation data from target speakers for end-to-end automatic speech recognition. ... Introduction Recently, remarkable progress has been made in end-to-end (E2E) automatic speech recognition (ASR) with the advance of deep learning. ...
doi:10.21437/interspeech.2019-3135 dblp:conf/interspeech/MengGLG19 fatcat:e7n3np6ibraibotx5hqbgndele

Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives [Review Article]

Jing Han, Zixing Zhang, Bjorn Schuller
2019 IEEE Computational Intelligence Magazine  
As a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective ... Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. ... [43] utilized CNNs pre-trained on large amounts of image data to extract robust feature representations for speech-based emotion recognition. ...
doi:10.1109/mci.2019.2901088 fatcat:edkvfgy3ofgufcytngf5mktpae

Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems [article]

Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, Bo Yuan
2020 arXiv pre-print
In this paper, we propose the first real-time, universal, and robust adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition system.  ...  Through adding an audio-agnostic universal perturbation on arbitrary enrolled speaker's voice input, the DNN-based speaker recognition system would identify the speaker as any target (i.e., adversary-desired  ...  RELATED WORK Adversarial Attack on Speech Recognition.  ... 
arXiv:2003.02301v2 fatcat:vzv2zftbtrhuxnwjcnxx4ymmty

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training [article]

Bin Liu, Shuai Nie, Yaping Zhang, Dengfeng Ke, Shan Liang, Wenju Liu
2018 arXiv pre-print
To alleviate this issue, the most common approach is to use a well-designed speech enhancement method as the front end of ASR. ... In this paper, we propose an adversarial training method to directly boost the noise robustness of the acoustic model. ... To the best of our knowledge, using GANs for robust speech recognition has not yet been studied, so our method is the first approach to use the adversarial training framework for robust speech recognition ...
arXiv:1805.01357v1 fatcat:o3wpy2tn55h3plhzdjk2fritxy

End-to-End Domain-Adversarial Voice Activity Detection

Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola Garcia-Perera
2020 Interspeech 2020  
To that end, a domain classification branch is added to the network and trained in an adversarial manner. ... In the in-domain scenario where the training and test sets cover the exact same domains, we show that the domain-adversarial approach does not degrade performance of the proposed end-to-end model. ... We would like to thank Neville Ryant for providing the speaker diarization output of the winning submission to DIHARD 2019. ...
doi:10.21437/interspeech.2020-2285 dblp:conf/interspeech/LavechinGBBG20 fatcat:ox2ibrrxhjgttbqibo2c53bgkm

Adversarial Separation Network for Speaker Recognition

Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee, Jianguo Wei
2020 Interspeech 2020  
Our proposed AS-Net is featured by its ability to separate adversarial perturbation from the test speech to restore the natural clean speech.  ...  However, it is observed that DNN based systems are easily deceived by adversarial examples leading to wrong predictions.  ...  Introduction The goal of speaker recognition is to determine the identity of a person through speech. Both the safety and robustness of speaker recognition systems have attracted much attention.  ... 
doi:10.21437/interspeech.2020-1966 dblp:conf/interspeech/ZhangWZLLW20 fatcat:k7wt2xqjjjdslkddh7nb3yir44
Showing results 1–15 of 5,931