10,764 Hits in 4.4 sec

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks [article]

Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville
2017 arXiv   pre-print
Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition  ...  Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR).  ...  End-to-end neural systems for speech recognition typically replace the HMM with a neural network that provides a distribution over sequences directly.  ... 
arXiv:1701.02720v1 fatcat:46c5bdvoofgmtfl33nlhysdzku

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Ying Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, Aaron Courville
2016 Interspeech 2016  
Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it feasible to train an 'end-to-end' speech recognition  ...  Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR).  ...  End-to-end neural systems for speech recognition typically replace the HMM with a neural network that provides a distribution over sequences directly.  ... 
doi:10.21437/interspeech.2016-1446 dblp:conf/interspeech/ZhangPBZLBC16 fatcat:ooe4fuywcfgzvaru6k67ptntfq

Deep Learning in Speech Recognition
音声認識におけるDeep Learningの活用

Ken-ichi Iso
2017 The Brain & Neural Networks  
Acoustic Modeling from Raw Multichannel Waveforms, IEEE Automatic Speech Recognition and Understanding Workshop. 19) Graves, A., Jaitly, N. (2014): Towards End-to-End Speech Recognition with Recurrent  ...  Based on Deep Learning Autoencoder with Layer-Wised Pretraining, InterSpeech, pp.1504-1507. 17) Palaz, D., Magimai-Doss, M., Collobert, R. (2015): CONVOLUTIONAL NEURAL NETWORKS-BASED CONTINUOUS SPEECH  ... 
doi:10.3902/jnns.24.27 fatcat:2ioqodsou5fhvnwmyi3kj2iosu

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
., +, TASLP 2020 3010-3017 End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.  ...  Herzog, A., +, TASLP 2020 2461-2475 End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

Table of Contents

2021 IEEE/ACM Transactions on Audio Speech and Language Processing  
Harmonic Vector Analysis ... Kitamura. TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition  ...  Chin. FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning  ... 
doi:10.1109/taslp.2021.3137064 fatcat:rpka3f2bhjh37c7pkhiowyndhm

Table of Contents

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Wang 1293. End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features  ...  Ling 839. Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification  ... 
doi:10.1109/taslp.2020.3046148 fatcat:hirdphjf6zeqdjzwnwlwlamtb4

Deep learning research landscape roadmap in a nutshell: past, present and future – Towards deep cortical learning [article]

Aras R. Dargazany
2019 arXiv   pre-print
The past, present and future of deep learning is presented in this work.  ...  Given this landscape & roadmap, we predict that deep cortical learning will be the convergence of deep learning & cortical learning which builds an artificial cortical column ultimately.  ...  Imagenet classification with deep convolutional neural networks.  ... 
arXiv:1908.02130v1 fatcat:v3qjpjyi55ehdepjpgjohwrh24

2021 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 29

2021 IEEE/ACM Transactions on Audio Speech and Language Processing  
Departments and other items may also be covered if they have been judged to have archival value. The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, TASLP 2021 1785-1794 TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition.  ...  ., +, TASLP 2021 1290-1302 TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition.  ... 
doi:10.1109/taslp.2022.3147096 fatcat:7nl52k7sjfalbhpxtum3y5nmje

Convolutional Neural Networks for Raw Speech Recognition [chapter]

Vishal Passricha, Rajesh Kumar Aggarwal
2018 From Natural to Artificial Intelligence - Algorithms and Applications  
Three major types of end-to-end architectures for ASR are attention-based methods, connectionist temporal classification, and convolutional neural network (CNN)-based direct raw speech model.  ...  The emergence of deep learning drastically improved the recognition rate of ASR systems. Such systems are replacing traditional ASR systems. These systems can also be trained in end-to-end manner.  ...  Author details Vishal Passricha and Rajesh Kumar Aggarwal* *Address all correspondence to: rka15969@gmail.com National Institute of Technology, Kurukshetra, India  ... 
doi:10.5772/intechopen.80026 fatcat:ni6csin5obgrpfdogpwgzjkphq

2019 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 27

2019 IEEE/ACM Transactions on Audio Speech and Language Processing  
., +, TASLP Feb. 2019 244-254 Convolutional Neural Networks to Enhance Coded Speech.  ...  ., +, TASLP Jan. 2019 77-88 CMOS integrated circuits Convolutional Neural Networks to Enhance Coded Speech.  ... 
doi:10.1109/taslp.2020.2971902 fatcat:j66uwjyqlfbmtgda6zhzlswpva

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments [article]

Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller
2018 arXiv   pre-print
Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and with sufficient training, can  ...  In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for  ...  To this end, deep learning, which is mainly based on deep neural networks, has had a central role in the recent developments [13]-[16].  ... 
arXiv:1705.10874v3 fatcat:evdhqnj7eraa5jiolakuf4mf3e

Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms

Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
2018 Interspeech 2018  
We tried to extract such information from spectrograms and accomplish the emotion recognition task by combining Convolutional Neural Networks (CNNs) with Recurrent Neural Networks (RNNs).  ...  In this work, an approach of emotion recognition is proposed for variable-length speech segments by applying deep neural network to spectrograms directly.  ...  [17] proposed a convolutional recurrent neural network that operates on the raw signal, to perform an end-to-end spontaneous emotion prediction task from speech data. Satt et al.  ... 
doi:10.21437/interspeech.2018-2228 dblp:conf/interspeech/MaW0XMC18 fatcat:q7hr74umqjahde2dm5x76xtpdm

On the Importance of Video Action Recognition for Visual Lipreading [article]

Xinshuo Weng
2019 arXiv   pre-print
Recently, many state-of-the-art visual lipreading methods explore the end-to-end trainable deep models, involving the use of 2D convolutional networks (e.g., ResNet) as the front-end visual feature extractor  ...  Although a deep 2D convolution neural network can provide informative image-based features, it ignores the temporal motion existing between the adjacent frames.  ...  Convolutional neural networks for sentence classification. EMNLP, 2014. 6 [19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks.  ... 
arXiv:1903.09616v2 fatcat:27vffftd6rfbfi7gcu5lhipqdy

Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network

George Trigeorgis, Fabien Ringeval, Raymond Brueckner, Erik Marchi, Mihalis A. Nicolaou, Bjorn Schuller, Stefanos Zafeiriou
2016 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we propose a solution to the problem of 'context-aware' emotional relevant feature extraction, by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically  ...  In this novel work on the so-called end-to-end speech emotion recognition, we show that the use of the proposed topology significantly outperforms the traditional approaches based on signal processing  ...  INTRODUCTION AND PRIOR WORK With the advent of deep neural networks in the last decade a number of groundbreaking improvements have been observed in several established pattern recognition areas such as  ... 
doi:10.1109/icassp.2016.7472669 dblp:conf/icassp/TrigeorgisRBMNS16 fatcat:sssbgrhfu5doxovzqvxfyzakxm

Deep Learning for Environmentally Robust Speech Recognition

Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller
2018 ACM Transactions on Intelligent Systems and Technology  
Data-driven supervised approaches, especially the ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and with sufficient training  ...  In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for  ...  Fig. 1. General framework of a speech recognition system divided into front-end and back-end.  ... 
doi:10.1145/3178115 fatcat:ek52sewurraitcrjpebo5ptnuy