Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
[article]
2017
arXiv
pre-print
Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition ...
Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). ...
End-to-end neural systems for speech recognition typically replace the HMM with a neural network that provides a distribution over sequences directly. ...
arXiv:1701.02720v1
fatcat:46c5bdvoofgmtfl33nlhysdzku
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
2016
Interspeech 2016
Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it feasible to train an 'end-to-end' speech recognition ...
Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). ...
End-to-end neural systems for speech recognition typically replace the HMM with a neural network that provides a distribution over sequences directly. ...
doi:10.21437/interspeech.2016-1446
dblp:conf/interspeech/ZhangPBZLBC16
fatcat:ooe4fuywcfgzvaru6k67ptntfq
Deep Learning in Speech Recognition
音声認識におけるDeep Learningの活用
2017
The Brain & Neural Networks
Acoustic Modeling from Raw Multichannel Waveforms, IEEE Automatic Speech Recognition and Understanding Workshop. 19) Graves, A., Jaitly, N. (2014): Towards End-to-End Speech Recognition with Recurrent ...
Based on Deep Learning Autoencoder with Layer-Wised Pretraining, InterSpeech, pp.1504-1507. 17) Palaz, D., Magimai-Doss, M., Collobert, R. (2015): CONVOLUTIONAL NEURAL NETWORKS-BASED CONTINUOUS SPEECH ...
doi:10.3902/jnns.24.27
fatcat:2ioqodsou5fhvnwmyi3kj2iosu
2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28
2020
IEEE/ACM Transactions on Audio Speech and Language Processing
., +, TASLP 2020 3010-3017 End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. ...
Herzog, A., +, TASLP 2020 2461-2475 End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. ...
doi:10.1109/taslp.2021.3055391
fatcat:7vmstynfqvaprgz6qy3ekinkt4
Table of Contents
2021
IEEE/ACM Transactions on Audio Speech and Language Processing
Harmonic Vector Analysis ... Kitamura TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition ...
Chin FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning ...
doi:10.1109/taslp.2021.3137064
fatcat:rpka3f2bhjh37c7pkhiowyndhm
Table of Contents
2020
IEEE/ACM Transactions on Audio Speech and Language Processing
Wang 1293 End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features ...
Ling 839 Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification ...
doi:10.1109/taslp.2020.3046148
fatcat:hirdphjf6zeqdjzwnwlwlamtb4
Deep learning research landscape roadmap in a nutshell: past, present and future – Towards deep cortical learning
[article]
2019
arXiv
pre-print
The past, present and future of deep learning is presented in this work. ...
Given this landscape & roadmap, we predict that deep cortical learning will be the convergence of deep learning & cortical learning which builds an artificial cortical column ultimately. ...
Imagenet classification with deep convolutional neural networks. ...
arXiv:1908.02130v1
fatcat:v3qjpjyi55ehdepjpgjohwrh24
2021 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 29
2021
IEEE/ACM Transactions on Audio Speech and Language Processing
Departments and other items may also be covered if they have been judged to have archival value. The Author Index contains the primary entry for each item, listed under the first author's name. ...
., +, TASLP 2021 1785-1794 TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition. ...
., +, TASLP 2021 1290-1302 TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition. ...
doi:10.1109/taslp.2022.3147096
fatcat:7nl52k7sjfalbhpxtum3y5nmje
Convolutional Neural Networks for Raw Speech Recognition
[chapter]
2018
From Natural to Artificial Intelligence - Algorithms and Applications
Three major types of end-to-end architectures for ASR are attention-based methods, connectionist temporal classification, and convolutional neural network (CNN)-based direct raw speech model. ...
The emergence of deep learning drastically improved the recognition rate of ASR systems. Such systems are replacing traditional ASR systems. These systems can also be trained in end-to-end manner. ...
Author details Vishal Passricha and Rajesh Kumar Aggarwal* *Address all correspondence to: rka15969@gmail.com National Institute of Technology, Kurukshetra, India ...
doi:10.5772/intechopen.80026
fatcat:ni6csin5obgrpfdogpwgzjkphq
2019 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 27
2019
IEEE/ACM Transactions on Audio Speech and Language Processing
., +, TASLP Feb. 2019 244-254 Convolutional Neural Networks to Enhance Coded Speech. ...
., +, TASLP Jan. 2019 77-88 CMOS integrated circuits Convolutional Neural Networks to Enhance Coded Speech. ...
doi:10.1109/taslp.2020.2971902
fatcat:j66uwjyqlfbmtgda6zhzlswpva
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
[article]
2018
arXiv
pre-print
Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and with sufficient training, can ...
In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for ...
To this end, deep learning, which is mainly based on deep neural networks, has had a central role in the recent developments [13]-[16]. ...
arXiv:1705.10874v3
fatcat:evdhqnj7eraa5jiolakuf4mf3e
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
2018
Interspeech 2018
We tried to extract such information from spectrograms and accomplish the emotion recognition task by combining Convolutional Neural Networks (CNNs) with Recurrent Neural Networks (RNNs). ...
In this work, an approach to emotion recognition is proposed for variable-length speech segments by applying a deep neural network to spectrograms directly. ...
[17] proposed a convolutional recurrent neural network that operates on the raw signal, to perform an end-to-end spontaneous emotion prediction task from speech data. Satt et al. ...
doi:10.21437/interspeech.2018-2228
dblp:conf/interspeech/MaW0XMC18
fatcat:q7hr74umqjahde2dm5x76xtpdm
On the Importance of Video Action Recognition for Visual Lipreading
[article]
2019
arXiv
pre-print
Recently, many state-of-the-art visual lipreading methods explore the end-to-end trainable deep models, involving the use of 2D convolutional networks (e.g., ResNet) as the front-end visual feature extractor ...
Although a deep 2D convolution neural network can provide informative image-based features, it ignores the temporal motion existing between the adjacent frames. ...
Convolutional neural networks for sentence classification. EMNLP, 2014. [19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. ...
arXiv:1903.09616v2
fatcat:27vffftd6rfbfi7gcu5lhipqdy
Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network
2016
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
In this paper, we propose a solution to the problem of 'context-aware' emotional relevant feature extraction, by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically ...
In this novel work on the so-called end-to-end speech emotion recognition, we show that the use of the proposed topology significantly outperforms the traditional approaches based on signal processing ...
Introduction and Prior Work. With the advent of deep neural networks in the last decade, a number of groundbreaking improvements have been observed in several established pattern recognition areas such as ...
doi:10.1109/icassp.2016.7472669
dblp:conf/icassp/TrigeorgisRBMNS16
fatcat:sssbgrhfu5doxovzqvxfyzakxm
Deep Learning for Environmentally Robust Speech Recognition
2018
ACM Transactions on Intelligent Systems and Technology
Data-driven supervised approaches, especially the ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and with sufficient training ...
In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for ...
Fig. 1. General framework of a speech recognition system divided into front-end and back-end. ...
doi:10.1145/3178115
fatcat:ek52sewurraitcrjpebo5ptnuy