Neural Representations and Mechanisms for the Performance of Simple Speech Sequences
2010
Journal of Cognitive Neuroscience
The class of computational models of sequence planning and performance termed competitive queuing models have followed K. S. Lashley [The problem of serial order in behavior. In L. A. ...
Behavioral and Brain Sciences, 21, 499-511, 1998] that provide parallel representations of a forthcoming speech plan as well as mechanisms for interfacing these phonological planning representations with ...
Acknowledgments The authors thank Jason Tourville, Satrajit Ghosh, and Oren Civier for valuable comments. Support was provided by NIH R01 DC007683 and NIH R01 DC002852 (F. H. ...
doi:10.1162/jocn.2009.21306
pmid:19583476
pmcid:PMC2937837
fatcat:frffa5oa7rcrzbq5podgidqshq
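The competitive-queuing idea summarized in this entry — a parallel plan whose most active item is selected, performed, and then suppressed — can be illustrated in a few lines. This is a minimal sketch of the generic CQ readout, not the paper's model; the item labels and gradient values are made up for illustration.

```python
# Minimal competitive-queuing (CQ) readout: a planning layer holds a graded
# "primacy gradient" over upcoming items; a winner-take-all choice layer picks
# the most active item, which is then suppressed so the next item can win.
import numpy as np

def cq_readout(items, activations):
    """Repeatedly select the most active item, then suppress it (select-then-delete)."""
    plan = np.asarray(activations, dtype=float).copy()
    produced = []
    while np.any(plan > 0):
        winner = int(np.argmax(plan))      # winner-take-all choice layer
        produced.append(items[winner])     # perform the selected item
        plan[winner] = 0.0                 # suppressive feedback removes it from the plan
    return produced

# Example: a parallel plan for three syllables with a primacy gradient (illustrative values).
print(cq_readout(["go", "di", "va"], [0.9, 0.6, 0.3]))  # ['go', 'di', 'va']
```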
Page 1504 of Journal of Cognitive Neuroscience Vol. 22, Issue 7
[page]
2010
Journal of Cognitive Neuroscience
Neural Representations and Mechanisms for the Performance of Simple Speech Sequences
Jason W. Bohland, Daniel Bullock, and Frank H. ...
INTRODUCTION
Here we present a neural model that describes how the brain may represent and produce sequences of simple, learned speech sounds. ...
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems
[article]
2022
arXiv
pre-print
One such pretraining paradigm is the distillation of semantic knowledge from state-of-the-art text-based models like BERT to speech encoder neural networks. ...
We introduce a simple yet novel technique that uses a cross-modal attention mechanism to extract token-level contextual embeddings from a speech encoder such that these can be directly compared and aligned ...
Acknowledgement This work was partially supported by the National Science Foundation under Grant No. 2008043. ...
arXiv:2204.05188v1
fatcat:236bdxlxvvgezfjh32dnzft2wu
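As a rough illustration of the cross-modal attention described in this entry, the sketch below pools frame-level speech encoder outputs into one vector per text token, using BERT token embeddings as queries. The shared dimension, the tensor shapes, and the cosine comparison are assumptions for illustration, not the paper's exact architecture or training objective.

```python
# Cross-modal attention: each text token attends over all speech frames and
# receives a speech-derived summary vector that can be compared to its BERT embedding.
import torch
import torch.nn.functional as F

def token_level_speech_embeddings(text_tokens, speech_frames):
    """text_tokens: (T, d), speech_frames: (S, d) -> (T, d) speech-derived token embeddings."""
    d = text_tokens.size(-1)
    scores = text_tokens @ speech_frames.T / d ** 0.5   # (T, S) scaled dot-product scores
    attn = F.softmax(scores, dim=-1)                    # each token attends over all frames
    return attn @ speech_frames                         # (T, d) per-token speech summaries

T, S, d = 8, 120, 768
text_tokens = torch.randn(T, d)      # e.g. BERT hidden states (placeholder values)
speech_frames = torch.randn(S, d)    # e.g. projected speech encoder outputs (placeholder values)
tok_speech = token_level_speech_embeddings(text_tokens, speech_frames)
# A contrastive objective would then pull matching (text, speech) token pairs together:
sim = F.cosine_similarity(tok_speech, text_tokens, dim=-1)   # (T,)
```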
Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
2018
Interspeech 2018
These models ingest time-frequency representations of speech and can be trained to discriminate between a known set of speakers. ...
Our best performing embedding achieves an error rate of 3.17% using a simple cosine distance classifier. ...
A simple approach is to take the RNN hidden state corresponding to the last time step of the sequence and treat it as the sequence 'summary'. ...
doi:10.21437/interspeech.2018-1688
dblp:conf/interspeech/BhattacharyaAGK18
fatcat:hbmcyq5775foxk4hfmqptlqpsi
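A minimal sketch of the "last hidden state as sequence summary" embedding and the cosine-distance scoring mentioned in this entry; the GRU, the 40-dimensional input frames, and the layer sizes are illustrative assumptions rather than the paper's fused architecture.

```python
# Take the RNN hidden state at the final time step as a fixed-length speaker
# embedding, then score pairs of embeddings with cosine distance.
import torch
import torch.nn as nn
import torch.nn.functional as F

rnn = nn.GRU(input_size=40, hidden_size=256, batch_first=True)  # 40-dim frames (assumed)

def speaker_embedding(features):
    """features: (batch, time, 40) -> (batch, 256) embedding from the final time step."""
    outputs, _ = rnn(features)
    return outputs[:, -1, :]          # hidden state at the last frame = sequence 'summary'

def verify(enroll, test):
    """Cosine distance between enrollment and test embeddings; lower = more likely same speaker."""
    return 1.0 - F.cosine_similarity(enroll, test, dim=-1)

utt_a, utt_b = torch.randn(1, 300, 40), torch.randn(1, 250, 40)
score = verify(speaker_embedding(utt_a), speaker_embedding(utt_b))
```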
Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR
[article]
2017
arXiv
pre-print
This thesis introduces the sequence to sequence model with Luong's attention mechanism for end-to-end ASR. ...
Finally the proposed model proved its effectiveness for speech recognition achieving 15.8% phoneme error rate on TIMIT dataset. ...
Figure 3.1 depicts a simple sequence to sequence model with Luong's attention mechanism. ...
arXiv:1710.04515v1
fatcat:dzuhksgzejdxpi2idsecbnnmg4
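For readers unfamiliar with Luong's attention, the sketch below implements the "general" scoring variant for a single decoder step; the dimensions, the choice of the general score, and the weight names are assumptions and may differ from the thesis's configuration.

```python
# Luong-style "general" attention: score(h_t, h_s) = h_t^T W_a h_s, followed by the
# attentional vector tanh(W_c [context; h_t]).
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W_a = nn.Linear(dim, dim, bias=False)   # bilinear scoring matrix
        self.W_c = nn.Linear(2 * dim, dim)           # combines context and decoder state

    def forward(self, decoder_state, encoder_states):
        # decoder_state: (B, d), encoder_states: (B, S, d)
        scores = torch.bmm(self.W_a(encoder_states), decoder_state.unsqueeze(-1)).squeeze(-1)  # (B, S)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)                   # (B, d)
        attentional = torch.tanh(self.W_c(torch.cat([context, decoder_state], dim=-1)))        # (B, d)
        return attentional, weights

attn = LuongAttention(dim=256)
h_t, enc = torch.randn(2, 256), torch.randn(2, 80, 256)
att_vec, w = attn(h_t, enc)   # att_vec feeds the output projection; w shows where the model attends
```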
Transformer-F: A Transformer network with effective methods for learning universal sentence representation
[article]
2021
arXiv
pre-print
The weight vector is obtained from the input text sequence based on the importance of each part of speech. ...
The Transformer model is widely used in natural language processing for sentence representation. ...
Acknowledgments We would like to thank the anonymous reviewers for their thoughtful and constructive comments. ...
arXiv:2107.00653v1
fatcat:kj2fsnokzfbblokzh4kshep57y
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
[article]
2020
arXiv
pre-print
As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. ...
Those black box systems, however, offer limited means for quality control as only word sequences are typically available. ...
The addition of simple features, such as word posterior probabilities and durations, provides the potential for a significantly better error mitigation mechanism to be devised. ...
arXiv:1910.11933v2
fatcat:cwiisf3mt5fbhbf4rbq3fy5qn4
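The "simple features" mentioned in this entry can be pictured as a per-word feature vector fed to a confidence classifier. The sketch below only concatenates a word posterior probability, a duration, and a word embedding and scores them with a toy logistic model; it is an illustrative assumption, not the paper's lattice recurrent network.

```python
# Word-level confidence estimation from simple per-word features.
import numpy as np

def confidence_features(word_posterior, duration_sec, embedding):
    """Concatenate per-word features for a downstream confidence classifier."""
    return np.concatenate(([word_posterior, duration_sec], embedding))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=2 + 16)                       # toy classifier weights (untrained placeholder)
feats = confidence_features(0.83, 0.31, rng.normal(size=16))
confidence = sigmoid(w @ feats)                   # estimated probability that the word is correct
```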
Review on Text Sequence Processing with use of different Deep Neural Network Model
2019
International Journal of Advanced Trends in Computer Science and Engineering
Deep Neural Network models involve multiple processing layers that learn sequential representations of text data and achieve excellent performance in many domains. ...
experimental analysis for NLP tasks in terms of text and words with the help of CNN, RNN, LSTM, and GRU bidirectional Encoder-Decoder models. ...
Tree representation of sentences is more suitable for semantic modelling to extract more useful information from the sentence [31]. It's a very simple representation of the node structure. ...
doi:10.30534/ijatcse/2019/56852019
fatcat:gilxris5y5cvncszzfk2ckuqsy
Statistical Parametric Speech Synthesis Using Bottleneck Representation From Sequence Auto-encoder
[article]
2016
arXiv
pre-print
In conventional deep neural network based speech synthesis, the input text features are repeated for the entire duration of phoneme for mapping text and speech parameters. ...
We then use this acoustic representation at unit-level to synthesize speech using deep neural network based statistical parametric speech synthesis technique. ...
The authors would like to acknowledge TCS for partially funding the first author's PhD. The authors also thank the speech and vision lab members for participating in the listening tests. ...
arXiv:1606.05844v1
fatcat:ekk7aj5onnfktpseyndocr2k3q
Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model
2017
DEStech Transactions on Computer Science and Engineering
Recurrent Neural Network (BRNN) to obtain the temporal information from the output of the CNN. ...
In the end, an Attention Mechanism is implemented on the output sequence of the BRNN to focus on the emotion-pertinent parts of an utterance. ...
Corresponding author: Yawei Mu, School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, P. R. China. ...
doi:10.12783/dtcse/cii2017/17273
fatcat:bczdptujbrcd3kwtbzvn2oo2im
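The CNN → BRNN → attention pipeline in this entry can be illustrated by the attention-pooling step alone: a learned scorer weights the BRNN output frames, and their weighted sum becomes the utterance-level vector. The CNN front end is omitted and all layer sizes below are assumptions.

```python
# Attention pooling over bidirectional-RNN outputs to produce one utterance vector.
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)                 # learns which frames matter for emotion

    def forward(self, brnn_out):
        # brnn_out: (B, T, dim) bidirectional RNN outputs over CNN features
        alpha = torch.softmax(self.score(brnn_out).squeeze(-1), dim=-1)  # (B, T) frame weights
        return torch.bmm(alpha.unsqueeze(1), brnn_out).squeeze(1)        # (B, dim) utterance vector

brnn = nn.GRU(input_size=128, hidden_size=64, bidirectional=True, batch_first=True)
pool = AttentivePooling(dim=128)                       # 2 * 64 from the bidirectional GRU
feats, _ = brnn(torch.randn(8, 100, 128))              # placeholder CNN feature sequences
utt = pool(feats)                                      # one vector per utterance for the classifier
```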
Waveform-Based Speaker Representations for Speech Synthesis
2018
Interspeech 2018
This allows the same approach to be used for a range of tasks, but the extracted representation is unlikely to be optimal for the specific task of interest. ...
Speaker adaptation is a key aspect of building a range of speech processing systems, for example personalised speech synthesis. ...
One main reason is that this "neural vocoder" is too simple and too sensitive to the starting position of the 3200-sample sequence. ...
doi:10.21437/interspeech.2018-1154
dblp:conf/interspeech/WanDG18
fatcat:d342dvlkirbttahk275uzjtqie
Attention-Based End-to-End Speech Recognition on Voice Search
[article]
2018
arXiv
pre-print
In this paper, we explore the use of attention-based encoder-decoder model for Mandarin speech recognition on a voice search task. ...
We compare two attention mechanisms and use attention smoothing to cover long context in the attention model. ...
Automatic speech recognition (ASR) is the first step for a voice search task and thus its performance highly affects the user experience. ...
arXiv:1707.07167v3
fatcat:gijswf6ykffehk3724devpdedy
A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification
[article]
2018
arXiv
pre-print
The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification. ...
It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of traditional GMM training and Supervector encoding on top of a CNN. ...
This motivates us to implement the conventional GMM and Supervector mechanism into our end-to-end LID neural network. ...
arXiv:1804.00385v1
fatcat:ptnu54y2hfhhvemyaiuhwx6hja
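A hedged sketch of a learnable-dictionary-encoding style aggregation consistent with the GMM/Supervector analogy in this entry: frames are softly assigned to dictionary components, and the per-component residuals are pooled into an orderless code. In the paper the dictionary centers and scaling factors would be learned jointly with the CNN; here they are random placeholders.

```python
# Dictionary encoding: soft-assign frame features to components (like GMM posteriors),
# pool the residuals per component, and concatenate into an order-free supervector-like code.
import numpy as np

def dictionary_encode(frames, centers, scales):
    """frames: (T, d), centers: (C, d), scales: (C,) -> (C * d,) orderless encoding."""
    resid = frames[:, None, :] - centers[None, :, :]              # (T, C, d) residuals
    logits = -scales[None, :] * np.sum(resid ** 2, axis=-1)       # (T, C) assignment scores
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                             # soft assignments per frame
    enc = (w[:, :, None] * resid).sum(axis=0)                     # (C, d) pooled residuals
    return enc.reshape(-1)                                        # concatenated, order-free code

T, C, d = 200, 8, 64
code = dictionary_encode(np.random.randn(T, d), np.random.randn(C, d), np.ones(C))
```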
Page 3655 of Psychological Abstracts Vol. 85, Issue 8
[page]
1998
Psychological Abstracts
In the model, task performance is simulated using principles and mechanisms that capture salient aspects of information processing in neural ensembles. ...
construct representations that are well-suited to the identification of speech segments. ...
An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering
[article]
2020
arXiv
pre-print
a certain level of performance. ...
In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices all in the form of speech, the machine needs to pick the correct choice to answer the question ...
After that, BERT is employed to infer contextualized representations for each token in the sequence as usual, and the resulting vector for the "[CLS]" token can be viewed as a comprehensive representation ...
arXiv:2005.12142v1
fatcat:jlp6gfpv7fhatmvrhoh7o7qhjq
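The "[CLS] as comprehensive representation" step described in this entry can be sketched with the Hugging Face transformers API; pairing the passage with a question-plus-choice string is an illustrative framing of the SMCQA setup, not necessarily the paper's exact input construction.

```python
# Run BERT over a (passage, question + choice) pair and take the vector at the
# "[CLS]" position as a single summary representation of the whole sequence.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

passage = "the concert was moved indoors because of rain"        # placeholder text
question_choice = "where was the concert held? indoors"          # placeholder question + choice

inputs = tokenizer(passage, question_choice, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = bert(**inputs)
cls_vec = outputs.last_hidden_state[:, 0, :]   # (1, 768) vector for the "[CLS]" token
# A linear scorer over cls_vec would then rank the candidate choices.
```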
Showing results 1 — 15 out of 29,842 results