
Neural Representations and Mechanisms for the Performance of Simple Speech Sequences

Jason W. Bohland, Daniel Bullock, Frank H. Guenther
2010 Journal of Cognitive Neuroscience  
Behavioral and Brain Sciences, 21, 499-511, 1998] that provide parallel representations of a forthcoming speech plan as well as mechanisms for interfacing these phonological planning representations with  ...  The class of computational models of sequence planning and performance termed competitive queuing models has followed K. S. Lashley [The problem of serial order in behavior. In L. A.  ...  Acknowledgments The authors thank Jason Tourville, Satrajit Ghosh, and Oren Civier for valuable comments. Support was provided by NIH R01 DC007683 and NIH R01 DC002852 (F. H.  ... 
doi:10.1162/jocn.2009.21306 pmid:19583476 pmcid:PMC2937837 fatcat:frffa5oa7rcrzbq5podgidqshq

Page 1504 of Journal of Cognitive Neuroscience Vol. 22, Issue 7 [page]

2010 Journal of Cognitive Neuroscience  
Neural Representations and Mechanisms for the Performance of Simple Speech Sequences Jason W. Bohland, Daniel Bullock, and Frank H.  ...  INTRODUCTION Here we present a neural model that describes how the brain may represent and produce sequences of simple, learned speech sounds.  ... 

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems [article]

Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury
2022 arXiv   pre-print
One such pretraining paradigm is the distillation of semantic knowledge from state-of-the-art text-based models like BERT to speech encoder neural networks.  ...  We introduce a simple yet novel technique that uses a cross-modal attention mechanism to extract token-level contextual embeddings from a speech encoder such that these can be directly compared and aligned  ...  Acknowledgement This work was partially supported by the National Science Foundation under Grant No. 2008043.  ... 
arXiv:2204.05188v1 fatcat:236bdxlxvvgezfjh32dnzft2wu

Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification

Gautam Bhattacharya, Md Jahangir Alam, Vishwa Gupta, Patrick Kenny
2018 Interspeech 2018  
These models ingest time-frequency representations of speech and can be trained to discriminate between a known set of speakers.  ...  Our best performing embedding achieves an error rate of 3.17% using a simple cosine distance classifier.  ...  A simple approach is to take the RNN hidden state corresponding to the last time step of the sequence and treat it as the sequence 'summary'.  ... 
doi:10.21437/interspeech.2018-1688 dblp:conf/interspeech/BhattacharyaAGK18 fatcat:hbmcyq5775foxk4hfmqptlqpsi

Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR [article]

Dan Lim
2017 arXiv   pre-print
This thesis introduces the sequence-to-sequence model with Luong's attention mechanism for end-to-end ASR.  ...  Finally, the proposed model proved effective for speech recognition, achieving a 15.8% phoneme error rate on the TIMIT dataset.  ...  Figure 3.1 depicts a simple sequence-to-sequence model with Luong's attention mechanism.  ... 
arXiv:1710.04515v1 fatcat:dzuhksgzejdxpi2idsecbnnmg4

Transformer-F: A Transformer network with effective methods for learning universal sentence representation [article]

Yu Shi
2021 arXiv   pre-print
The weight vector is obtained from the input text sequence based on the importance of each part of speech.  ...  The Transformer model is widely used in natural language processing for sentence representation.  ...  Acknowledgments We would like to thank the anonymous reviewers for their thoughtful and constructive comments.  ... 
arXiv:2107.00653v1 fatcat:kj2fsnokzfbblokzh4kshep57y

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks [article]

Alexandros Kastanos, Anton Ragni, Mark Gales
2020 arXiv   pre-print
As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems.  ...  Those black box systems, however, offer limited means for quality control as only word sequences are typically available.  ...  The addition of simple features, such as word posterior probabilities and durations, provides the potential for a significantly better error mitigation mechanism to be devised.  ... 
arXiv:1910.11933v2 fatcat:cwiisf3mt5fbhbf4rbq3fy5qn4

Review on Text Sequence Processing with use of different Deep Neural Network Model

Sheetal S. Pandya, RK University, School of Engineering, India
2019 International Journal of Advanced Trends in Computer Science and Engineering  
Deep neural network models involve multiple processing layers that learn sequential representations of text data, achieving excellent performance in many domains.  ...  experimental analysis for NLP tasks in terms of text and words with the help of CNN, RNN, LSTM, and GRU bidirectional Encoder-Decoder.  ...  Tree representation of sentences is more suitable for semantic modelling to extract more useful information from the sentence [31]. It is a very simple representation of the node structure.  ... 
doi:10.30534/ijatcse/2019/56852019 fatcat:gilxris5y5cvncszzfk2ckuqsy

Statistical Parametric Speech Synthesis Using Bottleneck Representation From Sequence Auto-encoder [article]

Sivanand Achanta, KNRK Raju Alluri, Suryakanth V Gangashetty
2016 arXiv   pre-print
In conventional deep neural network based speech synthesis, the input text features are repeated for the entire duration of the phoneme to map text to speech parameters.  ...  We then use this acoustic representation at unit level to synthesize speech using a deep neural network based statistical parametric speech synthesis technique.  ...  The authors would like to acknowledge TCS for partially funding the first author's PhD. The authors would also like to thank the speech and vision lab members for participating in the listening tests.  ... 
arXiv:1606.05844v1 fatcat:ekk7aj5onnfktpseyndocr2k3q

Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model

2017 DEStech Transactions on Computer Science and Engineering  
In the end, an Attention Mechanism is implemented on the output sequence of the BRNN to focus on target emotion-pertinent parts of an utterance.  ...  Recurrent Neural Network (BRNN) to obtain the temporal information from the output of CNN.  ...  Corresponding author: Yawei Mu, School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, P. R. China.  ... 
doi:10.12783/dtcse/cii2017/17273 fatcat:bczdptujbrcd3kwtbzvn2oo2im

Waveform-Based Speaker Representations for Speech Synthesis

Moquan Wan, Gilles Degottex, Mark J.F. Gales
2018 Interspeech 2018  
This allows the same approach to be used for a range of tasks, but the extracted representation is unlikely to be optimal for the specific task of interest.  ...  Speaker adaptation is a key aspect of building a range of speech processing systems, for example personalised speech synthesis.  ...  One main reason is that this "neural vocoder" is too simple and too sensitive to the starting position of the 3200-sample sequence.  ... 
doi:10.21437/interspeech.2018-1154 dblp:conf/interspeech/WanDG18 fatcat:d342dvlkirbttahk275uzjtqie

Attention-Based End-to-End Speech Recognition on Voice Search [article]

Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie
2018 arXiv   pre-print
In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition on a voice search task.  ...  We compare two attention mechanisms and use attention smoothing to cover long context in the attention model.  ...  Automatic speech recognition (ASR) is the first step in a voice search task, and thus its performance strongly affects the user experience.  ... 
arXiv:1707.07167v3 fatcat:gijswf6ykffehk3724devpdedy

A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification [article]

Weicheng Cai, Zexin Cai, Xiang Zhang, Xiaoqi Wang, Ming Li
2018 arXiv   pre-print
The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification.  ...  It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of traditional GMM training and the Supervector encoding procedure on top of the CNN.  ...  This motivates us to implement the conventional GMM and Supervector mechanism in our end-to-end LID neural network.  ... 
arXiv:1804.00385v1 fatcat:ptnu54y2hfhhvemyaiuhwx6hja

Page 3655 of Psychological Abstracts Vol. 85, Issue 8 [page]

1998 Psychological Abstracts  
In the model, task performance is simulated using principles and mechanisms that capture salient aspects of information processing in neural ensembles.  ...  construct representations that are well-suited to the identification of speech segments.  ... 

An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering [article]

Chia-Chih Kuo, Shang-Bao Luo, Kuan-Yu Chen
2020 arXiv   pre-print
a certain level of performance.  ...  In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices all in the form of speech, the machine needs to pick the correct choice to answer the question  ...  After that, BERT is employed to infer contextualized representations for each token in the sequence as usual, and the resulting vector for the "[CLS]" token can be viewed as a comprehensive representation  ... 
arXiv:2005.12142v1 fatcat:jlp6gfpv7fhatmvrhoh7o7qhjq