6,281 Hits in 5.5 sec

Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition

Meet Soni, Ashish Panda
2019 Interspeech  
The application of Time-Frequency (T-F) masking-based approaches for Automatic Speech Recognition has been shown to provide significant gains in system performance in the presence of additive noise.  ...  However, such systems still rely on a pre-trained T-F masking enhancement block, trained using pairs of clean and noisy speech signals.  ...  Aurora-4 is a medium vocabulary database used for the noise robust continuous speech recognition task.  ... 
doi:10.21437/interspeech.2019-2172 dblp:conf/interspeech/SoniP19 fatcat:6rr7oyhl6vhaze5zyft67py6na

Robust speech recognition by integrating speech separation and hypothesis testing

Soundararajan Srinivasan, DeLiang Wang
2010 Speech Communication  
Missing data methods attempt to improve robust speech recognition by distinguishing between reliable and unreliable data in the time-frequency domain.  ...  Such methods require a binary mask which labels time-frequency regions of a noisy speech signal as reliable if they contain more speech energy than noise energy and unreliable otherwise.  ...  Barker for help with the speech fragment decoder.  ... 
doi:10.1016/j.specom.2009.08.008 fatcat:tntp3afhnrhfjj3ahzqznjhyca
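The reliable/unreliable labeling described in the abstract above is commonly formalized as an ideal binary mask: a T-F unit is reliable when speech energy exceeds noise energy. A minimal sketch, assuming the speech and noise magnitude spectrograms are available separately and using a 0 dB local SNR criterion (both assumptions for illustration, not details from the paper):

```python
import math

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Label each time-frequency unit reliable (1) when its local SNR in dB
    exceeds the criterion lc_db, and unreliable/'missing' (0) otherwise."""
    eps = 1e-12  # avoid log of zero
    mask = []
    for s_row, n_row in zip(speech_mag, noise_mag):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 20.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask

# Toy 2x3 magnitude "spectrograms": the mask is 1 exactly where
# speech magnitude exceeds noise magnitude (0 dB criterion).
speech = [[1.0, 0.1, 0.5], [0.2, 2.0, 0.05]]
noise = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
mask = ideal_binary_mask(speech, noise)  # [[1, 0, 0], [0, 1, 0]]
```

In practice the clean speech and noise are of course not observed separately; missing-data systems estimate such a mask from the noisy signal, which is what makes mask estimation the hard part of the problem.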

Speech fragment decoding techniques for simultaneous speaker identification and speech recognition

Jon Barker, Ning Ma, André Coy, Martin Cooke
2010 Computer Speech and Language  
A speech fragment decoder is used which employs missing data techniques and clean speech models to simultaneously search for the set of fragments and the word sequence that best matches the target speaker  ...  We review a speech fragment decoding technique that treats segregation and recognition as coupled problems.  ...  , although adequate for discriminating a small number of artificial stationary vowels, are not suitable for continuous speech recognition even when vocabulary sizes are small.  ... 
doi:10.1016/j.csl.2008.05.003 fatcat:tsnqer4rtbegzmsib2gpsifnk4

Decoding speech in the presence of other sources

J.P. Barker, M.P. Cooke, D.P.W. Ellis
2005 Speech Communication  
The maturation of statistical pattern recognition techniques has brought us very low word error rates when the training and test material both consist solely of speech.  ...  However, in real-world situations, any speech  ...  In this paper, we present a framework which attempts to integrate source- and model-driven processes in robust speech recognition.  ... 
doi:10.1016/j.specom.2004.05.002 fatcat:lx4mcr5pwre4dfvbb6lem3u6hi

Table of Contents

2021 IEEE/ACM Transactions on Audio Speech and Language Processing  
Listed papers include "... and Continuous Speech Separation" (Chen), "The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech" (Rafaely), "... Masking and Its Application to Harmonic Vector Analysis" (Kitamura), and "TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition".  ... 
doi:10.1109/taslp.2021.3137064 fatcat:rpka3f2bhjh37c7pkhiowyndhm

A schema-based model for phonemic restoration

Soundararajan Srinivasan, DeLiang Wang
2005 Speech Communication  
The model employs a missing data speech recognition system to decode speech based on intact portions and activates word templates corresponding to the words containing the masked phonemes.  ...  An activated template is dynamically time warped to the noisy word and is then used to restore the speech frames corresponding to the masked phoneme, thereby synthesizing it.  ...  Masuda-Katsuse for their assistance in helping us implement their models. We also wish to thank the two anonymous reviewers for their constructive suggestions/criticisms.  ... 
doi:10.1016/j.specom.2004.09.002 fatcat:qz6j6vw66rfqtirk7622tncz6i

Computational Auditory Scene Analysis and Automatic Speech Recognition [chapter]

Arun Narayanan, Deliang Wang
2012 Techniques for Noise Robustness in Automatic Speech Recognition  
Continuity refers to the continuity of pitch (perceived fundamental frequency), spectral and temporal continuity, etc. Continuity or smooth transitions can be used to group segments across time.  ...  For added robustness, the tandem algorithm trains an MLP to perform unit labeling based on a neighboring set of T-F units.  ... 
doi:10.1002/9781118392683.ch16 fatcat:gfllmc5rdfhaph3aqcw3j5on5e

Using channel-specific models to detect and remove reverberation in cochlear implants

Jill M. Desmond, Chandra S. Throckmorton, Leslie M. Collins
2012 Journal of the Acoustical Society of America  
Because the reverberation mitigation strategy did not consistently improve speech recognition, future work is required to analyze the effects of algorithm-specific parameters for CI listeners.  ...  rapidly for CI listeners than for normal hearing listeners in noisy and reverberant environments [1].  ...  These two energy-based features were developed to discriminate the exponentially decaying overlap-masking time segments from the speech segments that are driven by active speech energy.  ... 
doi:10.1121/1.4755544 fatcat:on7rcafswrc63a2ol3ri4znxdy

An Overview of Speaker Identification: Accuracy and Robustness Issues

Roberto Togneri, Daniel Pullella
2011 IEEE Circuits and Systems Magazine  
Constructing these labels in the form of a time-frequency reliability mask allows robust recognition to be performed via a reconstruction of the speech spectrogram, or by integrating over the noise dominated  ...  (c) Also shown is the reliability mask containing (ideal) reliability labels for time-frequency regions in the utterance. (a) Clean spectrogram. (b) Speech and noise spectrogram.  ...  Noisy Testing Speech Framed Time-Domain Signal  ... 
doi:10.1109/mcas.2011.941079 fatcat:jnp75b7tjvaq5f3jfyoroolmuy

An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks [article]

Zihan Pan, Yansong Chua, Jibin Wu, Malu Zhang, Haizhou Li, and Eliathamby Ambikairajah
2019 arXiv   pre-print
In this paper, we propose a neural encoding and decoding scheme that is optimized for speech processing.  ...  We evaluate the perceptual quality of the BAE scheme using PESQ; the performance of the BAE based on speech recognition experiments.  ...  For SNN study to progress from isolated word recognition towards continuous speech recognition, a continuous speech database is required.  ... 
arXiv:1909.01302v2 fatcat:k4iqxvkvwjfvvbgvtzpykoiktu

Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures

Daniel P.W. Ellis
1999 Speech Communication  
One such architecture, the 'prediction-driven' approach, is presented along with results from its initial implementation.  ...  Phenomena such as the continuity illusion and phonemic restoration show that the brain is able to use a wide range of knowledge-based contextual constraints when interpreting obscured or complex mixtures  ...  My thanks to the ICSI Realization group for their patience in introducing me to speech recognition.  ... 
doi:10.1016/s0167-6393(98)00083-1 fatcat:57y4ux5xgnhgzimlv32okn5v34

An Efficient and Perceptually Motivated Auditory Neural Encoding and Decoding Algorithm for Spiking Neural Networks

Zihan Pan, Yansong Chua, Jibin Wu, Malu Zhang, Haizhou Li, Eliathamby Ambikairajah
2020 Frontiers in Neuroscience  
We evaluate the perceptual quality of the BAE scheme using PESQ; the performance of the BAE based on sound classification and speech recognition experiments.  ...  Finally, we also built and published two spike versions of speech datasets, the Spike-TIDIGITS and the Spike-TIMIT, for researchers to use and to benchmark future SNN research.  ...  For SNN study to progress from isolated word recognition toward continuous speech recognition, a continuous speech database is required.  ... 
doi:10.3389/fnins.2019.01420 pmid:32038132 pmcid:PMC6987407 fatcat:ijs7znxpxfcd5bnjoowxfychza

Table of Contents

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Recoverable titles from this fragmentary listing include "Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding", "Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language ...", and "Cognitive-Driven Binaural Beamforming Using EEG-Based ...".  ... 
doi:10.1109/taslp.2020.3046148 fatcat:hirdphjf6zeqdjzwnwlwlamtb4

EarCommand

Yincheng Jin, Yang Gao, Xuhai Xu, Seokmin Choi, Jiyang Li, Feng Liu, Zhengxiong Li, Zhanpeng Jin
2022 Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies  
Our system can achieve a WER (word error rate) of 10.02% for word-level recognition and 12.33% for sentence-level recognition, when tested on human subjects with 32 word-level commands and 25 sentence-level  ...  Moreover, EarCommand shows high reliability and robustness in a variety of configuration settings and environmental conditions.  ...  [28] proposed a smart mask that utilizes the attached acceleration and angular velocity sensors for silent speech recognition. But there are two limitations to this work.  ... 
doi:10.1145/3534613 fatcat:qym3kfi5vrb4lcesgov42s7esq
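The WER figures reported in this entry follow the standard definition: the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the reference length. A small sketch with invented example sentences (not data from the paper):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a standard edit-distance dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution or match
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("turn on the light", "turn of the light")
# one substitution over four reference words -> 0.25
```

Note that insertions can push WER above 100%, which is why it is an error rate rather than an accuracy complement.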

Sparse imputation for large vocabulary noise robust ASR

Jort Florent Gemmeke, Bert Cranen, Ulpu Remes
2011 Computer Speech and Language  
An effective way to increase noise robustness in automatic speech recognition is to label the noisy speech features as either reliable or unreliable ('missing'), and replace ('impute') the missing ones  ...  We achieved substantial gains in performance at low SNRs for a connected digit recognition task.  ...  Discussion Sparse imputation for large vocabulary continuous speech recognition Research on noise robust MDT started out with experiments on small vocabulary tasks artificially corrupted by noise [10  ... 
doi:10.1016/j.csl.2010.06.004 fatcat:cwkx3zebjngk5fuateguwqikrm
Showing results 1 — 15 out of 6,281 results