4,967 Hits in 3.8 sec

Auditory Contrast Spectrum for Robust Speech Recognition [chapter]

Xugang Lu, Jianwu Dang
2006 Lecture Notes in Computer Science  
We apply this cepstral feature for robust speech recognition experiments on the AURORA-2J corpus.  ...  In this algorithm, speech is first processed with temporal contrast processing, which enhances the temporal modulation envelopes of speech in each auditory filter band and suppresses steady, low-contrast envelopes  ...  In Section 3, we adapt the proposed algorithm for robust speech feature extraction and test its robustness on a speech recognition task. In the last section, we give some discussion and conclusions.  ... 
doi:10.1007/11939993_36 fatcat:3ynioebnc5aptjywrg5t5iw62q
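
The snippet above only sketches the temporal contrast idea. As a rough illustration (not the authors' exact formulation), the toy code below takes one auditory-like band, computes its envelope, and keeps regions where the envelope rises above its local mean while suppressing steady, low-contrast stretches; the band edges, window length, and suppression rule are all assumptions made for this sketch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelope(x, fs, lo, hi):
    """Band-pass one auditory-like channel and take its Hilbert envelope."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, x)))

def contrast_enhance(env, fs, win_s=0.2, floor=1e-8):
    """Divide the envelope by its local mean: modulated regions stand out,
    steady (low-contrast) regions sit near 1 and get suppressed."""
    win = int(win_s * fs)
    kernel = np.ones(win) / win
    local_mean = np.convolve(env, kernel, mode="same") + floor
    contrast = env / local_mean                       # >1 on onsets/modulations, ~1 when steady
    return env * np.clip(contrast - 1.0, 0.0, None)   # keep only the modulated part

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # a 1 kHz carrier that is steady for 0.5 s, then 4 Hz amplitude-modulated
    x = np.sin(2*np.pi*1000*t) * np.where(t < 0.5, 1.0, 1.0 + 0.8*np.sin(2*np.pi*4*t))
    env = band_envelope(x, fs, 800, 1200)
    enhanced = contrast_enhance(env, fs)
    # the modulated second half should dominate the steady first half
    print(enhanced[:fs//2].mean(), enhanced[fs//2:].mean())
```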

Robust speaker identification using auditory features and computational auditory scene analysis

Yang Shao, DeLiang Wang
2008 Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing  
To improve robustness, we have recently proposed novel auditory features and a robust speaker recognition system using a front-end based on computational auditory scene analysis.  ...  Index Terms-Robust speaker recognition, auditory feature, Gammatone feature, Gammatone frequency cepstral coefficient, computational auditory scene analysis  ...  To tackle this robustness problem, speech enhancement methods such as spectral subtraction [7] have been explored for robust speaker recognition.  ... 
doi:10.1109/icassp.2008.4517928 dblp:conf/icassp/ShaoW08 fatcat:qso7aqc5n5f7hiwewvzjkrv7oy
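
Spectral subtraction, cited in the last snippet as an earlier enhancement approach, is simple enough to summarize in code: estimate the noise magnitude spectrum from frames assumed to be speech-free and subtract it from every frame, with a spectral floor. The STFT size, over-subtraction factor, and floor value below are illustrative choices, not the settings of the paper's reference [7].

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_seconds=0.25, alpha=2.0, beta=0.02):
    """Basic magnitude spectral subtraction.
    alpha: over-subtraction factor; beta: spectral floor (fraction of the noise estimate)."""
    _, _, X = stft(x, fs=fs, nperseg=512, noverlap=384)
    mag, phase = np.abs(X), np.angle(X)
    # assume the first `noise_seconds` of the signal contain noise only
    hop = 512 - 384
    n_noise_frames = max(1, int(noise_seconds * fs / hop))
    noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * noise_mag)
    _, y = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512, noverlap=384)
    return y

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    t = np.arange(2 * fs) / fs
    clean = np.sin(2 * np.pi * 440 * t) * (t > 0.5)        # "speech" starts after 0.5 s
    noisy = clean + 0.3 * rng.standard_normal(clean.size)
    enhanced = spectral_subtraction(noisy, fs)
    # enhanced variance should drop relative to the noisy input
    print(noisy.var(), enhanced.var())
```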

Early Auditory Processing Inspired Features For Robust Automatic Speech Recognition

Ozlem Kalinli, Shrikanth Narayanan
2007 Zenodo  
In this paper, we present biologically inspired robust speech processing algorithms based on the human auditory system.  ...  CONCLUSION AND FUTURE WORK: In this paper, we derived bio-inspired features for automatic speech recognition based on the processing stages of the early human auditory system.  ... 
doi:10.5281/zenodo.40692 fatcat:ujff7kud35dmbivlkpzztrgudy

Auditory processing-based features for improving speech recognition in adverse acoustic conditions

Hari Krishna Maganti, Marco Matassoni
2014 EURASIP Journal on Audio, Speech, and Music Processing  
The paper describes an auditory processing-based feature extraction strategy for robust speech recognition in environments where conventional automatic speech recognition (ASR) approaches are not successful  ...  It incorporates a combination of gammatone filtering, modulation spectrum, and non-linearity for feature extraction in the recognition chain to improve robustness, more specifically for ASR in adverse acoustic  ...  The human auditory processing system is a robust front-end for speech recognition in adverse conditions.  ... 
doi:10.1186/1687-4722-2014-21 fatcat:l23imkaodbbo7j7esliy3ln4ja
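
A rough sketch of the gammatone filtering / modulation spectrum / non-linearity chain mentioned in the abstract is given below: each channel is filtered with a gammatone impulse response, half-wave rectified, band-passed to keep slow (here 2-16 Hz) modulations, and compressed with a power law. The filter order, ERB formula, centre frequencies, modulation band, and exponent are assumptions for illustration rather than the configuration used in the paper.

```python
import numpy as np
from scipy.signal import fftconvolve, butter, sosfiltfilt

def gammatone_ir(fc, fs, dur=0.064, order=4, b=1.019):
    """4th-order gammatone impulse response (Patterson/Holdsworth form)."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))                     # unit-energy normalisation

def auditory_modulation_features(x, fs, centre_freqs=(250, 500, 1000, 2000, 4000),
                                 mod_band=(2.0, 16.0), exponent=0.1):
    """Gammatone filterbank -> envelope -> modulation band-pass -> power-law compression."""
    feats = []
    for fc in centre_freqs:
        band = fftconvolve(x, gammatone_ir(fc, fs), mode="same")
        env = np.maximum(band, 0.0)                        # crude half-wave rectified envelope
        sos = butter(2, mod_band, btype="band", fs=fs, output="sos")
        mod = sosfiltfilt(sos, env)                        # keep 2-16 Hz modulations
        feats.append(np.mean(np.abs(mod)) ** exponent)     # compressive non-linearity
    return np.array(feats)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2*np.pi*1000*t) * (1 + 0.5*np.sin(2*np.pi*4*t))   # 4 Hz-modulated 1 kHz tone
    print(auditory_modulation_features(x, fs))  # the 1 kHz channel should respond most strongly
```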

An auditory-based feature for robust speech recognition

Yang Shao, Zhaozhang Jin, DeLiang Wang, Soundararajan Srinivasan
2009 2009 IEEE International Conference on Acoustics, Speech and Signal Processing  
We study a novel feature based on an auditory periphery model for robust speech recognition.  ...  Index Terms-Robust speech recognition, auditory feature, gammatone frequency cepstral coefficients, computational auditory scene analysis.  ...  CONCLUSIONS: We have investigated a robust feature for speech recognition, the GFCC, which is derived from an auditory filterbank.  ... 
doi:10.1109/icassp.2009.4960661 dblp:conf/icassp/ShaoJWS09 fatcat:2mvirjgqufg3rkjxqjxhcurxxy
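
A common formulation of the GFCC (e.g., in Shao and Wang's work) applies cubic-root compression to frame-level gammatone filterbank energies and decorrelates them with a DCT, keeping the lowest coefficients. The sketch below assumes you already have a frames-by-channels matrix of gammatone energies (for instance from a filterbank like the one sketched earlier); the 64-channel / 23-coefficient choice is an assumption.

```python
import numpy as np
from scipy.fft import dct

def gfcc(gt_energies, n_coeff=23):
    """Gammatone frequency cepstral coefficients from filterbank energies.

    gt_energies : array of shape (n_frames, n_channels), non-negative.
    Returns an (n_frames, n_coeff) array.
    """
    compressed = np.cbrt(gt_energies)                          # cubic-root loudness compression
    cepstra = dct(compressed, type=2, norm="ortho", axis=1)    # decorrelate across channels
    return cepstra[:, :n_coeff]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_energies = rng.random((100, 64))     # stand-in for real gammatone filterbank output
    feats = gfcc(fake_energies)
    print(feats.shape)                        # (100, 23)
```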

Hearing Is Believing: Biologically Inspired Methods for Robust Automatic Speech Recognition

Richard Stern, Nelson Morgan
2012 IEEE Signal Processing Magazine  
The reader is referred to [56] for a much more detailed discussion of these topics by this article's authors.  ...  The authors are grateful to Yu-Hsiang (Bosco) Chiu, Mark Harvilla, Chanwoo Kim, Kshitiz Kumar, Bhiksha Raj, and Rita Singh at CMU as well as Suman Ravuri, Bernd Meyer, and Sherry Zhao at ICSI for many  ... 
doi:10.1109/msp.2012.2207989 fatcat:lrox67if2nbl5pmjfkkluzp5ui

IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics

2018 IEEE/ACM Transactions on Audio Speech and Language Processing  
AUD-ASAP Acoustic Sensor Array Processing: far-field and near-field beamforming; acoustic sensor array processing; source localization and tracking; time-delay estimation; speech enhancement using acoustic  ...  enhancement.  ...  SPE-ROBU Robust Speech Recognition: acoustic features specifically for robust ASR (noise, channel, etc.); model/backend-based robust ASR; confidence measures and rejection; speech activity/end-point detection  ... 
doi:10.1109/taslp.2018.2855927 fatcat:cz5n7joepreevfgyagk7n7h544

IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics

2014 IEEE/ACM Transactions on Audio Speech and Language Processing  
Speaker Recognition and Characterization: features and characteristics for speaker recognition; robustness to  ...  SPE-SYNT Speech  ...  Enhancement  ...  non-noisy speech; speech enhancement for humans with hearing  ...  spoken document retrieval; linguistic pattern discovery and prediction from data; spoken term detection;  ... 
doi:10.1109/taslp.2014.2311613 fatcat:gfa7t3kisnebta3um6cpzrd4ni

Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition

Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
2012 International Journal of Speech Technology  
significant improvements over a state-of-the-art noise-robust feature scheme on both speech and speaker recognition tasks.  ...  However, most speech processing systems, like automatic speech and speaker recognition systems, suffer from a significant drop in performance when speech signals are corrupted with unseen background distortions  ...  Auditory-inspired techniques have generally led to noticeable improvements over more 'conventional' signal processing methods for recognition tasks, particularly when dealing with distorted signals in  ... 
doi:10.1007/s10772-012-9184-y pmid:26412979 pmcid:PMC4579853 fatcat:3pdgp2tmsrbyleuw34vylnu3ay

Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems

Deepak Baby, Sarah Verhulst
2018 Interspeech 2018  
We make use of features derived from several human auditory periphery models for training a speech enhancement system that employs long short-term memory (LSTM), and evaluate them on a variety of mismatched  ...  quality measures, suggesting that such features lead to robust speech representations that are less sensitive to the noise type.  ...  We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.  ... 
doi:10.21437/interspeech.2018-1237 dblp:conf/interspeech/BabyV18 fatcat:x7trlksvabab5a6zqmslrnqpvi
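
To make the enhancement setup in the abstract concrete, here is a minimal PyTorch sketch of an LSTM that maps a sequence of feature frames to a per-frame, per-band mask applied to a noisy representation. The layer sizes, sigmoid mask output, and MSE objective are assumptions; the auditory-periphery features used in the paper are not reproduced here, and random tensors stand in for real data.

```python
import torch
import torch.nn as nn

class MaskLSTM(nn.Module):
    """Sequence of feature frames -> per-frame, per-band mask in [0, 1]."""
    def __init__(self, n_feats=64, n_bands=64, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_bands)

    def forward(self, feats):                  # feats: (batch, time, n_feats)
        h, _ = self.lstm(feats)
        return torch.sigmoid(self.head(h))     # mask: (batch, time, n_bands)

if __name__ == "__main__":
    model = MaskLSTM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    # toy batch: random "auditory" features, noisy band energies, and a ratio-mask target
    feats = torch.randn(8, 100, 64)
    noisy_bands = torch.rand(8, 100, 64)
    ideal_mask = torch.rand(8, 100, 64)
    for _ in range(3):                         # a few illustrative training steps
        mask = model(feats)
        loss = loss_fn(mask * noisy_bands, ideal_mask * noisy_bands)
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(float(loss))
```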

The use of spike-based representations for hardware audition systems

Shih-Chii Liu, Nima Mesgarani, John Harris, Hynek Hermansky
2010 Proceedings of 2010 IEEE International Symposium on Circuits and Systems  
In this paper we describe a spiking cochlea implementation and recent experiments in both speaker and speech recognition that use spikes as input.  ...  Humans are able to process speech and other sounds effectively in adverse environments, hearing through noise, reverberation, and interference from other speakers.  ...  We believe that a better understanding of the information encoding of auditory nerve fiber spike trains will lead to more noise-robust Automatic Speech Recognition (ASR) systems.  ... 
doi:10.1109/iscas.2010.5537588 dblp:conf/iscas/LiuMHH10 fatcat:cb6xk3z6orfcfasf7r6vt2bysi
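
As a software analogue of what one channel of a spiking (silicon) cochlea does, the toy sketch below band-passes the signal, rectifies it, and drives a leaky integrate-and-fire neuron that emits spike times. The time constant, gain, and threshold are arbitrary illustrative values, not parameters of the hardware described in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lif_spikes(drive, fs, tau=0.01, threshold=0.2):
    """Leaky integrate-and-fire: integrate the rectified drive, spike and reset at threshold."""
    v, spikes = 0.0, []
    decay = np.exp(-1.0 / (tau * fs))
    for n, d in enumerate(np.maximum(drive, 0.0)):
        v = v * decay + d / fs
        if v >= threshold:
            spikes.append(n / fs)
            v = 0.0
    return np.array(spikes)

def cochlea_channel_spikes(x, fs, lo, hi, gain=200.0):
    """One 'cochlea channel': band-pass, rectify, convert to spike times."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    band = sosfiltfilt(sos, x)
    return lif_spikes(gain * band, fs)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    x = 0.5 * np.sin(2 * np.pi * 1000 * t)
    spikes = cochlea_channel_spikes(x, fs, 800, 1200)
    print(f"{len(spikes)} spikes in 1 s of a 1 kHz tone")
```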

Speech Emotion Recognition With Early Visual Cross-modal Enhancement Using Spiking Neural Networks

Esma Mansouri-Benssassi, Juan Ye
2019 2019 International Joint Conference on Neural Networks (IJCNN)  
This method is inspired by auditory information processing in the brain, where auditory information is preceded, enhanced, and predicted by visual processing in multisensory audio-visual processing  ...  Speech emotion recognition (SER) is an important part of the affective computing and signal processing research areas.  ...  This shows that MFCCs are an effective type of audio feature for processing speech data in SNNs.  ... 
doi:10.1109/ijcnn.2019.8852473 dblp:conf/ijcnn/Mansouri-Benssassi19 fatcat:by6d43nu5bfetbdrh5x6id6kuq
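
The conclusion quoted above rests on standard MFCC extraction; a minimal example using librosa follows, with per-coefficient mean/variance normalisation as a typical preprocessing step before a downstream classifier (such as an SNN front end). The sample rate, 13-coefficient choice, and synthetic test tone are placeholders rather than the paper's setup.

```python
import numpy as np
import librosa

# synthetic 1-second "utterance": a 440 Hz tone standing in for real speech audio
sr = 16000
t = np.arange(sr) / sr
y = 0.1 * np.sin(2 * np.pi * 440 * t).astype(np.float32)

# 13 mel-frequency cepstral coefficients per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)                                   # (13, n_frames)

# per-coefficient mean/variance normalisation before feeding frames downstream
mfcc_norm = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
```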

Biologically inspired features used for robust phoneme recognition

Mitar Milacic, A.P. James, Sima Dimitrijev
2013 International Journal of Machine Intelligence and Sensory Signal Processing  
Reference to this paper should be made as follows: Milacic, M., James, A.P. and Dimitrijev, S. (2013) 'Biologically inspired features used for robust phoneme recognition', Int.  ...  Formants are regarded as the basic building blocks of vowels; however, they are very rarely used as features for difficult automatic speech recognition tasks.  ...  We propose and test the premise that biologically inspired features based on formants can be more effective and a better match for human speech recognition processes.  ... 
doi:10.1504/ijmissp.2013.052867 fatcat:nxvvvcgserhf5efcekynfv7vpi
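
Formant-based features of the kind described here typically start from LPC analysis: fit an all-pole model to a windowed frame and read formant candidates off the angles of the complex poles inside the unit circle. The sketch below uses autocorrelation LPC solved as a Toeplitz system; the model order, bandwidth cut-off, and synthetic test vowel are assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    """Autocorrelation-method LPC: solve R a = r for the predictor coefficients."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))                  # A(z) = 1 - sum_k a_k z^{-k}

def formants(frame, fs, order=12, max_bw=400.0):
    """Formant candidates (Hz) from LPC poles with sufficiently narrow bandwidth."""
    a = lpc_coefficients(np.hamming(len(frame)) * frame, order)
    poles = np.roots(a)
    poles = poles[(np.imag(poles) > 0) & (np.abs(poles) < 1.0)]
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bws = -fs / np.pi * np.log(np.abs(poles))           # pole radius -> bandwidth
    return np.sort(freqs[bws < max_bw])

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    t = np.arange(int(0.03 * fs)) / fs                  # 30 ms frame
    # crude synthetic vowel: components near typical /a/ formants (~700 and ~1200 Hz)
    frame = np.sin(2*np.pi*700*t) + 0.7*np.sin(2*np.pi*1200*t) + 0.05*rng.standard_normal(t.size)
    print(formants(frame, fs))                          # expect values near 700 and 1200 Hz
```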

IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics

2018 IEEE/ACM Transactions on Audio Speech and Language Processing  
AUD-QIM Quality and Intelligibility Measures: perceptual measures of audio quality; objective and subjective quality assessment; network audio quality assessment; speech intelligibility measures.  ...  AUD-SARR Spatial Audio Recording and Reproduction: analysis and synthesis of sound fields; wave-field synthesis; loudspeaker array processing; ambisonics; panning; multipoint synthesis and binaural synthesis  ...  SPE-ROBU Robust Speech Recognition: acoustic features specifically for robust ASR (noise, channel, etc.); model/backend-based robust ASR; confidence measures and rejection; speech activity/end-point detection  ... 
doi:10.1109/taslp.2018.2830714 fatcat:mpfynd2q5vc3jocslih7eg4sae

IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics

2017 IEEE/ACM Transactions on Audio Speech and Language Processing  
AUD-QIM Quality and Intelligibility Measures: perceptual measures of audio quality; objective and subjective quality assessment; network audio quality assessment; speech intelligibility measures.  ...  AUD-SARR Spatial Audio Recording and Reproduction: analysis and synthesis of sound fields; wave-field synthesis; loudspeaker array processing; ambisonics; panning; multipoint synthesis and binaural synthesis  ...  SPE-ROBU Robust Speech Recognition: acoustic features specifically for robust ASR (noise, channel, etc.); model/backend-based robust ASR; confidence measures and rejection; speech activity/end-point detection  ... 
doi:10.1109/taslp.2017.2772589 fatcat:idirdv5etbh5djcync5obdmhsq
Showing results 1–15 of 4,967