22,481 Hits in 5.9 sec

An application of Artificial Intelligence in emotion detection by speech only

Arvind Sharma, RJIT Tekanpur, R K Gupta, MITS Gwalior
2021 Ymer  
In this work, a novel emotion recognition method is proposed based on robust features and machine learning from audio speech only.  ...  experiments on an emotion dataset of audio speech.  ...  In this work, a novel emotion recognition method is proposed combining NSL with Mel Frequency Cepstrum Coefficient (MFCC) based robust features obtained from audio speech.  ... 
doi:10.37896/ymer20.10/4 fatcat:tyctksfpbvebfo3nf7yejgcfpy
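The entry above builds its features on Mel Frequency Cepstrum Coefficients. Purely as background, here is a minimal sketch of how frame-level MFCCs are commonly extracted and pooled into a fixed-length utterance descriptor; it assumes the librosa library, the file name and parameter values are placeholders, and it is not the feature pipeline of the paper.

```python
# Minimal MFCC front-end sketch (librosa assumed; not the paper's method).
import numpy as np
import librosa

def mfcc_stats(path, sr=16000, n_mfcc=13):
    """Mean and standard deviation of MFCCs over all frames of an utterance."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Example with a placeholder file name:
# features = mfcc_stats("utterance.wav")  # 26-dimensional utterance descriptor
```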

A Multiscale Chaotic Feature Extraction Method for Speaker Recognition

Jiang Lin, Yi Yumei, Zhang Maosheng, Chen Defeng, Wang Chao, Wang Tonghan, Karthikeyan Rajagopal
2020 Complexity  
Then, we extracted the speech chaotic characteristics based on the nonlinear dynamic model, which helps to improve the discrimination of features.  ...  In speaker recognition systems, feature extraction is a challenging task under environment noise conditions.  ...  We extracted the nonlinear feature based on speech chaotic characteristics to improve the robustness of recognition. The experiment results show that this method is valid. Therefore, we believe the speech  ... 
doi:10.1155/2020/8810901 fatcat:6vst6abjknfz7eojwd36qw3jta

Automatic Speech Recognition Features Extraction Techniques: A Multi-criteria Comparison

Maria Labied, Abdessamad Belangour
2021 International Journal of Advanced Computer Science and Applications  
Choosing a robust feature extraction technique is a significant challenge for Automatic Speech Recognition.  ...  The main objective of feature extraction is to identify the discriminative and robust features in the acoustic data.  ...  This method is based on the fact that the temporal properties of a speech signal environment differ from those of the speech signal.  ... 
doi:10.14569/ijacsa.2021.0120821 fatcat:yzdzhjtuy5d2neonjtgpmvevlm

Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation

Y. KUBO, S. OKAWA, A. KUREMATSU, K. SHIRAI
2008 IEICE transactions on information and systems  
The combination of these two analyzers is performed by a method based on the entropy of the feature, introduced by Okawa et al.  ...  In realistic environments, a feature that depends on the limited properties of the signal may easily be corrupted.  ...  Investigating the importance of frequency modulation (FM) is also necessary in order to achieve robust speech recognition.  ... 
doi:10.1093/ietisy/e91-d.3.448 fatcat:f534ekohrzflnpqlj7x47qc6gu
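The entry above combines amplitude-modulation and frequency-modulation analyzers. As a generic illustration only, AM and FM components of a band-limited signal are often obtained from the analytic signal; the sketch below assumes scipy and does not reproduce the entropy-based combination described in the paper.

```python
# Generic AM/FM decomposition via the analytic signal (illustrative, not the paper's method).
import numpy as np
from scipy.signal import hilbert

def am_fm(band_signal, sr):
    """Return amplitude envelope and instantaneous frequency (Hz) of a band-limited signal."""
    analytic = hilbert(band_signal)
    amplitude = np.abs(analytic)                      # AM envelope
    phase = np.unwrap(np.angle(analytic))
    inst_freq = np.diff(phase) * sr / (2.0 * np.pi)   # FM: instantaneous frequency
    return amplitude, inst_freq
```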

A novel voice activity detection based on phoneme recognition using statistical model

Xulei Bao, Jie Zhu
2012 EURASIP Journal on Audio, Speech, and Music Processing  
In this article, a novel voice activity detection (VAD) approach based on phoneme recognition using a Gaussian Mixture Model based Hidden Markov Model (HMM/GMM) is proposed.  ...  We also propose a different method to demonstrate that the conventional speech enhancement method with accurate VAD alone is not effective enough for automatic speech recognition (ASR) at low SNR regimes  ...  Feature extraction: Different features have their own advantages in an ASR system, and it is impossible to use a single feature to cope with all noisy environments.  ... 
doi:10.1186/1687-4722-2012-1 fatcat:2j2vdt352bbwdgi4pfgz73m7ei
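The VAD above is built on HMM/GMM phoneme recognition. For orientation only, the sketch below shows the much simpler energy-threshold baseline that such statistical VADs improve upon; the frame length, hop size, and threshold are illustrative values, not taken from the paper.

```python
# Energy-threshold VAD baseline (illustrative; not the HMM/GMM approach of the paper).
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-40.0):
    """Label each frame as speech (True) or non-speech (False) by its log energy."""
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
        decisions.append(energy_db > threshold_db)
    return np.array(decisions)
```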

An overview of robustness related issues in speaker recognition

Thomas Fang Zheng, Qin Jin, Lantian Li, Jun Wang, Fanhu Bie
2014 Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific  
We first categorize the robustness issues into three categories: environment-related, speaker-related, and application-oriented issues.  ...  This paper provides an overview of technologies dealing with robustness-related issues in automatic speaker recognition.  ...  On the one hand, speech audio recorded in real environments often contains different types of environmental noise, such as background white noise, music, or interfering speech.  ... 
doi:10.1109/apsipa.2014.7041826 dblp:conf/apsipa/ZhengJLWB14 fatcat:wjnsvr7eavambpucbcxqsftt6a

Voice Activity Detection. Fundamentals and Speech Recognition System Robustness [chapter]

J. Ramirez, J. M., J. C.
2007 Robust Speech Recognition and Understanding  
Acknowledgements This work has received research funding from the EU 6th Framework Programme, under contract number IST-2002-507943 (HIWIRE, Human Input that Works in Real Environments) and SR3-VoIP project  ...  The chapter has summarized three robust VAD methods that yield high speech/non-speech discrimination accuracy and improve the performance of speech recognition systems working in noisy environments.  ...  Feature extraction The objective of feature extraction process is to compute discriminative speech features suitable for detection.  ... 
doi:10.5772/4740 fatcat:wg7uhrtus5hgrfesfzmdoc5uhi

Speech Feature Extraction and Classification: A Comparative Review

Akansha Madan, Divya Gupta
2014 International Journal of Computer Applications  
of a speech recognition system requires certain concepts to be included: defining different classes of speech, techniques for speech feature extraction, speech classification modeling, and measuring system  ...  This paper gives a brief survey of speech recognition and presents an overview of the various techniques used at the various stages of speech recognition systems.  ...  Classification of speech recognition systems based on speech: Speech recognition systems can be divided into different classes based on the type of speech utterances they are capable of recognizing.  ... 
doi:10.5120/15603-4392 fatcat:thzlyeydfbahhaneaktxbxhuae

Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks

Qirong Mao, Ming Dong, Zhengwei Huang, Yongzhao Zhan
2014 IEEE transactions on multimedia  
Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e.g., with speaker and language variation, and environment distortion  ...  Index Terms-Affective-salient discriminative feature analysis, convolutional neural networks, feature learning, speech emotion recognition.  ...  As a low-level feature, spectrogram is widely used in speech recognition and audio-based speaker and gender recognition.  ... 
doi:10.1109/tmm.2014.2360798 fatcat:wruzacud2veztouduc4dmrmshm

Robust speaker verification by combining MFCC and entrocy in noisy conditions

Duraid Y. Mohammed, Khamis Al-Karawi, Ahmed Aljuboori
2021 Bulletin of Electrical Engineering and Informatics  
To address this, we propose a new feature 'entrocy' for accurate and robust speaker recognition, which we mainly employ to support MFCC coefficients in noisy environments.  ...  Entrocy features are combined with MFCCs to generate a composite feature set which is tested using the Gaussian mixture model (GMM) speaker recognition method.  ...  [Figure captions: DET graphs for features at 15 dB, 10 dB, 5 dB, and 0 dB SNR]  ... 
doi:10.11591/eei.v10i4.2957 fatcat:jkz2uhzyqbdnld6zpyxzbf5aym
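The entry above tests its composite features with a GMM speaker recognition method. The sketch below only illustrates the classical GMM verification recipe (claimed-speaker model versus background model log-likelihood ratio) using scikit-learn; the 'entrocy' feature itself is not reproduced, the frame features are left abstract, and the component count is an arbitrary choice.

```python
# Classical GMM speaker-verification scoring sketch (scikit-learn assumed; not the paper's system).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(speaker_frames, background_frames, n_components=16):
    """Fit a claimed-speaker GMM and a background GMM on frame-level feature matrices."""
    spk = GaussianMixture(n_components=n_components, covariance_type="diag").fit(speaker_frames)
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag").fit(background_frames)
    return spk, ubm

def verify(test_frames, spk, ubm):
    """Average log-likelihood ratio; higher values favor the claimed speaker."""
    return spk.score(test_frames) - ubm.score(test_frames)
```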

Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition

Bin Liu, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li
2019 Interspeech 2019  
Systematic experiments on AISHELL-1 show that the proposed method improves the noise robustness of end-to-end systems and achieves a relative error rate reduction of 4.6% over the multi-condition training  ...  Recently, the end-to-end system has made significant breakthroughs in the field of speech recognition.  ...  The system consists of a mask-based enhancement network, an attention-based encoder-decoder network, an fbank feature extraction network, and a discriminant network.  ... 
doi:10.21437/interspeech.2019-1242 dblp:conf/interspeech/LiuNLLYCPL19 fatcat:obprssglwrg55bl64lwgr4nc6e

Biologically inspired features used for robust phoneme recognition

Mitar Milacic, A.P. James, Sima Dimitrijev
2013 International Journal of Machine Intelligence and Sensory Signal Processing  
Formant-based research is generally focused on formant extraction, because of the assumption that a better formant extraction method is the only way to increase the effectiveness of formants.  ...  model-based automatic speech recognition system.  ...  We propose and test the premise that biologically inspired features based on formants can be more effective and a better match for the human speech recognition process.  ... 
doi:10.1504/ijmissp.2013.052867 fatcat:nxvvvcgserhf5efcekynfv7vpi
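The entry above argues for biologically inspired formant-based features. As background only, one common (non-biological) way to estimate formant frequencies is from the roots of an LPC polynomial; the sketch below assumes librosa for the LPC fit, and the model order and frequency cutoff are illustrative choices, not the paper's method.

```python
# Rough formant estimation from LPC roots (illustrative; not the biologically inspired features).
import numpy as np
import librosa

def estimate_formants(frame, sr, order=12):
    """Estimate formant frequencies (Hz) of one windowed speech frame."""
    a = librosa.lpc(frame.astype(float), order=order)     # LPC polynomial coefficients
    roots = [r for r in np.roots(a) if np.imag(r) > 0]    # keep one root per conjugate pair
    freqs = sorted(np.angle(r) * sr / (2.0 * np.pi) for r in roots)
    return [f for f in freqs if f > 90.0]                 # discard near-DC roots
```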

Audio-Visual Automatic Speech Recognition Using PZM, MFCC and Statistical Analysis

Saswati Debnath, Pinki Roy
2021 International Journal of Interactive Multimedia and Artificial Intelligence  
Based on the recognition rate, a combined decision is taken from the two individual recognition systems.  ...  Zernike Moment (ZM) is compared with PZM, showing that our proposed model using PZM extracts better discriminative features for visual speech recognition.  ...  This paper is primarily focused on building a speech recognition model that utilizes both audio and visual features, i.e. audio-visual speech recognition based on audio-visual features and integration method  ... 
doi:10.9781/ijimai.2021.09.001 fatcat:qdcw4solzvbo5df43ycr7zemqe

A Robust Feature Extraction Method for Real-Time Speech Recognition System on a Raspberry Pi 3 Board

A. Mnassri, M. Bennasr, C. Adnane
2019 Engineering, Technology & Applied Science Research  
In this paper, a new robust feature extraction method for a real-time ASR system is presented.  ...  This hybrid system can conserve more extracted speech features, which tend to be invariant to noise.  ... 
doi:10.48084/etasr.2533 fatcat:cweoiqk62ncnxegmclo4ujknly

IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics

2014 IEEE/ACM Transactions on Audio Speech and Language Processing  
Speaker Recognition and Characterization: features and characteristics for speaker recognition; robustness to  ...  SPE-SYNT Speech  ...  discriminative, maximum-entropy and feature-based language modelling; computational phonology and phonetics; dialect  ...  audio de-noising and restoration; bandwidth expansion; clipping  ... 
doi:10.1109/taslp.2014.2311613 fatcat:gfa7t3kisnebta3um6cpzrd4ni
Showing results 1 — 15 out of 22,481 results