
Speech Emotion Recognition using Neural Networks

2019 International Journal of Recent Technology and Engineering  
Speech emotion recognition is one of the areas that can be used to identify emotions from the verbal expressions of humans.  ...  Implementation of speech emotion recognition may involve several learning models, classification methods, feature extraction, and pattern recognition.  ...  The spectral and cepstral features are Mel frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC), and acoustic features like pitch, energy, and formants are also used in speech  ... 
doi:10.35940/ijrte.b1432.0982s1119 fatcat:zxocwlr2tbel3n7i5iombaeinq
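The first result lists MFCCs alongside LPCC, pitch, and energy. As a rough illustration of what an MFCC computation involves, here is a minimal numpy sketch (triangular mel filterbank, log filterbank energies, DCT-II); the function names and parameter defaults are illustrative, not taken from the cited paper:

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping (O'Shaughnessy formula)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters centered at points evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc_from_power_spectrum(power_spec, sr, n_filters=26, n_ceps=13):
    # power_spec: (frames, n_fft//2 + 1) one-sided power spectra
    n_fft = 2 * (power_spec.shape[-1] - 1)
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_energy = np.log(power_spec @ fb.T + 1e-10)
    # DCT-II decorrelates log filterbank energies into cepstral coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return log_energy @ basis.T
```

In practice one would reach for a library routine such as `librosa.feature.mfcc`; this sketch only shows the underlying arithmetic.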

Deep Learning Approach for Spoken Digit Recognition in Gujarati Language

Jinal H. Tailor, Rajnish Rakholia, Jatinderkumar R. Saini, Ketan Kotecha
2022 International Journal of Advanced Computer Science and Applications  
To implement a deep learning approach, a Convolutional Neural Network (CNN) with MFCC is used to analyze audio clips and generate spectrograms.  ...  With this approach, a maximum accuracy of 98.7% is achieved for spoken digits in the Gujarati language, with 98% precision and 98% recall.  ...  Sen [10] proposed a framework for digit recognition using a neural network. For feature extraction, they used Mel Frequency Cepstral Coefficients (MFCC) and Filter Bank (FB) coefficients.  ... 
doi:10.14569/ijacsa.2022.0130450 fatcat:yjaajwbuzfdtjg2jrbrcf22sje

Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks

Abeer Ali Alnuaim, Mohammed Zakariah, Aseel Alhadlaq, Chitra Shashidhar, Wesam Atef Hatamleh, Hussam Tarazi, Prashant Kumar Shukla, Rajnish Ratna, Vijay Kumar
2022 Computational Intelligence and Neuroscience  
The suggested classification model, a 1D convolutional neural network (1D CNN), outperforms traditional machine learning approaches in classification.  ...  Emotions play an essential role in human relationships, and many real-time applications rely on interpreting the speaker's emotion from their words.  ...  [46] also used a recurrent neural network (RNN) to extract relationships from 3D spectrograms across timesteps and frequencies. Lee et al.  ... 
doi:10.1155/2022/7463091 pmid:35401731 pmcid:PMC8989588 fatcat:aeyjth2glfe6vpffkxejvdbdke
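The 1D CNN mentioned in this abstract reduces to one core operation: sliding learned kernels along the time axis of a feature sequence, then pooling over time before a dense classifier. A toy numpy sketch, with hypothetical shapes and random weights standing in for trained ones:

```python
import numpy as np

def conv1d(x, kernels, bias):
    # x: (time, in_channels); kernels: (out_channels, width, in_channels)
    out_ch, width, in_ch = kernels.shape
    t_out = x.shape[0] - width + 1
    out = np.empty((t_out, out_ch))
    for t in range(t_out):
        window = x[t:t + width]  # (width, in_channels)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU

def classify(x, kernels, bias, w, b):
    # Global max pooling over time, then a linear layer and argmax
    feat = conv1d(x, kernels, bias).max(axis=0)
    return int(np.argmax(feat @ w + b))
```

A trained 1D CNN for speech emotion would stack several such layers; this sketch just shows why the model tolerates variable-length inputs (the pooling step collapses the time axis).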

Speech Recognition using Multiscale Scattering of Audio Signals and Long Short-Term Memory of Neural Networks

This method provides higher accuracy than other standard methods that use Mel-frequency Cepstral Coefficients (MFCC) and an LSTM network to recognize digits.  ...  In order to understand the audio language used by humans, machines use different techniques to convert speech into machine-readable form, called speech recognition.  ...  There are various time-frequency representations to measure energy (or power) from a signal, like Mel Frequency Cepstral Coefficients (MFCC), Fourier-based coefficients, and wavelet scattering coefficients  ... 
doi:10.35940/ijitee.k2270.0981119 fatcat:ex7zwzgwzbhllmltn7wdgqag4y

Emotion Recognition with Capsule Neural Network

Loan Trinh Van, Quang H. Nguyen, Thuy Dao Thi Le
2022 Computer systems science and engineering  
Among the models and classifiers used to recognize emotions, neural networks appear promising due to the network's ability to learn and the diversity of possible configurations.  ...  Following the convolutional neural network, a capsule neural network (CapsNet), whose inputs and outputs are not scalar quantities but vectors, allows the network to determine part-whole relationships  ...  A capsule network for low-resource spoken language understanding was proposed for command-and-control applications in [5].  ... 
doi:10.32604/csse.2022.021635 fatcat:odikrqtbovhc3kerlqyw3vgxta
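The defining detail behind the vector-valued capsules mentioned above is that a capsule's output length encodes a probability, which is enforced by the "squash" nonlinearity from the original CapsNet formulation. A minimal numpy sketch:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # CapsNet nonlinearity: shrinks the vector's length into (0, 1)
    # while preserving its direction.
    sq = np.sum(v * v, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)
```

Long vectors are squashed to a length just under 1 (confident detection), short vectors to nearly 0, which is what lets the network treat capsule length as a presence probability.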

Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network

María Teresa García-Ordás, Héctor Alaiz-Moretón, José Alberto Benítez-Andrades, Isaías García-Rodríguez, Oscar García-Olalla, Carmen Benavides
2021 Biomedical Signal Processing and Control  
Mel spectrograms and Mel Frequency Cepstral Coefficients are used as audio description methods, and a Fully Convolutional Neural Network architecture is proposed as a classifier.  ...  The results have been validated using three well-known datasets: EMODB, RAVDESS and TESS. The results obtained were promising, outperforming state-of-the-art methods.  ...  Mel-frequency cepstral coefficients (MFCCs) are coefficients derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"), and they are the most widely used representation  ... 
doi:10.1016/j.bspc.2021.102946 fatcat:wdlxldvtubeu5j76qtkex74fwi

Arabic Speech Emotion Recognition from Saudi Dialect Corpus

Reem H. Aljuhani, Areej Alshutayri, Shahd Alahdal
2021 IEEE Access  
The first model combined convolutional neural networks (CNN), bi-directional long short-term memory (BLSTM), and deep neural networks (DNN) for the attention-based CNN-LSTM-DNN model, and the second  ...  Researchers have applied machine learning algorithms to detect emotions from English speech, such as [11], which used an SVM as a classifier to train the data and applied mel-frequency cepstral coefficients  ... 
doi:10.1109/access.2021.3110992 fatcat:c73knoukoradles6fmny6sgffq

Speech Emotion Recognition Systems: Review

Pranay Kumar Rahi
2020 International Journal for Research in Applied Science and Engineering Technology  
The emotion detection domain has two types of features, i.e., utterance and prosodic features.  ...  In human-machine interface applications, emotion recognition from the speech signal has been a research topic for many years.  ...  After choosing useful features such as Mel-Frequency Cepstral Coefficients (MFCC) and their transient parameters, better performance is obtained with the application of Back Propagation Neural Networks (BPNNs)  ... 
doi:10.22214/ijraset.2020.1007 fatcat:ih3klw2nsneo5dpjfgdkzw6nei

Isolated Telugu Speech Recognition On TDSCC And DNN Techniques

This research recognizes speaker-independent data, giving good results by using the TDSCC (Teager energy operator delta spectral cepstral coefficients) feature extraction technique and the DNN (Deep Neural Networks) classification technique.  ...  Mel Frequency Cepstral Coefficient (MFCC) features are most commonly used for speech as well as emotion recognition, obtaining a good recognition rate.  ... 
doi:10.35940/ijitee.k2544.0981119 fatcat:uler6m6uhfa6zbhwdmdwk2ebti
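The TDSCC features in this result build on the discrete Teager energy operator, ψ[n] = x[n]² − x[n−1]·x[n+1], which for a pure sinusoid A·cos(ωn + φ) evaluates exactly to A²·sin²(ω), i.e., it tracks both amplitude and frequency at once. A minimal sketch:

```python
import numpy as np

def teager_energy(x):
    # Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    # Output is two samples shorter than the input (no boundary values).
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

The cited paper layers delta and cepstral processing on top of this operator; the sketch only shows the operator itself.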


Kanika Garg
2016 International Journal of Research in Engineering and Technology  
So, this paper deals with the various speech features that can be used for Hindi speech and that have been tested for many other languages.  ...  In this work, MFCC, PLP, EFCC, and LPC have been tested against a Hindi speech corpus using the HMM toolkit HTK 3.4.1. These features have been evaluated in a common environment.  ...  Acoustic models can be implemented using Hidden Markov Models (HMM), Support Vector Machines (SVM), Deep Neural Networks (DNN), etc.  ... 
doi:10.15623/ijret.2016.0507058 fatcat:fsa3icolgrbovnrccnaijpqpdu
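Since this snippet mentions HMM-based acoustic models, the forward algorithm that computes a sequence likelihood under such a model can be sketched in a few lines. This is a toy discrete-observation version, not the HTK implementation (which uses Gaussian mixture emissions and log-space arithmetic):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    # pi: (S,) initial state probabilities
    # A:  (S, S) transition matrix, A[i, j] = P(j | i)
    # B:  (S, O) emission matrix, B[s, o] = P(o | s)
    # obs: sequence of observation indices
    alpha = pi * B[:, obs[0]]          # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate and weight by emission
    return float(alpha.sum())          # P(observations | model)
```

In an ASR decoder this likelihood would be computed per word (or phone) model, and the model with the highest likelihood wins.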

Deep Multimodal Learning for Emotion Recognition in Spoken Language [article]

Yue Gu, Shuhong Chen, Ivan Marsic
2018 arXiv   pre-print
Second, we fuse all features by using a three-layer deep neural network to learn the correlations across modalities and train the feature extraction and fusion modules together, allowing optimal global  ...  In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics.  ...  ACKNOWLEDGEMENTS We would like to thank the reviewers for their valuable feedback and SAIL-USC for providing the IEMOCAP dataset.  ... 
arXiv:1802.08332v1 fatcat:hyvzt6wrnbdedir3m3y7ld7fym


Ameya Ajit Mande, Mechanical Engineering Department, Maharashtra Institute of Technology, Aurangabad
2019 International Journal of Advanced Research in Computer Science  
Several machine learning algorithms including K-nearest neighbours (KNN) and decision trees were implemented, based on acoustic features such as Mel Frequency Cepstral Coefficient (MFCC).  ...  Our evaluation shows that the proposed approach yields accuracies of 98%, 92% and 99% using KNN, Decision Trees and Extra-Tree Classifiers, respectively, for 7 emotions using Toronto Emotional Speech Set  ...  are extracted from the audio samples: Zero Crossing Rate (ZCR), Mel Frequency Cepstral Coefficient (MFCC), Tonnetz, Contrast, Mel, Chroma.  ... 
doi:10.26483/ijarcs.v10i6.6489 fatcat:pbc5qb5c3ff4vj4k75mqbam4hm
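Of the features listed in this snippet, the Zero Crossing Rate is the simplest: the fraction of adjacent sample pairs whose signs differ, a cheap proxy for how noisy or high-frequency a frame is. An illustrative sketch:

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of consecutive sample pairs where the sign changes
    signs = np.signbit(np.asarray(frame, dtype=float))
    return float(np.mean(signs[1:] != signs[:-1]))
```

Voiced speech tends to have a low ZCR, unvoiced fricatives a high one, which is why it shows up alongside MFCCs in feature sets like the one above.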

Speech Emotion Recognition System Using Recurrent Neural Network in Deep Learning

Siddhant S. Patil, Shruti K. Patil, Ishwari S. Chankeshwara, Hrishikesh S. Rapatwar, Prof. Vidya V. Waykule
2022 International Journal for Research in Applied Science and Engineering Technology  
Keywords: Deep Learning, Recurrent Neural Networks, Emotion Recognition, Speech Recognition, SER, RNN, Catatonia.  ...  In this context, we also present an approach using the Recurrent Neural Network, which is part of the family of deep learning algorithms.  ...  Also, time-dependent acoustic features and different spectral features such as linear predictor coefficients (LPC), linear predictor cepstral coefficients (LPCC), and Mel-frequency cepstral coefficients  ... 
doi:10.22214/ijraset.2022.41112 fatcat:gbg7jfik6rff3k23inl6hvqsfa

A Review on Automatic Speech Recognition Architecture and Approaches

Karpagavalli S, Chandra E
2016 International Journal of Signal Processing, Image Processing and Pattern Recognition  
Speech recognition interfaces in a native language will enable illiterate/semi-literate people to use the technology to a greater extent without knowledge of operating a computer keyboard or stylus  ...  Speech recognition applications enable people to use speech as another input mode to interact with applications easily and effectively.  ...  The output of the DCT is 13th-order Mel-cepstral coefficients.  Delta MFCC Features - In order to capture the changes in speech from frame to frame, the first and second derivatives of the MFCC coefficients  ... 
doi:10.14257/ijsip.2016.9.4.34 fatcat:xbagvt7qc5a2dbxbwcjsofp7y4
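The delta-MFCC features this review describes are usually computed with the standard regression formula d_t = Σₙ n·(c_{t+n} − c_{t−n}) / (2·Σₙ n²) over a small window. A numpy sketch (the window half-width N=2 is a common but arbitrary choice):

```python
import numpy as np

def delta(features, N=2):
    # features: (frames, coeffs). Edge frames are handled by repeating
    # the first/last frame (edge padding), as is conventional.
    features = np.asarray(features, dtype=float)
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    T = features.shape[0]
    out = np.zeros_like(features)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom
```

Applying the same formula to the deltas yields the delta-delta (acceleration) coefficients mentioned in the snippet.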

Arabic Speech Classification Method Based on Padding and Deep Learning Neural Network

Asroni Asroni, Ku Ruhana Ku-Mahamud, Cahya Damarjati, Hasan Basri Slamat
2021 Baghdad Science Journal  
The performance of the proposed method with the padding technique is on par with the spectrogram but better than the mel-spectrogram and mel-frequency cepstral coefficients.  ...  Deep learning convolutional neural networks have been widely used to recognize and classify voice.  ...  Acknowledgment: Funding for this research was from the Universitas Muhammadiyah Yogyakarta, Indonesia, and the work was conducted in collaboration with the Universiti Utara Malaysia, Malaysia.  ... 
doi:10.21123/bsj.2021.18.2(suppl.).0925 fatcat:jlona462yvbbpinkdjpci7mj5m
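The padding technique compared in this result amounts to forcing variable-length audio into the fixed-size input a network expects. A minimal sketch, using zero-padding and truncation (a common choice, though not the only one; the cited paper's exact scheme may differ):

```python
import numpy as np

def pad_or_truncate(clips, target_len):
    # Zero-pad short clips and truncate long ones to a fixed length,
    # producing a single (batch, target_len) array.
    out = np.zeros((len(clips), target_len))
    for i, c in enumerate(clips):
        n = min(len(c), target_len)
        out[i, :n] = np.asarray(c)[:n]
    return out
```

The resulting fixed-shape batch is what allows raw or lightly processed audio to be fed directly to a CNN without first converting to a spectrogram.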