1,286 hits

Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech [article]

Michael Neumann, Ngoc Thang Vu
2017 arXiv   pre-print
We compare system performance using different lengths of the input signal, different types of acoustic features, and different types of emotional speech (improvised/scripted).  ...  Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work has proposed a variety of models and feature sets for training a system.  ...  In this paper, we propose an attentive convolutional neural network (ACNN) for emotion recognition which combines the strengths of CNNs and attention mechanisms.  ... 
arXiv:1706.00612v1 fatcat:kjzbup7kurfinmr4vz4eswbece

Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech

Michael Neumann, Ngoc Thang Vu
2017 Interspeech 2017   unpublished
We compare system performance using different lengths of the input signal, different types of acoustic features, and different types of emotional speech (improvised/scripted).  ...  Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work has proposed a variety of models and feature sets for training a system.  ...  In this paper, we propose an attentive convolutional neural network (ACNN) for emotion recognition which combines the strengths of CNNs and attention mechanisms.  ... 
doi:10.21437/interspeech.2017-917 fatcat:vvaoy3lyy5bphjmlhjbaegch4q

The Impact of Attention Mechanisms on Speech Emotion Recognition

Shouyan Chen, Mingyan Zhang, Xiaofen Yang, Zhijia Zhao, Tao Zou, Xinqi Sun
2021 Sensors  
Speech emotion recognition (SER) plays an important role in real-time applications of human-machine interaction. The attention mechanism is widely used to improve the performance of SER.  ...  With this knowledge, a classifier (CNN-LSTM×2+Global-Attention model) for SER is proposed. The experimental results show that it achieves an accuracy of 85.427% on the EMO-DB dataset.  ...  Related Work: Traditional methods of Speech Emotion Recognition (SER) are mainly based on basic acoustic emotion features and machine learning models.  ... 
doi:10.3390/s21227530 pmid:34833603 pmcid:PMC8622179 fatcat:r6njkft5hrdn3evu5gev462vzm

Emotion Recognition from Speech [article]

Kannan Venkataramanan, Haresh Rengaraj Rajamohan
2019 arXiv   pre-print
The significance of these features for emotion classification was compared by applying methods such as Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), Hidden Markov Models (HMMs), and  ...  In this work, we conduct an extensive comparison of various approaches to speech-based emotion recognition systems.  ...  Models / Convolutional Neural Networks: The tremendous strides made in recent years in image recognition tasks are in large part due to the advent of Convolutional Neural Networks [12] (CNNs).  ... 
arXiv:1912.10458v1 fatcat:jxl2uwpebfeatdqaao4duyqp6i

Deep Learning Techniques for Speech Emotion Recognition, from Databases to Models

Babak Joze Abbaschian, Daniel Sierra-Sosa, Adel Elmaghraby
2021 Sensors  
The goal of this study is to provide a survey of the field of discrete speech emotion recognition.  ...  The advancements in neural networks and the on-demand need for accurate and near real-time Speech Emotion Recognition (SER) in human–computer interactions make it mandatory to compare available methods  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/s21041249 pmid:33578714 pmcid:PMC7916477 fatcat:nj5ihjhvnfcxtk7hu3n4zx4bka

A Comprehensive Review of Speech Emotion Recognition Systems

Taiba Majid Wani, Teddy Surya Gunawan, Syed Asif Ahmad Qadri, Mira Kartiwi, Eliathamby Ambikairajah
2021 IEEE Access  
An attention-based Convolutional Neural Network (ACNN) is used as the baseline structure, trained on USE-IEMOCAP and tested on MSP-IMPROV, and ACNN-AE on Tedium.  ...  Speech is practically a continuous signal of varying length, carrying both information and emotion.  ... 
doi:10.1109/access.2021.3068045 fatcat:otlyazg5mzg3rpjqv56jecmjfq

A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism

Eva Lieskovská, Maroš Jakubec, Roman Jarina, Michal Chmulík
2021 Electronics  
This paper provides a review of recent developments in SER and also examines the impact of various attention mechanisms on SER performance.  ...  Emotions are an integral part of human interactions and are significant factors in determining user satisfaction or customer opinion. Speech emotion recognition (SER) modules also play an important role  ...  Impact of Attention Mechanism on SER: We performed a comparison of related works based on the most common settings to study the impact of AM on speech emotion recognition.  ... 
doi:10.3390/electronics10101163 fatcat:7nlbonwh4jcqrhn2ogcxyvv4zq

Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks

Zixing Zhang, Fabien Ringeval, Jing Han, Jun Deng, Erik Marchi, Björn Schuller
2016 Interspeech 2016  
This contribution evaluates the impact of a front-end feature enhancement method based on an autoencoder with long short-term memory neural networks, for robust emotion recognition from speech.  ...  We perform extensive evaluations on both non-stationary additive noise and convolutional noise, on a database of spontaneous and natural emotions.  ...  Innovative Action No. 644632 (MixedEmotions), No. 645094 (SEWA), and the Research Innovative Action No. 645378 (ARIA-VALUSPA), and by the German Federal Ministry of Education, Science, Research and Technology  ... 
doi:10.21437/interspeech.2016-998 dblp:conf/interspeech/ZhangRHDMS16 fatcat:ovc6i2und5bvloce73obwoqoru

Multimodal Emotion Recognition from Art Using Sequential Co-Attention

Tsegaye Misikir Tashu, Sakina Hajiyeva, Tomas Horvath
2021 Journal of Imaging  
In this study, we present a multimodal emotion recognition architecture that uses both feature-level attention (sequential co-attention) and modality attention (weighted modality fusion) to classify emotion  ...  Experimental results on the WikiArt emotion dataset showed the efficiency of the approach proposed and the usefulness of three modalities in emotion recognition.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/jimaging7080157 pmid:34460793 pmcid:PMC8404915 fatcat:xktlzcr2zbc4rdb6tlk6uhh46i

Frustration recognition from speech during game interaction using wide residual networks

Meishu Song, Adria Mallol-Ragolta, Emilia Parada-Cabaleiro, Zijiang Yang, Shuo Liu, Zhao Ren, Ziping Zhao, Björn W. Schuller
2021 Virtual Reality & Intelligent Hardware  
Because of the continual improvements in speech recognition tasks achieved by the use of convolutional neural networks (CNNs), unlike the MGFD baseline, which is based on Long Short-Term Memory (LSTM)  ...  We explored the performance of a variety of acoustic feature sets, including Mel-spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), and the low-dimensional knowledge-based acoustic feature set  ...  Indeed, convolutional neural networks have been widely scaled up to improve network accuracy; one simple composite scaling method is based on a fixed set of scaling coefficients, thus uniformly scaling  ... 
doi:10.1016/j.vrih.2020.10.004 fatcat:xxpcdgnapvgohj5gbtvg3ztuxa

Robust Speech Emotion Recognition Using CNN+LSTM Based on Stochastic Fractal Search Optimization Algorithm

Abdelaziz A. Abdelhamid, El-Sayed M. El-Kenawy, Bandar Alotaibi, Ghada M. Amer, Mahmoud Y. Abdelkader, Abdelhameed Ibrahim, Marwa M. Eid
2022 IEEE Access  
This deep learning model consists of a convolutional neural network (CNN) composed of four local feature-learning blocks and a long short-term memory (LSTM) layer for learning local and long-term correlations  ...  One of the main challenges facing current approaches to speech emotion recognition is the lack of a dataset large enough to properly train the currently available deep learning models.  ...  The recognition of speech emotions usually includes extracting paralinguistic features from speech. These features should be independent of the speaker and the lexical content of the speech signal.  ... 
doi:10.1109/access.2022.3172954 fatcat:l65rupw445awpei7mhwge6cbq4

Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition

Sanghyun Lee, David K. Han, Hanseok Ko
2020 Sensors  
Earlier studies on emotional recognition have been primarily based on handcrafted features and manual labels.  ...  Speech emotion recognition predicts the emotional state of a speaker based on the person's speech. It brings an additional element for creating more natural human–computer interactions.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/s20226688 pmid:33238396 fatcat:3xbwvohmmred7nccriprq2i3ea

Autoencoder with Emotion Embedding for Speech Emotion Recognition

Chenghao Zhang, Lei Xue
2021 IEEE Access  
Input Speech Feature / Log Magnitude Spectrogram: A spectrogram is a useful representation for the analysis of speech and audio signals.  ...  Recently, with the increased interest in deep learning (DL) algorithms, the automatic extraction of useful features from speech signals by deep neural networks (DNNs), such as recurrent neural networks  ... 
doi:10.1109/access.2021.3069818 fatcat:tw37syjerndi5gtdw2la7ffb4m

Emotion Recognition from Skeletal Movements

Tomasz Sapiński, Dorota Kamińska, Adam Pelikant, Gholamreza Anbarjafari
2019 Entropy  
Most research in the area of automated emotion recognition is based on facial expressions or speech signals.  ...  The proposed algorithm creates a sequential model of affective movement based on low-level features inferred from the spatial location and the orientation of joints within the tracked skeleton.  ...  Acknowledgments: The Estonian Centre of Excellence in IT (EXCITE), funded by the European Regional Development Fund, and the Scientific and Technological Research Council of Turkey (TÜBITAK) (Project 1001  ... 
doi:10.3390/e21070646 pmid:33267360 fatcat:qubacgnu4vbkhkgggbzfnqxowm

Arabic Speech Emotion Recognition from Saudi Dialect Corpus

Reem H. Aljuhani, Areej Alshutayri, Shahd Alahdal
2021 IEEE Access  
The first model combined a convolutional neural network (CNN), bi-directional long short-term memory (BLSTM), and a deep neural network (DNN) for the attention-based CNN-LSTM-DNN model, and the second  ...  An MLP is a network made up of perceptrons with input and output layers. Input layers take input signals, and output layers make predictions for the given output [13].  ... 
doi:10.1109/access.2021.3110992 fatcat:c73knoukoradles6fmny6sgffq
Showing results 1 — 15 out of 1,286