Automatic Emotion Recognition in Speech: Possibilities and Significance
Milana Bojanić and Vlado Delić
Speech technologies have been developed for decades as a typical signal processing area, while the last decade has brought huge progress based on new machine learning paradigms. Owing not only to their intrinsic complexity but also to their relation to the cognitive sciences, speech technologies are now viewed as a prime example of an interdisciplinary knowledge area. This review article on speech signal analysis and processing, the corresponding machine learning algorithms, and applied computational intelligence aims to give an insight into several fields, covering speech production and auditory perception, cognitive aspects of speech communication and language understanding, both speech recognition and text-to-speech synthesis in more detail, and consequently the main directions in the development of spoken dialogue systems. Additionally, the article discusses the concepts and recent advances in speech signal compression, coding, and transmission, including cognitive speech coding. To conclude, the main intention of this article is to highlight recent achievements and challenges based on new machine learning paradigms that, over the last decade, have had an immense impact on the field of speech signal processing.
doi:10.1155/2019/4368036 pmid:31341467 pmcid:PMC6614991
Studies have shown that people already perceive interaction with computers, robots and media in the same way as they perceive social communication with other people. For that reason it is critical for a high-quality text-to-speech (TTS) system to sound as human-like as possible. However, a major obstacle in creating expressive TTS voices is that the amount of style-specific speech needed for training such a system is often insufficient. This paper presents a comparison between different approaches to multi-style TTS, with a focus on cases when only a small dataset per style is available. The described approaches were originally proposed for efficient modelling of multiple speakers with a limited amount of data per speaker. Among the suggested approaches, the approach based on style codes has emerged as the best, regardless of the target speech style. MATEC Web of Conferences 161, 03005 (2018)
doi:10.1051/matecconf/201816103005
Zoran Perić, Vlado Delić, Zoran Stamenković, David Pokrajac. Computational Intelligence and Neuroscience. ... Delić et al. provide an overview of the development of speech technologies as a typical signal processing area. The authors provide an analysis of the nature of the speech signal and its processing, corresponding machine ...
doi:10.1155/2019/5428615 pmid:31781180 pmcid:PMC6875201
Call center operators communicate with callers in different emotional states (anger, anxiety, fear, stress, joy, etc.). Sometimes a number of calls coming in a short period of time have to be answered and processed. In the moments when all call center operators are busy, the system puts a call on hold, regardless of its urgency. This research aims to improve the functionality of call centers by recognizing call urgency and redistributing calls in a queue. It could be beneficial for call centers giving health care support to elderly people and for emergency call centers. The proposed recognition of call urgency and the consequent call ranking and redistribution are based on emotion recognition in speech, giving greater priority to calls featuring emotions such as fear, anger and sadness, and less priority to calls featuring neutral speech and happiness. Experimental results, obtained in a simulated call center, show a significant reduction in waiting time for calls estimated as more urgent, especially calls featuring the emotions of fear and anger.
doi:10.3390/app10134653
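The queue redistribution described in this abstract can be sketched as a priority queue keyed by a per-emotion urgency score; the score table and call identifiers below are illustrative assumptions, not values from the paper.

```python
import heapq
import itertools

# Illustrative urgency scores (lower = served first); not taken from the paper.
URGENCY = {"fear": 0, "anger": 0, "sadness": 1, "neutral": 2, "happiness": 2}

class CallQueue:
    """Holds waiting calls and always pops the most urgent one (FIFO within a tier)."""
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker preserves arrival order

    def add_call(self, call_id, emotion):
        heapq.heappush(self._heap, (URGENCY[emotion], next(self._arrival), call_id))

    def next_call(self):
        return heapq.heappop(self._heap)[2]

q = CallQueue()
q.add_call("c1", "neutral")
q.add_call("c2", "fear")
q.add_call("c3", "happiness")
order = [q.next_call() for _ in range(3)]  # -> ["c2", "c1", "c3"]
```

The arrival counter keeps ranking stable: among calls with equal urgency, the earlier caller is still served first, so only genuinely more urgent calls jump the queue.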
Advances in Speech Recognition
For example, Delić & Vujnović Sedlar (2010) have created the first audio game for the visually impaired with ASR and TTS in Serbian. ... the previous sections, the AlfaNum TTS engine, coupled with the AlfaNum ASR engine, was also used to create new computer games designed for the entertainment and education of visually impaired children (Delić ... Delić, 2010). ...
doi:10.5772/10113
Advances in Speech Recognition
Since 1998 a speech corpus has been developed for Serbian according to the SpeechDat(E) standard (Delić, 2000). ... Authors: Vlado Delić, Milan Sečujski, Nikša Jakovljević, Marko Janev, Radovan Obradović and Darko ...
doi:10.5772/10115
Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the classification accuracy of the developed models significantly decreases when applying them to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable in order to implement them on edge devices in Internet of Things systems for real-time detection. The aim of this paper is to propose a simple and constrained convolutional neural network for speaker recognition tasks and to examine its robustness for recognition in emotional speech conditions. We examine three quantization methods for developing a constrained network: floating-point eight format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
doi:10.3390/e24030414 pmid:35327924 pmcid:PMC8947568
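As a sketch of the latter two methods: binary scalar quantization maps each weight to ±α, while ternary quantization additionally zeroes small weights. The scaling rule (α as a mean of magnitudes) and the 0.7 threshold factor follow common practice in the quantization literature and are assumptions, not the paper's exact scheme.

```python
def binarize(weights):
    """Binary scalar quantization: w -> alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

def ternarize(weights, delta_scale=0.7):
    """Ternary scalar quantization: weights below a threshold delta collapse to 0,
    the rest map to +/- alpha (mean magnitude of the surviving weights)."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_scale * mean_abs
    big = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(big) / len(big) if big else 0.0
    out = []
    for w in weights:
        if w > delta:
            out.append(alpha)
        elif w < -delta:
            out.append(-alpha)
        else:
            out.append(0.0)
    return out
```

Either way, each weight is then representable by one or two bits plus a single shared scale per layer, which is what makes such networks attractive for edge devices.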
Tijana Delić, Vlado Delić: researchers at the Laboratory of Acoustics and Speech Technology, Faculty of Technical Sciences, University of Novi Sad. ...
doi:10.15622/sp.60.8
This paper considers the research question of developing user-aware and adaptive conversational agents. The conversational agent is a system which is user-aware to the extent that it recognizes the user's identity and his/her emotional states that are relevant in a given interaction domain. The conversational agent is user-adaptive to the extent that it dynamically adapts its dialogue behavior according to the user and his/her emotional state. The paper summarizes some aspects of our previous work and presents work in progress in the field of speech-based human-machine interaction. It focuses particularly on the development of speech recognition modules in cooperation with modules for emotion recognition and speaker recognition, as well as the dialogue management module. Finally, it proposes an architecture for a conversational agent that integrates those modules and improves each of them based on synergies among them.
doi:10.2298/fuee1403375d
The article addresses the influence of two aspects of speech emotion recognition utilization in an emergency call center: the frequency with which a caller experiences a certain emotional state, and the classification methods used for speech emotion recognition. In situations when several simultaneous calls are received in an emergency call center, the aim is to detect the more urgent callers, e.g. those in a life-threatening situation, and give them priority in the callers' queue. Three different emotion distributions based on corpora from real-world emergency call centers are considered. The influence of those emotion distributions on the proposed call redistribution and the subsequent time savings are reported and discussed. Regarding speech emotion classification, two approaches are presented, namely the linear Bayes classifier and a multilayer perceptron-based neural network. Their recognition results on a corpus of acted emotional Serbian speech are presented and potential application in an emergency call center is discussed.
doi:10.5937/telfor2102075b
Vlado Delić holds an associate professor position at the Faculty of Technical Sciences, Novi Sad, Serbia. ...
doi:10.2298/csis090710007b
10th Symposium on Neural Network Applications in Electrical Engineering
In this paper an efficient heartbeat classification algorithm for mobile devices is presented. A simplified ECG model is used for feature extraction in the time domain. The QRS complex is modeled by two straight lines, while the P and T waves are modeled by parabolas. The T wave asymmetry is achieved using a fourth-degree parabola, whereas the P wave is modeled by a second-degree parabola. The model parameters are estimated using the linear least squares fitting technique. Heartbeats are classified into the following classes: Normal, Supraventricular and Ventricular ectopic beats. Classification of the model parameters is done using a feedforward neural network. The inputs used by the classifier are the following: QRS slopes, duration, P wave coefficients, and adjacent and averaged RR intervals. Patient-specific adaptation is achieved using a dominant heartbeat as an additional classifier input. A series of tests has been performed to evaluate the classification algorithm. Three model sets were used for that purpose. The first one contains QRS parameters only. The second one contains the dominant QRS model as well, and in the third model set the P wave and the appropriate dominant P wave model are included. Training and testing are done using a subset of ECG signals from the MIT-BIH arrhythmia database, with performance expressed as sensitivity (Se), specificity (Sp) and accuracy (Acc). It can be concluded that the best results are achieved when applying the classification algorithm to the third model set. The following results were obtained: SeN = 99.15% (sensitivity for normal heartbeats); SpN = 97.5%; AccN = 98.65%; SeV = 94.69% (ventricular heartbeats); SpV = 95.66%; AccV = 95.31%; SeS = 92.8% (supraventricular heartbeats); SpS = 96.41%; AccS = 94.48%.
doi:10.1109/neurel.2010.5644105
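The reported Se/Sp/Acc figures follow the standard per-class confusion-matrix definitions; a minimal helper (the variable names are mine) looks like this:

```python
def se_sp_acc(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from one-vs-rest confusion counts.

    tp/fn: beats of the target class, correctly / incorrectly labeled.
    tn/fp: beats of other classes, correctly / incorrectly labeled as target.
    """
    se = tp / (tp + fn)                      # true positive rate
    sp = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall fraction correct
    return se, sp, acc
```

For example, 9 of 10 target beats detected and 8 of 10 non-target beats rejected gives Se = 0.9, Sp = 0.8, Acc = 0.85.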
HMM-based Whisper Recognition using μ-law Frequency Warping
Abstract. Due to the lack of a sufficient amount of whisper data for training, whispered speech recognition is a serious challenge for state-of-the-art Automatic Speech Recognition (ASR) systems. Because of the great acoustic mismatch between neutral and whispered speech, ASR systems face a significant drop in performance when applied to whisper. In this paper, we give an analysis of neutral and whispered speech recognition based on the traditional Hidden Markov Model (HMM) framework, in the Speaker Dependent (SD) and Speaker Independent (SI) cases. Special attention is paid to neutral-trained recognition of whispered speech (the N/W scenario). The ASR system is developed for the recognition of isolated words from a real database (Whi-Spe) of neutral-whisper speech pairs. In the N/W scenario, a meaningful gain in robustness is achieved with the proposed frequency warping, originally developed for speech signal compression and expansion in digital telecommunication systems. Simultaneously, good performance in the recognition of neutral speech is retained. Compared to baseline recognition with Mel-Frequency Cepstral Coefficients (MFCC), word recognition accuracy with cepstral coefficients using the proposed frequency warping (denoted μFCC) is improved by 7.36% (SD) and 3.44% (SI), absolute. Likewise, the F-measure (harmonic mean of precision and recall) for μFCC feature vectors is increased by 6.90% (SD) and 3.59% (SI). Statistical tests confirm the significance of the achieved improvement in recognition accuracy.
doi:10.15622/sp.58.2
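The warping borrows the μ-law companding curve from telephony (ITU-T G.711). Applying it directly to the linear frequency axis with μ = 255 and an 8 kHz ceiling, as below, is my illustrative assumption, not necessarily the paper's exact parameterization.

```python
import math

def mu_law_warp(f_hz, f_max=8000.0, mu=255.0):
    """Map a linear frequency onto a mu-law-compressed axis.

    Like the mel scale, the curve expands resolution at low frequencies and
    compresses it at high ones, which is what the muFCC features exploit.
    """
    return f_max * math.log(1.0 + mu * f_hz / f_max) / math.log(1.0 + mu)
```

The endpoints are fixed (0 maps to 0, f_max to f_max) while every intermediate frequency is pushed upward, so filterbank channels placed uniformly on the warped axis crowd toward the low-frequency region.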