Automatic Emotion Recognition in Speech: Possibilities and Significance

Milana Bojanić, Vlado Delić
2009 Electronics  
 ...  The test system must be able to drive the machine in the investigated operating mode (burden state), while a torque transducer for direct measurements is a sophisticated, delicate and expensive device.  ... 
doaj:0b476e5f85f64c4b95b4d7fa64217045 fatcat:h234xfmaefddtbs7lsl7hsw4la

Speech Technology Progress Based on New Machine Learning Paradigm

Vlado Delić, Zoran Perić, Milan Sečujski, Nikša Jakovljević, Jelena Nikolić, Dragiša Mišković, Nikola Simić, Siniša Suzić, Tijana Delić
2019 Computational Intelligence and Neuroscience  
Speech technologies have been developed for decades as a typical signal processing area, while the last decade has brought huge progress based on new machine learning paradigms. Owing not only to their intrinsic complexity but also to their relation with cognitive sciences, speech technologies are now viewed as a prime example of an interdisciplinary knowledge area. This review article on speech signal analysis and processing, corresponding machine learning algorithms, and applied computational intelligence aims to give an insight into several fields, covering speech production and auditory perception, cognitive aspects of speech communication and language understanding, both speech recognition and text-to-speech synthesis in more detail, and consequently the main directions in development of spoken dialogue systems. Additionally, the article discusses the concepts and recent advances in speech signal compression, coding, and transmission, including cognitive speech coding. To conclude, the main intention of this article is to highlight recent achievements and challenges based on new machine learning paradigms that, over the last decade, have had an immense impact in the field of speech signal processing.
doi:10.1155/2019/4368036 pmid:31341467 pmcid:PMC6614991 fatcat:yfwrwisz7jgrtlijfpj7rkuuoi

A comparison of multi-style DNN-based TTS approaches using small datasets

Siniša Suzić, Tijana Delić, Vladimir Jovanović, Milan Sečujski, Darko Pekar, Vlado Delić, A. Ronzhin, V. Shishlakov
2018 MATEC Web of Conferences  
Studies have shown that people already perceive the interaction with computers, robots and media in the same way as they perceive social communication with other people. For that reason it is critical for a high-quality text-to-speech system (TTS) to sound as human-like as possible. However, a major obstacle in creating expressive TTS voices is that the amount of style-specific speech needed for training such a system is often not sufficient. This paper presents a comparison between different approaches to multi-style TTS, with focus on cases when only a small dataset per style is available. The described approaches have been originally proposed for efficient modelling of multiple speakers with a limited amount of data per speaker. Among the suggested approaches the approach based on style codes has emerged as the best, regardless of the target speech style. MATEC Web of Conferences 161, 03005 (2018)
doi:10.1051/matecconf/201816103005 fatcat:aqvjypm52fbidmjkfl4lqpbwxu

Advanced Signal Processing and Adaptive Learning Methods

Zoran Perić, Vlado Delić, Zoran Stamenković, David Pokrajac
2019 Computational Intelligence and Neuroscience  
 ...  Delić et al. provides an overview of speech technologies development as a typical signal processing area. The authors provide an analysis of the nature of speech signal and processing, corresponding machine  ... 
doi:10.1155/2019/5428615 pmid:31781180 pmcid:PMC6875201 fatcat:v6yhe2452nanfmgf5zx2ees76m

Call Redistribution for a Call Center Based on Speech Emotion Recognition

Milana Bojanić, Vlado Delić, Alexey Karpov
2020 Applied Sciences  
Call center operators communicate with callers in different emotional states (anger, anxiety, fear, stress, joy, etc.). Sometimes a number of calls coming in a short period of time have to be answered and processed. When all call center operators are busy, the system puts an incoming call on hold, regardless of its urgency. This research aims to improve the functionality of call centers by recognition of call urgency and redistribution of calls in a queue. It could be beneficial for call centers giving health care support for elderly people and emergency call centers. The proposed recognition of call urgency and consequent call ranking and redistribution is based on emotion recognition in speech, giving greater priority to calls featuring emotions such as fear, anger and sadness, and less priority to calls featuring neutral speech and happiness. Experimental results, obtained in a simulated call center, show a significant reduction in waiting time for calls estimated as more urgent, especially the calls featuring the emotions of fear and anger.
doi:10.3390/app10134653 fatcat:y2mjhoptbrbyvktz35jfhg6riu
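
The call ranking described in this abstract can be illustrated with a small priority queue. This is only a sketch under stated assumptions, not the authors' implementation: the numeric ranks in `EMOTION_PRIORITY`, the class name `CallQueue`, and the rule that calls of equal rank are served in arrival order are all hypothetical; the abstract states only that fear, anger and sadness receive greater priority than neutral speech and happiness.

```python
import heapq
from itertools import count

# Hypothetical ranks (lower = served sooner). The abstract only says that
# fear, anger and sadness outrank neutral speech and happiness.
EMOTION_PRIORITY = {"fear": 0, "anger": 0, "sadness": 0, "neutral": 1, "happiness": 1}


class CallQueue:
    """Waiting queue that serves more urgent calls first, FIFO within a rank."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def put_on_hold(self, call_id, emotion):
        # Assumption: unrecognized emotions are treated as low urgency.
        rank = EMOTION_PRIORITY.get(emotion, 1)
        heapq.heappush(self._heap, (rank, next(self._arrival), call_id))

    def next_call(self):
        _rank, _order, call_id = heapq.heappop(self._heap)
        return call_id


queue = CallQueue()
for call_id, emotion in [("c1", "neutral"), ("c2", "fear"),
                         ("c3", "happiness"), ("c4", "anger")]:
    queue.put_on_hold(call_id, emotion)

served = [queue.next_call() for _ in range(4)]
print(served)  # ['c2', 'c4', 'c1', 'c3']
```

The arrival counter keeps the queue fair within each urgency level, so a neutral call is delayed only by more urgent calls, never by later neutral ones.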

Applications of Speech Technologies in Western Balkan Countries [chapter]

Darko Pekar, Dragisa Miskovic, Dragan Knezevic, Natasa Vujnovic, Milan Secujski, Vlado Delic
2010 Advances in Speech Recognition  
For example, Delić & Vujnović Sedlar (2010) have created the first audio game for the visually impaired with ASR and TTS in Serbian.  ...  the previous sections, the AlfaNum TTS engine, coupled with the AlfaNum ASR engine, was also used to create new computer games designed for entertainment and education of visually impaired children (Delić  ...  Delic (2010) .  ... 
doi:10.5772/10113 fatcat:qxghdlb6rza3lhbbf7rok4rqti

Speech Technologies for Serbian and Kindred South Slavic Languages [chapter]

Vlado Delic, Milan Secujski, Niksa Jakovljevic, Marko Janev, Radovan Obradovic, Darko Pekar
2010 Advances in Speech Recognition  
Since 1998 a speech corpus has been developed for Serbian according to the SpeechDat(E) standard (Delić, 2000).  ... 
doi:10.5772/10115 fatcat:hq5v2beyezgulj7zczeccobhxm

Speech signal processing in ASR&TTS algorithms

Vlado Delic, Darko Pekar, Radovan Obradovic, Milan Secujski
2003 Facta universitatis - series Electronics and Energetics  
doi:10.2298/fuee0303355d fatcat:v5cv7lzzb5fo7nrtbcbdbzvfiy

Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech

Nikola Simić, Siniša Suzić, Tijana Nosek, Mia Vujović, Zoran Perić, Milan Savić, Vlado Delić
2022 Entropy  
Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the classification accuracy of developed models significantly decreases when applying them to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable in order to implement them on edge devices in the Internet of Things systems for real-time detection. The aim of this paper is to propose a simple and constrained convolutional neural network for speaker recognition tasks and to examine its robustness for recognition in emotional speech conditions. We examine three quantization methods for developing a constrained network: floating-point eight format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
doi:10.3390/e24030414 pmid:35327924 pmcid:PMC8947568 fatcat:zladxwalvbagflpptyogqjf22y
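
For readers unfamiliar with the binary and ternary scalar quantization named in this abstract, the following is a generic sketch of the two ideas applied to a list of weights. It is not the paper's method: the function names, the mean-absolute-value scale, and the `threshold_factor=0.7` heuristic are assumptions chosen for illustration, since the abstract gives no quantizer parameters.

```python
import math


def binary_quantize(weights):
    """Map each weight to +scale or -scale (two levels).

    Assumption: scale is the mean absolute weight.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]


def ternary_quantize(weights, threshold_factor=0.7):
    """Map each weight to -scale, 0 or +scale (three levels).

    Assumption: weights with magnitude at or below
    threshold_factor * mean(|w|) are zeroed; the rest share one scale.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_factor * mean_abs
    kept = [abs(w) for w in weights if abs(w) > threshold]
    scale = sum(kept) / len(kept) if kept else 0.0
    return [0.0 if abs(w) <= threshold else math.copysign(scale, w)
            for w in weights]


weights = [0.9, -0.1, 0.05, -0.8]
binary = binary_quantize(weights)
ternary = ternary_quantize(weights)
print(binary)
print(ternary)
```

Both quantizers keep one shared scale per weight list, so each weight needs only one or two bits of storage plus a single floating-point scale.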

Style-Code Method for Multi-Style Parametric Text-to-Speech Synthesis

Siniša Suzić, Tijana Vlado Delić, Stevan Ostrogonac, Simona Đurić, Darko Jovan Pekar
2018 Труды СПИИРАН  
Delić Tijana Vlado - researcher of Laboratory of Acoustics and Speech Technology of Faculty of Technical Sciences, University of Novi Sad.  ... 
doi:10.15622/sp.60.8 fatcat:pt7qgzj2z5fv3aeloxrfgmxny4

User-awareness and adaptation in conversational agents

Vlado Delic, Milan Gnjatovic, Niksa Jakovljevic, Branislav Popovic, Ivan Jokic, Milana Bojanic
2014 Facta universitatis - series Electronics and Energetics  
This paper considers the research question of developing user-aware and adaptive conversational agents. The conversational agent is a system which is user-aware to the extent that it recognizes the user identity and his/her emotional states that are relevant in a given interaction domain. The conversational agent is user-adaptive to the extent that it dynamically adapts its dialogue behavior according to the user and his/her emotional state. The paper summarizes some aspects of our previous work and presents work-in-progress in the field of speech-based human-machine interaction. It focuses particularly on the development of speech recognition modules in cooperation with both modules for emotion recognition and speaker recognition, as well as the dialogue management module. Finally, it proposes an architecture of a conversational agent that integrates those modules and improves each of them based on synergies among them.
doi:10.2298/fuee1403375d fatcat:lpt5vokizrauhnunbb44degx6a

Influence of emotion distribution and classification on a call processing for an emergency call center

Milana Bojanić, Vlado Delić, Alexey Karpov
2021 Telfor Journal  
The article addresses the influence of two aspects on the use of speech emotion recognition in an emergency call center: the frequency with which callers experience a certain emotional state, and the classification methods used for speech emotion recognition. When several calls are received simultaneously in an emergency call center, the aim is to detect more urgent callers, e.g. in a life-threatening situation, and give them priority in the callers' queue. Three different emotion distributions based on the corpora from real-world emergency call centers are considered. The influence of those emotion distributions on the proposed call redistribution and subsequent time savings are reported and discussed. Regarding speech emotion classification, two approaches are presented, namely the linear Bayes classifier and a multilayer perceptron-based neural network. Their recognition results on the corpus of acted emotional Serbian speech are presented and potential application in an emergency call center is discussed.
doi:10.5937/telfor2102075b fatcat:qprwvefuo5d4dluhsc75wcwata

QoS testing in a live private IP MPLS network with CoS implemented

Bojovic Zivko, Secerov Emil, Delic Vlado
2010 Computer Science and Information Systems  
Vlado Delić holds the position of associate professor at the Faculty of Technical Sciences, Novi Sad, Serbia.  ... 
doi:10.2298/csis090710007b fatcat:bwzpof3pibfhzjtkrjpj7pqvhq

An efficient ECG modeling for heartbeat classification

Stevan Jokic, Srdan Krco, Vlado Delic, Dejan Sakac, Ivan Jokic, Zoran Lukic
2010 10th Symposium on Neural Network Applications in Electrical Engineering  
In this paper an efficient heart beat classification algorithm for mobile devices is presented. A simplified ECG model is used for feature extraction in the time domain. QRS complex is modeled by two straight lines while P and T waves are modeled by parabolas. The T wave asymmetry is achieved using a fourth degree parabola, whereas the P wave is modeled by the second degree parabola. The model parameters are estimated using the linear least squares fitting technique. Heart beats are classified using the following classes: Normal, Supraventricular and Ventricular ectopic beats. Classification of model parameters is done using a feedforward neural network. The inputs used by the classifier are the following: QRS slopes, duration, P wave coefficients, adjacent and averaged RR intervals. Patient specific adaptation is achieved using a dominant heart beat as an additional classifier input. A series of tests have been performed to evaluate the classification algorithm. Three model sets were used for that purpose. The first one contains QRS parameters only. The second one contains the dominant QRS model as well and in the third model set the P wave and appropriate dominant P wave model are included. Training and testing is done using the MIT BIH arrhythmia database ECG signals subset and expressed in sensitivity (Se), specificity (Sp) and accuracy (Acc). It can be concluded that the best results are achieved when applying the classification algorithm on the third model set. The following results were obtained: SeN = 99.15% (sensitivity for normal heart beat); SpN = 97.5%; AccN = 98.65%; SeV = 94.69% (ventricular heart beat), SpV = 95.66%; AccV = 95.31%, SeS = 928%; SpS = 96.41%; AccS = 94.48%.
doi:10.1109/neurel.2010.5644105 fatcat:d3cqi75odrc6fpweozcnmsyhhq
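
The sensitivity (Se), specificity (Sp) and accuracy (Acc) figures quoted in this abstract follow the standard per-class definitions, which can be written out as below. The function name and the example counts are hypothetical and are not taken from the paper; they only show how the three measures relate to the entries of a one-vs-rest confusion matrix.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for one class.

    tp/fn: beats of the class that were detected / missed.
    tn/fp: beats of other classes correctly rejected / wrongly accepted.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a single class, for illustration only.
se, sp, acc = binary_metrics(tp=90, fn=10, tn=180, fp=20)
print(f"Se = {se:.2%}, Sp = {sp:.2%}, Acc = {acc:.2%}")
```

Reporting all three per class, as the abstract does, matters for arrhythmia data because ectopic beats are rare and accuracy alone can look high even when sensitivity for the rare class is poor.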

HMM-based Whisper Recognition using μ-law Frequency Warping

Jovan Neđo Galić, Slobodan Toma Jovičić, Vlado Dragomir Delić, Branko Rade Marković, Dragana Staniša Šumarac Pavlović, Đorđe Tomislav Grozdić
2018 Труды СПИИРАН  
Due to the lack of a sufficient amount of whisper data for training, whispered speech recognition is a serious challenge for state-of-the-art Automatic Speech Recognition (ASR) systems. Because of the great acoustic mismatch between neutral and whispered speech, ASR systems face a significant drop in performance when applied to whisper. In this paper, we give an analysis of neutral and whispered speech recognition based on the traditional Hidden Markov Models (HMM) framework, in Speaker Dependent (SD) and Speaker Independent (SI) cases. Special attention is paid to the neutral-trained recognition of whispered speech (N/W scenario). The ASR system is developed for recognition of isolated words from a real database (Whi-Spe) of neutral-whisper speech pairs. In the N/W scenario, a meaningful gain in robustness is achieved with the proposed frequency warping, originally developed for speech signal compression and expanding in digital telecommunication systems. At the same time, good performance in recognition of neutral speech is retained. Compared to baseline recognition with Mel-frequency Cepstral Coefficients (MFCC), word recognition accuracy with cepstral coefficients using the proposed frequency warping (denoted as μFCC) is improved by 7.36% (SD) and 3.44% (SI), absolute. In addition, the F-measure (harmonic mean of the precision and recall) for μFCC feature vectors is increased by 6.90% (SD) and 3.59% (SI). Statistical tests confirm the significance of the achieved improvement in recognition accuracy.
doi:10.15622/sp.58.2 fatcat:core3jmqonbavkk5npdqw6lulm
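
The μ-law characteristic mentioned in this abstract is the standard companding curve from digital telephony, and one plausible way to reuse it as a frequency warping is sketched below. This is an illustration only: the abstract does not give the paper's warping formula, so `warp_frequency`, the choice `mu=255.0`, and the 8000 Hz upper band edge are all assumptions.

```python
import math


def mu_law_compress(x, mu=255.0):
    """Standard mu-law compression of a value x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def warp_frequency(f_hz, f_max_hz=8000.0, mu=255.0):
    """Hypothetical warping: apply the mu-law curve to normalized frequency.

    Maps [0, f_max_hz] onto itself, stretching the low-frequency region.
    """
    return f_max_hz * mu_law_compress(f_hz / f_max_hz, mu)


for f in (0.0, 500.0, 1000.0, 4000.0, 8000.0):
    print(f"{f:7.1f} Hz -> {warp_frequency(f):7.1f} Hz")
```

Like the mel scale, this curve is monotonic and allocates more resolution to low frequencies; the parameter `mu` controls how strongly the low band is stretched.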