A "GAP-model" based framework for online VVoIP QoE measurement

Prasad Calyam, Eylem Ekici, Chang-Gun Lee, Mark Haffner, Nathan Howes
2007 Journal of Communications and Networks  
For pro-active troubleshooting of VVoIP performance bottlenecks that manifest to end-users as performance impairments such as video frame freezing and voice dropouts, network operators cannot rely on actual  ...  The framework features the "GAP-Model", which is an offline model of QoE expressed as a function of measurable network factors such as bandwidth, delay, jitter, and loss.  ...  levels of $b_{net}$, $d_{net}$, $l_{net}$ and $j_{net}$, each within a GAP performance level.  ... 
doi:10.1109/jcn.2007.6182880 fatcat:63xcvk3rjfbble5rcnd4bcbppu

Improving Singing Voice Separation Using Curriculum Learning on Recurrent Neural Networks

Seungtae Kang, Jeong-Sik Park, Gil-Jin Jang
2020 Applied Sciences  
In this study, we regard the data providing obviously dominant characteristics of a single source as an easy case and the other data as a difficult case.  ...  We propose a new singing voice separation approach based on the curriculum learning framework, in which learning is started with only easy examples and then task difficulty is gradually increased.  ...  The most significant increment was observed in $\mathrm{GSIR}_m$ in RNN, 1.69 dB, but there were decrements in $\mathrm{GSAR}_m$ and $\mathrm{GSIR}_v$ of U-Net. The overall performance measured by GNSDR values were all increased.  ... 
doi:10.3390/app10072465 fatcat:r5z5qzalpvgyzk4lfngzb6oh4u

RPCA-DRNN technique for monaural singing voice separation

Wen-Hsing Lai, Siou-Lin Wang
2022 EURASIP Journal on Audio, Speech, and Music Processing  
The experimental results of MIR-1K, ccMixter, and MUSDB18 datasets and the comparison with ten existing techniques indicate that the proposed method achieves competitive performance in monaural singing  ...  On MUSDB18, the proposed method reaches the comparable separation quality in less training data and lower computational cost compared to the other state-of-the-art technique.  ...  t is the sum of Ṽ St and Ṽ Lt , as expressed in (11) , and it is compared with the original clean singing voice.  ... 
doi:10.1186/s13636-022-00236-9 fatcat:pywuyt2sevflzluwof2va3pdri

Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning

Dong Liu, Zhiyong Wang, Lifeng Wang, Longxi Chen
2021 Frontiers in Neurorobotics  
Among them, the voice uses the convolutional neural network-long and short term memory (CNN-LSTM) network, and the facial expression in the video uses the Inception-ResNet-v2 network to extract the feature  ...  The redundant information, noise data generated in the process of single-modal feature extraction, and traditional learning algorithms are difficult to obtain ideal recognition performance.  ...  Convolutional neural network-long and short term memory and Inception-ResNet-v2 are used to extract the feature data of facial expressions in voice and video, respectively.  ... 
doi:10.3389/fnbot.2021.697634 fatcat:zdxtgfxdffdgbboqupkmas2edy

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis [article]

Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao
2022 arXiv   pre-print
., speaker identity, emotion, and prosody) derived from an acoustic reference, while facing the following challenges: 1) The highly dynamic style features in expressive voice are difficult to model and  ...  The extension studies to adaptive style transfer further show that GenerSpeech performs robustly in the few-shot data setting. Audio samples are available at  ...  In the multi-task finetuning, the training loss is obtained by the weighted sum of the losses of these two tasks: $L_{mul} = \rho L_s + (1 - \rho) L_e$ (4), where $L_s$ and $L_e$ denote the cross entropy loss in the  ... 
arXiv:2205.07211v1 fatcat:xbd5tfkxdnfnbhg2zoj6w63ycq

Semantics-Consistent Representation Learning for Remote Sensing Image-Voice Retrieval [article]

Hailong Ning, Bin Zhao, Yuan Yuan
2021 arXiv   pre-print
This paper aims to study the task of RS image-voice retrieval so as to search effective information from massive amounts of RS data.  ...  consistency among non-paired representations plays an important role in the RS image-voice retrieval task.  ...  Here, the MFCC feature is expressed as X m = Υ(X V m ). The obtained MFCC feature is reshaped, and then fed into the voice encoding network to perform high-level voice features.  ... 
arXiv:2103.05302v1 fatcat:qjwrun343jbmbbheur2oolwmke

Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet

Axel Roebel, Frederik Bous
2022 Information  
Evaluations are performed using objective measures and a number of perceptual tests including different neural vocoder algorithms known from the literature.  ...  The results confirm that the proposed vocoder compares favorably to the state-of-the-art in its capacity to generalize to unseen voices and voice qualities. The remaining challenges will be discussed.  ...  Acknowledgments: The authors would like to thank Won Jang for sharing information and materials related to the universal MelGAN [23].  ... 
doi:10.3390/info13030103 fatcat:5d6otkjobfgmldnjors5xapf3m

Measuring Interaction QoE in Internet Videoconferencing [chapter]

Prasad Calyam, Mark Haffner, Eylem Ekici, Chang-Gun Lee
2007 Lecture Notes in Computer Science  
Hence, it is important to measure and subsequently minimize the extra end-user interaction effort in a videoconferencing system.  ...  This is because videoconference end-users frequently experience perceptual quality impairments such as video frame freezing and voice dropouts due to changes in network conditions on the Internet.  ...  The studies in [5] and [6] provide the performance levels for $j_{net}$ and $l_{net}$ on the basis of empirical experiments on the Internet.  ... 
doi:10.1007/978-3-540-75869-3_2 fatcat:icatrpztt5gbhgsninnhuwulue

Data-driven Pitch Content Description of Choral Singing Recordings

Helena Cuesta, Emilia Gómez
2022 Zenodo  
Finally, we propose two methods to characterize vocal unison performances in terms of pitch dispersion.  ...  Then, we address three main research problems: multiple F0 estimation and streaming, voice assignment, and the characterization of vocal unisons, all in the context of four-part vocal ensembles.  ...  We hope that our humble contributions in terms of data and algorithms help push the state-of-the-art forward and put MIR technologies at the service of choral music.  ... 
doi:10.5281/zenodo.6389643 fatcat:zibszrdivjhcnap2gzll3sbxga

Modeling the Pathophysiology of Phonotraumatic Vocal Hyperfunction With a Triangular Glottal Model of the Vocal Folds

Gabriel E. Galindo, Sean D. Peterson, Byron D. Erath, Christian Castro, Robert E. Hillman, Matías Zañartu
2017 Journal of Speech, Language and Hearing Research  
the voice acoustic signal due to incomplete glottal closure, but this also leads to high vocal-fold collision forces (reflected in aerodynamic measures), which significantly increases the risk of developing  ...  model of voice production.  ...  In addition, the net energy transferred (NET) to the vocal folds is measured, as defined by Thomson, Mongeau, and Frankel (2005), and the maximum contact pressure of the vocal folds (MCP), which is  ... 
doi:10.1044/2017_jslhr-s-16-0412 pmid:28837719 pmcid:PMC5831616 fatcat:2t5buhgvlzef7hqsxbuifkvkvi

Voice over IP performance monitoring

R. G. Cole, J. H. Rosenbluth
2001 Computer communication review  
We find that an in-path monitor requires the definition of a reference de-jitter buffer implementation to estimate voice quality based upon observed transport measurements.  ...  We describe a method for monitoring Voice over IP (VoIP) applications based upon a reduction of the ITU-T's E-Model to transport level, measurable quantities.  ...  INTRODUCTION There is great interest in supporting voice applications over both the public Internet and private intra-nets, i.e., Voice over IP (VoIP).  ... 
doi:10.1145/505666.505669 fatcat:q3svx6xumfhdlh2ad5fnw2hrea

An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity

Dong-Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Quy Hy Nguyen, Minghui Dong, Haizhou Li
2016 9th ISCA Speech Synthesis Workshop  
We further use our strategy to select best converted samples from multiple voice conversion systems and our submission achieves promising results in the voice conversion challenge (VCC2016).  ...  This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity.  ...  As the speech data in Voice Conversion Challenge 2016 is lively and expressive, this two-stage alignment is applicable and brings more accurate alignment over the typical alignment by dynamic time warping  ... 
doi:10.21437/ssw.2016-8 dblp:conf/ssw/HuangXLWMTZDLHD16 fatcat:konaiiubtrhvxpplxy2j3vnv5u

Assessing call centers' success: A validation of the DeLone and Mclean model for information system

Hesham A. Baraka, Hoda A. Baraka, Islam H. EL-Gamily
2013 Egyptian Informatics Journal  
Multiple linear regression analysis was used in order to provide a linear formula for the User Satisfaction dimension and the Net Benefits dimension in order to be able to forecast the values for these  ...  The analysis of the different weights cases gave priority to the User satisfaction and net Benefits dimension as the two outcomes from the system.  ...  Categories were identified based on the following parameters: Size of the call center expressed by the number of agents. Channels of Communication (voice vs. voice and data).  ... 
doi:10.1016/j.eij.2013.03.001 fatcat:cqqpdmgwpfaapbq2mft4sgmfzy

Blind Speech Signal Quality Estimation for Speaker Verification Systems

Galina Lavrentyeva, Marina Volkova, Anastasia Avdeeva, Sergey Novoselov, Artem Gorlanov, Tseren Andzhukaev, Artem Ivanov, Alexander Kozlov
2020 Interspeech 2020  
Additionally, current research revealed the need for an accurate voice activity detector that performs well in both clean and noisy unseen environments.  ...  The problem of system performance degradation in mismatched acoustic conditions has been widely acknowledged in the community and is common for different fields.  ...  These are neural network based approaches trained to predict the MOS, PESQ and similar measures on some sets of training data [12, 13].  ... 
doi:10.21437/interspeech.2020-1826 dblp:conf/interspeech/LavrentyevaVANG20 fatcat:a6am67f56ncm5djhcpbfwmkk6m

Utility-based joint power and admission control algorithm in cognitive wireless networks

Roshanak Nasiri Shafti, Abdorasoul Ghasemi
2012 20th Iranian Conference on Electrical Engineering (ICEE2012)  
We consider a data communication scenario in which utility obtained by a user measures its quality of service (QoS).  ...  Simulation results show that the proposed algorithm performs efficiently in maximizing total utility of cognitive data networks considering primary users protection compared to the similar previous algorithm  ...  Particularly, in the voice networks a user is satisfied when it gets a specific SINR, whereas in data networks there is not a specific SINR satisfaction level for a user.  ... 
doi:10.1109/iraniancee.2012.6292530 fatcat:6wp74lvl6zbhppw7a4dvy7nqhq