143 Hits in 5.9 sec

A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate [article]

Ahmed Mustafa, Jan Büthe, Srikanth Korse, Kishan Gupta, Guillaume Fuchs, Nicola Pia
2021 arXiv pre-print
However, autoregressive vocoders are still the common choice for neural generation of speech signals coded at very low bit rates.  ...  In this paper, we present a GAN vocoder that can generate wideband speech waveforms from parameters coded at 1.6 kbit/s.  ...  However, they are by design not suited for streaming or real-time speech communication, since they take advantage of heavy parallelization for processing large blocks of conditioning information at  ... 
arXiv:2108.04051v1 fatcat:5uq6euicsfh2fcuqhtqmteltte

Revisiting VoIP QoE assessment methods: are they suitable for VoLTE?

Ramon Sanchez-Iborra, Maria-Dolores Cano, Joan Garcia-Haro
2016 Network Protocols and Algorithms  
At the same time, Quality of user Experience (QoE) assessment methods have evolved notably in recent years, especially for VoIP services.  ...  In this paper, we contribute to answering this question by: (i) providing a high-level exploration of current objective non-intrusive models for VoIP QoE evaluation, (ii) highlighting the review of works  ...  Acknowledgement This work was supported by the MINECO/FEDER project grant TEC2013-47016-C2-2-R (COINS).  ... 
doi:10.5296/npa.v8i2.9123 fatcat:rtfm74jzn5hgrdngecqxpkakji

End-to-End Neural Speech Coding for Real-Time Communications [article]

Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu
2022 arXiv pre-print
Deep-learning-based methods have shown advantages over traditional ones in audio coding, but limited attention has been paid to real-time communications (RTC).  ...  This paper proposes TFNet, an end-to-end neural speech codec with low latency for RTC. It adopts an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding.  ...  JOINT OPTIMIZATION WITH SPEECH ENHANCEMENT AND PACKET LOSS CONCEALMENT In real-time communications, there are several types of degradations besides quality loss by audio coding, such as background noises  ... 
arXiv:2201.09429v3 fatcat:baxdi3zvyzftbjcnxuidfykmqq

Modification of codebook search in adaptive multi-rate wideband speech codecs using intelligent optimization algorithms

Mansour Sheikhan
2012 Neural computing & applications (Print)  
In recognition of high-quality wideband speech codecs, several standardization activities have been conducted, resulting in the selection of a wideband speech codec called adaptive multi-rate wideband  ...  evaluation of speech quality, when used in an AMR-WB speech codec.  ...  In another work, the author has used a fuzzy ARTMAP neural network (FAMNN) to determine the best index of shape codebook in ITU-T G.728 speech coding algorithm [31].  ... 
doi:10.1007/s00521-012-1321-7 fatcat:4ybyitduyfbkpmwogvbpdlwcpa

Multimode Tree-Coding of Speech with Pre-/Post-Weighting

Ying-Yi Li, Pravin Ramadas, Jerry Gibson
2022 Applied Sciences  
We present a low-complexity, low-delay speech codec based on tree-coding with sample-by-sample adaptive long- and short-code generators that incorporates pre- and post-filtering for perceptual weighting  ...  The coding of the multiple speech modes and comfort noise generation is accomplished using the code generator adaptation algorithms, again, rather than using the input speech.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/app12042026 fatcat:ar3ie5prlvctfcpxqzcvppjzfm

Advances in voice quality measurement in modern telecommunications

Abdulhussain E. Mahdi, Dorel Picovici
2009 Digital signal processing (Print)  
The perceived quality of the coded speech will, therefore, be independent of the type of coding and transmission, when estimated by a distance measure between perceptually transformed speech signals.  ...  Intrusive speech quality measures are more accurate, but normally are unsuitable for monitoring real-time traffic in live networks.  ... 
doi:10.1016/j.dsp.2007.11.006 fatcat:ogoe65hofzdd7mjrouyvquk2vm

Adversarial Auto-Encoding for Packet Loss Concealment [article]

Santiago Pascual, Joan Serrà, Jordi Pons
2021 arXiv pre-print
Communication technologies like voice over IP operate under constrained real-time conditions, with voice packets being subject to delays and losses from the network.  ...  Recently, autoregressive deep neural networks have been shown to surpass the quality of signal processing methods for PLC, especially for long-term predictions beyond 60 ms.  ...  Recurrent networks are known to be very competitive for sequential generative modeling, and especially for speech synthesis.  ... 
arXiv:2107.03100v2 fatcat:b5ny3ubjdzcbpk43zlpoidskhm

New single-ended objective measure for non-intrusive speech quality evaluation

Abdulhussain E. Mahdi, Dorel Picovici
2008 Signal, Image and Video Processing  
The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records.  ...  Results also demonstrate that the method outperforms the PESQ in a number of distortion conditions, such as those of speech degraded by channel impairments.  ...  Leigh Thorpe from Nortel Networks, Ottawa, Canada for providing the speech database used in this work, and to Plassey Campus Centre, University of Limerick for partly funding this project.  ... 
doi:10.1007/s11760-008-0092-1 fatcat:4tbjxdislbhe3e35q42qhhewaa

Robust ML wideband beamforming in reverberant fields

E.D. Di Claudio, R. Parisi
2003 IEEE Transactions on Signal Processing  
in the neuron space, which was originally developed for the training of multilayer neural networks.  ...  Index Terms-Acoustics, adaptive beamforming, cepstrum, maximum likelihood, minimum variance, multipath, neural network learning, reverberation, ridge regression, robust algorithms, sensor arrays, wideband  ...  Standard results from speech coding show that a good local signal-to-noise ratio (SNR) in the frequency domain, achieved by a combination of pre-emphasis, perceptual filtering, and cepstrum analysis, is  ... 
doi:10.1109/tsp.2002.806866 fatcat:fjnelybhavftxbzgx3fp2no3fu

Deep Learning: Methods and Applications

Li Deng
2014 Foundations and Trends® in Signal Processing  
, real-valued vectors via a neural network and then used for processing by subsequent neural network layers.  ...  As discussed in Section 3.2, the concept of convolution in time originated in the TDNN (time-delay neural network) as a shallow neural network [202, 382] developed during the early days of speech recognition  ... 
doi:10.1561/2000000039 fatcat:vucffxhse5gfhgvt5zphgshjy4

Quality Enhancement of Overdub Singing Voice Recordings

Benedikt Wimmer, Jordi Janer, Merlijn Blaauw
2021 Zenodo  
In this work, two neural network architectures for speech denoising – namely FullSubNet and Wave-U-Net – were trained and evaluated specifically on denoising of user singing voice recordings.  ...  Focusing on the aspect of removing degradation such as background noise or room reverberation, singing enhancement is related to the topic of speech enhancement.  ...  Since 2013 [3], neural networks have taken over the field of speech enhancement, especially denoising.  ... 
doi:10.5281/zenodo.5553906 fatcat:elv437mgfvc63j6ktgxgdgiahq

Perceptual coding of digital audio

T. Painter, A. Spanias
2000 Proceedings of the IEEE  
central importance in perceptual audio coding.  ...  In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio.  ...  bank viable for real-time applications.  ... 
doi:10.1109/5.842996 fatcat:jkfvoxg7zrcyxg6fyahe4u73pu

Differential brain-to-brain entrainment while speaking and listening in native and foreign languages

Alejandro Pérez, Guillaume Dumas, Melek Karadag, Jon Andoni Duñabeitia
2019 Cortex  
These results indicate that between-brain similarities in the timing of neural activations and their spatial distributions change depending on the language code used.  ...  We argue that factors like linguistic alignment, joint attention and brain-entrainment to speech operate with a language-idiosyncratic neural configuration, modulating the alignment of neural activity  ...  The authors acknowledge financial support from the Spanish Ministry of Economy and Competitiveness through the "Severo Ochoa Programme for Centres/Units of Excellence in R&D" (SEV-2015-490) and the PSI2015  ... 
doi:10.1016/j.cortex.2018.11.026 pmid:30598230 fatcat:izfnrl3szzdv7jzk54ftvc2k54

A Methodology for Deriving VoIP Equipment Impairment Factors for a Mixed NB/WB Context

A. Raja, R. Azad, C. Flanagan, C. Ryan
2008 IEEE transactions on multimedia  
This paper proposes a novel approach to quantifying the quality degradation of Voice over IP (VoIP) telephony in the presence of codec and network-related impairments.  ...  The effectiveness of the approach is demonstrated by a number of generated models which compare favorably with WB-PESQ and outperform the traditional E-Model in terms of prediction accuracy when compared  ...  Wideband (WB) offers more natural sounding speech than narrowband (NB), and IP networks allow the transition to occur essentially by a simple change of codecs.  ... 
doi:10.1109/tmm.2008.2001359 fatcat:vhkzewdnh5f3jf7tj4r5yggjla

Design of large polyphase filters in the Quadratic Residue Number System

Gian Carlo Cardarilli, Alberto Nannarelli, Yann Oster, Massimo Petricca, Marco Re
2010 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers  
In competitive neural networks, for example, the simplest such locally linear algorithm, Winner-Take-All (WTA), updates only one locally linear model at a time during training.  ...  In real-life structures fatigue damage condition can be monitored in real-time by acquiring real-time signals from the sensors.  ... 
doi:10.1109/acssc.2010.5757589 fatcat:ccxnu5owr5fyrcjcqukumerueq
Showing results 1–15 of 143.