10,136 Hits in 12.0 sec

HASA-net: A non-intrusive hearing-aid speech assessment network [article]

Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao
2021 arXiv   pre-print
Without the need of a clean reference, non-intrusive speech assessment methods have caught great attention for objective evaluations.  ...  To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.  ...  For speech quality, Quality-Net [11] was proposed as an end-to-end, non-intrusive speech quality evaluation model; Quality-Net is a BLSTM-based model and capable to predict perceptual evaluation of speech  ... 
arXiv:2111.05691v1 fatcat:igjek4ggqfcn5fm74dnznyiaau

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [article]

Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen
2021 arXiv   pre-print
The ceaseless proposal of a large number of techniques to extract features and fuse multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual  ...  Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources  ...  Disentangled audio bases were used to guide a non-negative matrix factorisation (NMF) framework for source separation.  ... 
arXiv:2008.09586v2 fatcat:vgdadayysvazfna32f5s43nc6e

Analysis of Facial Information for Healthcare Applications: A Survey on Computer Vision-Based Approaches

Marco Leo, Pierluigi Carcagnì, Pier Luigi Mazzeo, Paolo Spagnolo, Dario Cazzato, Cosimo Distante
2020 Information  
For each facial feature, the computer vision-based tasks aiming at analyzing it and the related healthcare goals that could be pursued are detailed.  ...  A research taxonomy is introduced by dividing the face in its main features: eyes, mouth, muscles, skin, and shape.  ...  Some works approach real-world measurement scenarios with healthy subjects only.  ... 
doi:10.3390/info11030128 fatcat:yx7izg2jlvhsjpppf6ektkmlye

Artificial Intelligence for the Metaverse: A Survey [article]

Thien Huynh-The, Quoc-Viet Pham, Xuan-Qui Pham, Thanh Thi Nguyen, Zhu Han, Dong-Seong Kim
2022 arXiv   pre-print
In this context, metaverse, a term formed by combining meta and universe, has been introduced as a shared virtual world that is fueled by many emerging technologies, such as fifth-generation networks and  ...  In this survey, we make a beneficial effort to explore the role of AI in the foundation and development of the metaverse.  ...  The object of an RL model is how to perform the task to maximize the reward and minimize the penalty, beginning with totally random trials and ending with sophisticated tactics and superhuman skills [  ... 
arXiv:2202.10336v1 fatcat:35isd745dbaqfnpzthnmbaosue

A Survey on Techniques for Enhancing Speech

Tayseer M., Ahsan Adeel, Amir Hussain
2018 International Journal of Computer Applications  
Speech enhancement is used in almost all the modern communication systems.  ...  This paper focuses on the techniques that appeared in the literature to enhance the signal of speech.  ...  The first algorithm incorporates a Multi-Task Learning (MTL) framework. The noisy speech signal is fed as input to the model.  ... 
doi:10.5120/ijca2018916290 fatcat:24ribi6izbat5ggndg7qdame24

Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminator [article]

Siddhartha Datta, Konrad Kollnig, Nigel Shadbolt
2022 arXiv   pre-print
The last line of defense against a range of digital harms - including digital distraction, political polarisation through hate speech, and children being exposed to damaging material - is the user interface  ...  This work introduces GreaseTerminator to enable researchers to develop, deploy, and test interventions against these harms with end-users.  ...  Requirements for an intervention assessment framework From the above review, we identify two requirements that our framework, aimed at the assessment of new and existing interventions, should fulfill to  ... 
arXiv:2112.10699v3 fatcat:prfbs5ypd5a4tkhfrmz4hx5dia

Speech Enhancement Using Deep Learning Methods: A Review

Asri Rizki Yuliani, M. Faizal Amri, Endang Suryawati, Ade Ramdan, Hilman Ferdinandus Pardede
2021 Jurnal Elektronika dan Telekomunikasi  
Deep learning has been known to outperform the statistical model used in the conventional speech enhancement. Hence, it deserves a dedicated survey.  ...  Speech enhancement, which aims to recover the clean speech of the corrupted signal, plays an important role in the digital speech signal processing.  ...  SIGNAL MODEL AND PROBLEM FORMULATION In real-world environments, speech signal is easily corrupted by noise.  ... 
doi:10.14203/jet.v21.19-26 fatcat:7ba3lfalzvg35hpc3ws5znseii

A multi-modal perception based assistive robotic system for the elderly

C. Mollaret, A.A. Mekonnen, F. Lerasle, I. Ferrané, J. Pinquier, B. Boudet, P. Rumeau
2016 Computer Vision and Image Understanding  
It is non-intrusive in that it only starts interaction with a user when it detects the user's intention to do so.  ...  In this paper, we present a multi-modal perception based framework to realize a nonintrusive domestic assistive robotic system.  ...  Acknowledgment This work was supported by a grant from the French National Research Agency (ANR) under project RIDDLE with grant number ANR-12-CORD-0003.  ... 
doi:10.1016/j.cviu.2016.03.003 fatcat:7ifl67bnxnfxbofrnvkzeupsj4

Learning-Based Video Game Development in MLP@UoM: An Overview [article]

Ke Chen
2019 arXiv   pre-print
To a large extent, however, video game development is still a laborious yet costly process, and there are many technical challenges ranging from game generation to intelligent agent creation.  ...  Unlike traditional methodologies, in Machine Learning and Perception Lab at the University of Manchester (MLP@UoM), we advocate applying machine learning to different tasks in video game development to  ...  The author is especially grateful to his former PhD students, J. Roberts, P. Shi, H. Rosyid, D. Buckley and W. Woof, for their contributions.  ... 
arXiv:1908.10127v1 fatcat:ocsk6by7c5ap5fgzi64ydxsnby

Common Metrics for Analyzing, Developing and Managing Telecommunication Networks [article]

Salman M. Al-Shehri, Pavel Loskot, Tolga Numanoglu, Mehmet Mert
2017 arXiv   pre-print
Despite their importance, the studies of metrics are usually limited to a narrow area or a well-defined objective.  ...  Our study aims to more broadly survey the metrics that are commonly used for analyzing, developing and managing telecommunication networks in order to facilitate understanding of the current metrics landscape  ...  Different approaches to QoE and particularly a data-driven QoE modeling is analyzed and compared. The objective and subjective speech quality assessment frameworks are surveyed in [111].  ... 
arXiv:1707.03290v1 fatcat:il7c4343wvcyphxuqaxrmcmaiu

Intelligent Video QoE Prediction Model for Error-prone Networks

P. M. Arun Kumar, S. Chandramathi
2015 Indian Journal of Science and Technology  
Methods: The work deploys Pseudo Subjective Quality Assessment (PSQA) method that involves a hybrid technique in assessing the multimedia quality using WEKA machine learning workbench.  ...  Estimation of video quality in wireless environment requires the conception of better framework and methodologies to improve users' Quality of Experience (QoE). This paper depicts a novel QoE prediction  ...  Hence, the need for non-intrusive QoE framework that considers all the above parameters to work on real time is imperative and calls for decimation of problem space into practical solutions.  ... 
doi:10.17485/ijst/2015/v8i16/65562 fatcat:cpmypvuydfadlotsuuxl6tdmfe

Industrial and Project Presentations [article]

Felipe A. Lozano, Francisco Serón
2003 Eurographics State of the Art Reports  
This volume contains the Industrial and Project Presentations for the 24th annual Conference of the European Association for Computer Graphics, EUROGRAPHICS´03, held in Granada, Spain, between the 1st  ...  and 6th of September 2003.  ...  We are also grateful to Maritima Valenciana for their support of our research efforts and especially to Eduardo Orellana for his faith in our work.  ... 
doi:10.2312/egid.20031006 fatcat:loh7chebubg2vatd5syfugub24

Verbal Coaching during a Real-Time Task [chapter]

Bruce Roberts, Nicholas J. Pioch, William Ferguson
1998 Lecture Notes in Computer Science  
Terry Allard and Harold Hawkins at ONR deserve special thanks for their continuing support.  ...  Acknowledgements The authors wish to acknowledge the contributions of the entire TRANSoM team: Richard Pew, Yvette Tenney and Jason Vantomme (BBN); Stewart Harris, Barbara Fletcher, and Jason Fritz (Imetrix  ...  Pre- and Post-tests were administered using an ROV. Training and transfer trials were modeled after the Maneuvering Task described earlier.  ... 
doi:10.1007/3-540-68716-5_40 fatcat:btjn56ecffapzkxsuy6lxw2qhe

Experience-Driven Procedural Content Generation

G. N. Yannakakis, J. Togelius
2011 IEEE Transactions on Affective Computing  
The paper provides a taxonomy of PCG algorithms and introduces a framework for PCG driven by computational models of user experience.  ...  Personalization of user experience via affective and cognitive modeling, coupled with real-time adjustment of the content according to user needs and preferences are important steps towards effective and  ...  ACKNOWLEDGEMENTS Thanks to all the participants in the discussions in the Procedural Content Generation Google Group, and the anonymous reviewers of this paper.  ... 
doi:10.1109/t-affc.2011.6 fatcat:734hfj42undy3mqh2dlrwrjxha

Far-Field Automatic Speech Recognition [article]

Reinhold Haeb-Umbach
2020 arXiv   pre-print
A signal enhancement front-end for dereverberation, source separation and acoustic beamforming is employed to clean up the speech, and the back-end ASR engine is robustified by multi-condition training  ...  This tutorial article gives an account of the algorithms used to enable accurate speech recognition from a distance, and it will be seen that, although deep learning has a significant share in the technological  ...  This model is trained with an end-to-end ASR objective (cross entropy given the reference transcriptions $v_u$ for utterance $u$) as follows: $\theta = \arg\max_{\theta} \sum_{u} \log p(v_u \mid Y_u; \theta)$ (45), where the model parameters  ... 
arXiv:2009.09395v1 fatcat:7de7w2i5jfhehhtfflu72k35mi