5 Hits in 3.3 sec

Single-Microphone Speech Enhancement and Separation Using Deep Learning [article]

Morten Kolbæk
2018 arXiv pre-print
Due to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved with such algorithms might be overcome.  ...  We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data.  ...  Conclusion: In this paper we proposed a Speech Enhancement (SE) system based on Deep Neural Networks (DNNs) that optimizes an approximation of the Short-Time Objective Intelligibility (STOI) estimator.  ... 
arXiv:1808.10620v2 fatcat:kzk357xdbjcsfn5c75qe4cb65q

Development of a New Image-to-text Conversion System for Pashto, Farsi and Traditional Chinese [article]

Marek Rychlik, Dwight Nwaigwe, Yan Han and Dylan Murphy
2020 arXiv pre-print
The target audience of this paper is a general audience with interest in Digital Humanities or in retrieval of accurate full-text and metadata from digital images.  ...  Hence, CTC solves one of the principal difficulties in using a neural network (especially, a recurrent neural network, or RNN) in OCR: the problem of segmentation.  ...  bring low-quality (and sometimes unusable) results.  ... 
arXiv:2005.08650v1 fatcat:3nmbzaz72vgwnab2ts7iz6ugly

Streaming Multi-Talker ASR with Token-Level Serialized Output Training [article]

Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
2022 arXiv pre-print
Moreover, in our experiments with the LibriSpeechMix and LibriCSS datasets, the t-SOT-based transformer transducer model achieves state-of-the-art word error rates by a significant margin over the prior  ...  For non-overlapping speech, the t-SOT model is on par with a single-talker ASR model in terms of both accuracy and computational cost, opening the door to deploying one model for both single- and multi-talker  ...  A popular approach is to use a neural network that has multiple output branches to generate transcriptions for overlapping speakers (e.g., [10, 11, 12, 13, 14]), where the model is often trained with  ... 
arXiv:2202.00842v4 fatcat:en65edtxmvax3dn6fakga4n7zq

A survey of joint intent detection and slot-filling models in natural language understanding [article]

H. Weld, X. Huang, S. Long, J. Poon, S. C. Han
2021 arXiv pre-print
We observe three milestones in this research so far: intent detection to identify the speaker's intention, slot filling to label each word token in the speech/text, and finally, joint intent classification  ...  Recursive neural networks.  ...  In 2015 the first completely neural network was devised, using a recurrent neural network (RNN, different from a recursive neural network) embedding of words, a CNN representation of sentences, and a feed  ... 
arXiv:2101.08091v3 fatcat:ai6w2imilrfupf4m5fm2rjtzxi

Functional Representation of Prototypes in LVQ and Relevance Learning [chapter]

Friedrich Melchert, Udo Seiffert, Michael Biehl
2016 Advances in Intelligent Systems and Computing  
van Harmelen (Vrije Universiteit Amsterdam), Hado van Hasselt (Google DeepMind), and Manuela Veloso (Carnegie Mellon University), a Research meets Business session, and a panel discussion on Social Robots, with  ...  Acknowledgments: The research reported has been performed in the context of the project 'Designing and Understanding Forensic Bayesian Networks with Arguments and Scenarios', funded in the NWO Forensic  ...  This work sheds light on research methods applicable for detecting online radicalism within the Reddit domain.  ... 
doi:10.1007/978-3-319-28518-4_28 fatcat:uwxvq6txmrba3ajulmblafgh2a