Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes
[article]
2018
arXiv
pre-print
We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio (B2A), for multilingual speech recognition and synthesis. ...
We show that bytes are superior to grapheme characters over a wide variety of languages in monolingual end-to-end speech recognition. ...
Table 1: Speech recognition performance of monolingual and multilingual with Audio-to-Byte (A2B) or Audio-to-Char (A2C) models. ...
arXiv:1811.09021v1
fatcat:axsm5xwqrva3bgiasy3ttnn5rq
Speech-translation: from domain-limited to domain-unlimited translation tasks
2007
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
have to be (mentally) undone o Speech synthesis o Rather monotone and unstructured o Overlay with original speech o Real-time processing by user is hardly possible Next Grand SLT Challenge: Multilingual ...
How to build a translation system which can be used when you are "On the Run" Speech translator requires real-time translations o CPUs on hand-held devices are usually slow (e.g. 600 MHz) and o There are ...
doi:10.1109/asru.2007.4430141
dblp:conf/asru/Vogel07
fatcat:3ith3heda5dm3nrpou555aji44
Design of Chinese-English Wireless Simultaneous Interpretation System Based on Speech Recognition Technology
2021
International Journal of Antennas and Propagation
Speech recognition technology is used by the system software to create a speech recognition process that properly produces speech-related semantics. ...
A Chinese-English wireless simultaneous interpretation system based on speech recognition technology is suggested to solve the problems of low translation accuracy and a high number of ambiguous terms ...
It can be integrated with other natural language processing technologies such as spoken language recognition, speech synthesis, and machine translation to create more complicated and intelligent applications ...
doi:10.1155/2021/7346984
fatcat:rtyz5iysx5cvdg77tfxkmnzbnu
Development of Multi-lingual Spoken Corpora of Indian Languages
[chapter]
2006
Lecture Notes in Computer Science
An account of care taken to collect speech data that is as close to real world as possible is given. The current status of the programme and the set of actions planned to achieve the goal is given. ...
This paper describes a recently initiated effort for collection and transcription of read as well as spontaneous speech data in four Indian languages. ...
So, developers of real-life speech applications would need to pay attention not only to speech recognition but also to other aspects of spoken dialogue. ...
doi:10.1007/11939993_79
fatcat:4bhyoimb7nd7lcxord75373jkq
D4.1 Report on Multimodal Machine Translation
2018
Zenodo
Both of these tasks are championed by evaluation campaigns, acting as competitions to stimulate research and to serve as a regulated platform investigating evaluation methodologies. ...
In this deliverable, we present a survey of the state of the art in machine translation with an emphasis on multimodal tasks and systems. ...
We would also like to acknowledge the support by NVIDIA and their GPU grant. ...
doi:10.5281/zenodo.3690761
fatcat:n3b34ooubfayxphgyf6bli6bya
Neural Polysynthetic Language Modelling
[article]
2020
arXiv
pre-print
Yet, when considering all of the world's languages, Finnish and Turkish are closer to the average case. ...
Research in natural language processing commonly assumes that approaches that work well for English and other widely-used languages are "language agnostic". ...
In both cases, all of the available data was verse-aligned data drawn from the Bible. For St. Lawrence Island Yupik, we had access to New Testament data only. ...
arXiv:2005.05477v2
fatcat:nzw5w2ueznhpbfocqvlmbalkyi
All Together Now: The Living Audio Dataset
2019
Interspeech 2019
The aim is to provide audio data that is in the public domain, multilingual, and expandable by communities. ...
We discuss the role of linguistic resources, given the success of systems such as Tacotron which use direct text-to-speech mappings, and consider how data provenance could be built into such resources. ...
Recently, powerful machine learning approaches to speech synthesis and speech recognition have called into question the value of linguistic resources such as pronunciation lexicons. ...
doi:10.21437/interspeech.2019-2448
dblp:conf/interspeech/BraudeALASRBBS19
fatcat:sbgcpeiwubgpfcjnoxloihejvq
CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application
[article]
2022
arXiv
pre-print
A few audio samples recorded in a noisy environment are uploaded and used to adapt the pretrained SE model on the server. ...
Therefore, the proposed BNC can effectively convert the background noise of a speech signal and be a data augmentation method when clean speech signals are unavailable. ...
Specifically, in online testing, the input speech needs to be transferred from byte to float before enhancing and has to be transferred back to the byte before playing by the mobile devices; in offline ...
arXiv:2008.09264v4
fatcat:urtd5veorzeq5hwfyf5o2kfjku
Automatic Language Identification in Texts: A Survey
[article]
2018
arXiv
pre-print
Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI. ...
We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. ...
We would like to thank Kimmo Koskenniemi for many valuable discussions and comments concerning the early phases of the features and the methods sections. ...
arXiv:1804.08186v2
fatcat:4rmixp4i5fb55itb7ze5avkgqy
Automatic Language Identification in Texts: A Survey
2019
The Journal of Artificial Intelligence Research
We describe the features and methods using a unified notation, to make the relationships between methods clearer. ...
Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI. ...
We would like to thank Kimmo Koskenniemi for many valuable discussions and comments concerning the early phases of the features and the methods sections. ...
doi:10.1613/jair.1.11675
fatcat:axugpuogyne3nptvamgd3zwgty
The Spoken Wikipedia Corpus collection: Harvesting, alignment and an application to hyperlistening
2018
Language Resources and Evaluation
Spoken corpora are important for speech research, but are expensive to create and do not necessarily reflect (read or spontaneous) speech 'in the wild'. ...
We turn these semi-structured collections into structured and time-aligned corpora, keeping the exact correspondence with the original hypertext as well as all available metadata. ...
Acknowledgments We would like to thank all Wikipedia authors and speakers for creating this tremendous amount of data. ...
doi:10.1007/s10579-017-9410-y
fatcat:2u4wfkcqknfdxcwwx3wc764tu4
TELEMORPH: BANDWIDTH-DETERMINED MOBILE MULTIMODAL PRESENTATION
2004
Information Technology & Tourism
TeleMorph aims to dynamically generate multimedia presentations using output modalities that are determined by the bandwidth available on a mobile device's wireless connection. ...
This article does not focus on the multimodal content composition but rather concentrates on the motivation for and issues surrounding such intelligent tourist systems. ...
He has written over 90 academic research papers on areas such as distributed computing, emerging trends within wireless ad-hoc networks, dynamic protocol stacks, and mobile systems. ...
doi:10.3727/109830504784531903
fatcat:nw7hesjve5ebzoql7sbwdv6pmq
Automatic Summarization
[chapter]
2014
Natural Language Processing of Semitic Languages
A critical summary of the Gettysburg Address might be: "The Gettysburg Address, though short, is one of the greatest of all American speeches, with its ending words being especially powerful-'that government ...
The collection may range from gigabytes to bytes, so different methods may be needed for different sizes. ...
He was involved in the development of a large-scale German-language text summarization system (Topic) and has written or coauthored four books and more than 140 journal articles, contributions to collected ...
doi:10.1007/978-3-642-45358-8_12
dblp:series/tanlp/BelguithEMJJB14
fatcat:zxihwcr2azdjlnio53cgrqk6hy
Automatic Summarization
2011
Foundations and Trends in Information Retrieval
A critical summary of the Gettysburg Address might be: "The Gettysburg Address, though short, is one of the greatest of all American speeches, with its ending words being especially powerful-'that government ...
The collection may range from gigabytes to bytes, so different methods may be needed for different sizes. ...
He was involved in the development of a large-scale German-language text summarization system (Topic) and has written or coauthored four books and more than 140 journal articles, contributions to collected ...
doi:10.1561/1500000015
fatcat:gfli2ecy55a2dkwleu5b522au4
The challenges of automatic summarization
2000
Computer
A critical summary of the Gettysburg Address might be: "The Gettysburg Address, though short, is one of the greatest of all American speeches, with its ending words being especially powerful-'that government ...
The collection may range from gigabytes to bytes, so different methods may be needed for different sizes. ...
He was involved in the development of a large-scale German-language text summarization system (Topic) and has written or coauthored four books and more than 140 journal articles, contributions to collected ...
doi:10.1109/2.881692
fatcat:tddpln5jsrfhdplkbpf64yqvhu