From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation [article]

Danni Liu, Changhan Wang, Hongyu Gong, Xutai Ma, Yun Tang, Juan Pino
2022 arXiv   pre-print
Speech-to-speech translation (S2ST) converts input speech to speech in another language.  ...  In this work, we minimize the initial waiting time of iTTS by adapting the upstream speech translator to generate high-quality pseudo lookahead for the speech synthesizer.  ...  Conclusion In this work, we improve speech-to-speech translation pipelines for simultaneous speech-to-speech translation.  ... 
arXiv:2110.08214v3 fatcat:u4cf4k5y7bgnpelees5czcsi4y

Towards simultaneous interpreting: the timing of incremental machine translation and speech synthesis

Timo Baumann, Srinivas Bangalore, Julia Hirschberg
2014 International Workshop on Spoken Language Translation  
Of course, both incremental understanding and translation by humans can be garden-pathed, although experts are able to optimize their delivery so as to balance the goals of minimal latency, translation  ...  They commence the target utterance in the hope that they will be able to finish understanding the source speaker's message and determine its translation in time for the unfolding delivery.  ...  Acknowledgements The authors would like to thank Marcela Charfuelan for making available her MaryTTS extensions for Spanish speech synthesis, as well as the valuable feedback by the anonymous reviewers  ... 
dblp:conf/iwslt/BaumannBH14 fatcat:34jsbdjcobh6tjjlbkah65r2x4

Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS [article]

Katsuhito Sudoh, Takatomo Kano, Sashi Novitasari, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
2020 arXiv   pre-print
The system consists of three fully-incremental neural processing modules for automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS).  ...  This paper presents a newly developed, simultaneous neural speech-to-speech translation system and its evaluation.  ...  Our system is based on the cascade of three processing modules: incremental speech recognition (ISR), incremental machine translation (IMT), and text-to-speech synthesis (ITTS), rather than recent end-to-end  ... 
arXiv:2011.04845v2 fatcat:mwfqfz2bozaafgg2rzikggj3n4

Real-time Incremental Speech-to-Speech Translation of Dialogs

Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar, Prakash Kolan, Ladan Golipour, Aura Jimenez
2012 North American Chapter of the Association for Computational Linguistics  
In this work, we address the problem of incremental speech-to-speech translation (S2S) that enables cross-lingual communication between two remote participants over a telephone.  ...  The speech translation is performed incrementally based on generation of partial hypotheses from speech recognition.  ...  We are also exploring new algorithms for performing reordering aware incremental speech-to-speech translation, i.e., translating source phrases such that text-to-speech synthesis can be rendered incrementally  ... 
dblp:conf/naacl/BangaloreSKGJ12 fatcat:fdb3dbrx7vebtkcdafsjnnc3ca

An Investigation into Methodology and Metrics Employed to Evaluate the (Speech-to-Speech) Way in Translation Systems

Parnyan Bahrami Dashtaki
2017 Modern Applied Science  
The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis.  ...  The speech translation is performed incrementally based on generation of partial hypotheses from speech recognition.  ...  This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system.  ... 
doi:10.5539/mas.v11n4p55 fatcat:r365jxnz5bdgxhfo6fexkhorxm

Retico: An incremental framework for spoken dialogue systems

Thilo Michael
2020 SIGDIAL Conferences  
In this demo, we present three example systems that are implemented in retico: a spoken translation tool that translates speech in real-time, a conversation simulation that models turn-taking, and a spoken  ...  In this paper, we present the newest version of retico - a Python-based incremental dialogue framework to create state-of-the-art spoken dialogue systems and simulations.  ...  Spoken Translation Service The translation service utilizes speech recognition, a text translation service, and speech synthesis to translate sentences spoken into the system.  ... 
dblp:conf/sigdial/Michael20 fatcat:mo5q4dskr5ddvc53seoxvefvt4

Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention [article]

Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Philipp Koehn, Juan Pino
2022 arXiv   pre-print
an unsupervised manner, are predicted from the model and passed directly to a vocoder for speech synthesis on-the-fly.  ...  We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model. Furthermore, the generation of translation is independent from intermediate text representations.  ...  passed to a vocoder for target speech synthesis.  ... 
arXiv:2110.08250v2 fatcat:wcaduwxmc5h2bboh5yeaqqcena

Towards Machine Speech-to-speech Translation

Satoshi Nakamura, Katsuhito Sudoh, Sakriani Sakti
2020 Tradumàtica tecnologies de la traducció  
The S2ST system is basically composed of three modules: large vocabulary continuous automatic speech recognition (ASR), machine text-to-text translation (MT) and text-to-speech synthesis (TTS).  ...  All these modules need to be multilingual in nature and thus require multilingual speech and corpora for training models.  ...  translation, MT) and text-to-speech conversion (Text-to-Speech Synthesis, TTS).  ... 
doi:10.5565/rev/tradumatica.238 fatcat:n54mfznx5vfxnemrxjpoomhnaq

Incremental TTS for Japanese Language

Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
2018 Interspeech 2018  
Simultaneous lecture translation requires speech to be translated in real time before the speaker has spoken an entire sentence since a long delay will create difficulties for the listeners trying to follow  ...  The challenge is to construct a full-fledged system with speech recognition, machine translation, and text-to-speech synthesis (TTS) components that could produce high-quality speech translations on the  ...  One way is to construct an automatic speech-to-speech translation system, which consists of three components: automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis  ... 
doi:10.21437/interspeech.2018-1561 dblp:conf/interspeech/YanagitaS018 fatcat:w3umgk77sncqdchrdjveok6hui

INPRO_iSS: A Component for Just-In-Time Incremental Speech Synthesis

Timo Baumann, David Schlangen
2012 Annual Meeting of the Association for Computational Linguistics  
We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities.  ...  This component can be used to increase the responsivity and naturalness of spoken interactive systems.  ...  A use case with a similar (but probably lower) level of incrementality could be simultaneous speech-to-speech translation, or type-to-speech for people with speech disabilities.  ... 
dblp:conf/acl/BaumannS12 fatcat:ppynd2xohfcu3gquqsbwhyze7y

Speech-to-Speech Translation: A Review

Mahak Dureja, Sumanlata Gautam
2015 International Journal of Computer Applications  
This paper includes the major speech translation projects using different approaches for speech recognition, translation and text to speech synthesis highlighting the major pros and cons for the approach  ...  Speech-to-Speech Translation is a three step software process which includes Automatic speech Recognition, Machine Translation and voice synthesis.  ...  It provides concept-to-speech synthesis and for the deep processing stream, whereas it operates more likely to traditional text-to-speech system that results in a lower quality of the output for the shallow  ... 
doi:10.5120/ijca2015907079 fatcat:lqprqaeqxfb5zcfz2hzdhqyhdq

Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation

Sashi NOVITASARI, Sakriani SAKTI, Satoshi NAKAMURA
2021 IEICE transactions on information and systems  
Such systems can be achieved by performing low-latency processing in ASR (automatic speech recognition) module before passing the output to MT (machine translation) and TTS (text-to-speech synthesis) modules  ...  Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to the target language in real-time.  ...  S2ST systems commonly consist of three components: automatic speech recognition (ASR) system, machine translation (MT) system, and text-to-speech synthesis (TTS) system.  ... 
doi:10.1587/transinf.2021edp7014 fatcat:4xxttvurvncw5lcbuyyjld3vqe

Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model [article]

Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
2021 arXiv   pre-print
This letter presents an incremental text-to-speech (TTS) method that performs synthesis in small linguistic units while maintaining the naturalness of output speech.  ...  Incremental TTS is generally subject to a trade-off between latency and synthetic speech quality.  ...  It consists of three modules that perform incremental processing: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS).  ... 
arXiv:2012.12612v2 fatcat:rhpy4jho6rd2bc665sm6zircui

Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time [article]

Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
2020 arXiv   pre-print
Inspired by a human speech chain mechanism, a machine speech chain framework based on deep learning was recently proposed for the semi-supervised development of automatic speech recognition (ASR) and text-to-speech  ...  synthesis (TTS) systems.  ...  Seq2seq Incremental Text-to-Speech Synthesis System Seq2seq ITTS performs speech generation without waiting for a complete sentence text input [27, 28] .  ... 
arXiv:2011.02126v1 fatcat:b7mwq7st55bwrfgz2q4jucuhwm

Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time

Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
2020 Interspeech 2020  
Inspired by a human speech chain mechanism, a machine speech chain framework based on deep learning was recently proposed for the semi-supervised development of automatic speech recognition (ASR) and text-to-speech  ...  synthesis (TTS) systems.  ...  Seq2seq Incremental Text-to-Speech Synthesis System Seq2seq ITTS performs speech generation without waiting for a complete sentence text input [27, 28] .  ... 
doi:10.21437/interspeech.2020-2034 dblp:conf/interspeech/NovitasariTYS020 fatcat:pewtphmmzjcmtk3h6kcfam7idq