56,379 Hits in 3.9 sec

Direct speech-to-speech translation with discrete units [article]

Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu
2022 arXiv   pre-print
We tackle the problem by first applying a self-supervised discrete speech encoder on the target speech and then training a sequence-to-sequence speech-to-unit translation (S2UT) model to predict the discrete  ...  We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation.  ...  Acknowledgement We would like to thank Jade Copet, Emmanuel Dupoux, Evgeny Kharitonov, Kushal Lakhotia, Abdelrahman Mohamed, Tu Anh Nguyen and Morgane Rivière for helpful discussions on discrete representations  ... 
arXiv:2107.05604v2 fatcat:giy5e3srmnbp7hbhagcs5ep6wi
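The S2UT recipe this abstract describes rests on quantizing continuous speech features into a finite unit vocabulary before any sequence-to-sequence training. A minimal sketch of that quantization step follows: nearest-centroid assignment over a codebook, plus the collapsing of consecutive duplicate units commonly applied before unit-level modeling. The codebook here is random and purely illustrative; it stands in for, but is not, a trained HuBERT + k-means model.

```python
import numpy as np

def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each feature frame to its nearest codebook centroid (unit ID)."""
    # (T, D) frames vs (K, D) centroids -> (T, K) squared distances
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def collapse_repeats(units) -> list:
    """Drop consecutive duplicate unit IDs, yielding a deduplicated unit sequence."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(int(u))
    return out

rng = np.random.default_rng(0)
codebook = rng.normal(size=(100, 16))              # stand-in for k-means centroids
# frames lying exactly on 3 centroids, each repeated as in steady speech frames
frames = np.repeat(codebook[[3, 7, 42]], 2, axis=0)
units = quantize(frames, codebook)
print(collapse_repeats(units))                     # -> [3, 7, 42]
```

The deduplicated unit sequence is what a speech-to-unit translation model would be trained to predict; a separate unit vocoder maps units back to a waveform.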

Direct Speech-to-Speech Translation With Discrete Units

Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu
2022 Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)   unpublished
We tackle the problem by first applying a self-supervised discrete speech encoder on the target speech and then training a sequence-to-sequence speech-to-unit translation (S2UT) model to predict the discrete  ...  We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation.  ...  Acknowledgement We would like to thank Jade Copet, Emmanuel Dupoux, Evgeny Kharitonov, Kushal Lakhotia, Abdelrahman Mohamed, Tu Anh Nguyen and Morgane Rivière for helpful discussions on discrete representations  ... 
doi:10.18653/v1/2022.acl-long.235 fatcat:fburxvr55fa6pjwkhg7pqgxxty

Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention [article]

Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Philipp Koehn, Juan Pino
2022 arXiv   pre-print
Our approach leverages recent progress on direct speech-to-speech translation with discrete units, in which a sequence of discrete representations, instead of continuous spectrogram features, learned in  ...  We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model. Furthermore, the generation of the translation is independent of intermediate text representations.  ...  In this work, we propose the first direct simultaneous speech-to-speech translation model, based on recent progress on speech-to-units (S2U) translation (Lee et al., 2021).  ... 
arXiv:2110.08250v2 fatcat:wcaduwxmc5h2bboh5yeaqqcena

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation [article]

Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee
2022 arXiv   pre-print
techniques that work well for speech-to-text translation (S2T) to the S2UT domain by studying both speech encoder and discrete unit decoder pre-training.  ...  We take advantage of a recently proposed speech-to-unit translation (S2UT) framework that encodes target speech into discrete representations, and transfer pre-training and efficient partial finetuning  ...  Most recently, [4] proposes to apply a self-supervised speech encoder pre-trained on unlabeled speech to convert target speech into discrete units [9] and build a speech-to-unit translation (S2UT)  ... 
arXiv:2204.02967v1 fatcat:xql7kp3lyjfh3ffprbiy5g4dom

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation [article]

Rongjie Huang, Zhou Zhao, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He
2022 arXiv   pre-print
Direct speech-to-speech translation (S2ST) systems leverage recent progress in speech representation learning, where a sequence of discrete representations (units) derived in a self-supervised manner,  ...  In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation.  ...  model HuBERT [17] tuned with bilateral perturbation (BiP) for learning discrete representations (units) of target speech; 2) build the sequence-to-sequence model TranSpeech for speech-to-unit translation  ... 
arXiv:2205.12523v1 fatcat:akzsdagrubce7b7ie56yfmm6dm

UWSpeech: Speech to Speech Translation for Unwritten Languages [article]

Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu
2020 arXiv   pre-print
speech into target discrete tokens with a translator, and finally synthesizes target speech from target discrete tokens with an inverter.  ...  to target speech with target text for auxiliary training.  ...  As can be seen, Direct Translation achieves very low BLEU score, which is consistent with the findings in [15] and demonstrates the difficulty of direct speech to speech translation.  ... 
arXiv:2006.07926v2 fatcat:5q4flanbzzdwfjlvjyi5vqcrxu

Textless Speech-to-Speech Translation on Real Data [article]

Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Pino, Jiatao Gu, Wei-Ning Hsu
2022 arXiv   pre-print
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need for any text data.  ...  The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker  ...  Acknowledgements The authors would like to thank Adam Polyak and Felix Kreuk for initial discussions on accent normalization.  ... 
arXiv:2112.08352v2 fatcat:clu34adr7je45p5rwu5zhno7ci

Speech-to-speech Translation between Untranscribed Unknown Languages [article]

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
2019 arXiv   pre-print
To the best of our knowledge, this is the first work that performed pure speech-to-speech translation between untranscribed unknown languages.  ...  In this paper, we explore a method for training speech-to-speech translation tasks without any transcription or linguistic supervision.  ...  Despite much progress in direct speech translation research, no completely direct speech-to-speech translation has been achieved without any text transcription in source and target languages, during training  ... 
arXiv:1910.00795v2 fatcat:hi57dybnjrfbjcbq2ejoick2eq

CONTINUITY AND DISCRETENESS OF SIMULTANEOUS INTERPRETING

Olimova Dilfuza Zokirovna
2022 Zenodo  
Simultaneous translation dialectically combines two opposite features: continuity and discreteness.  ...  Usually, the interpreter's next step when orienting in the speaker's speech is to orient in the next intonation-semantic unit, structural-syntactic block, or other segment.  ...  The results of this orientation step allow the interpreter to proceed to the next stage of searching for and making translation decisions, which, in turn, allows proceeding to the direct generation  ... 
doi:10.5281/zenodo.6600663 fatcat:joxlr65rnvh4vkgunrl6gly7ie

Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings [article]

Marcely Zanon Boito, Bolaji Yusuf, Lucas Ondel, Aline Villavicencio, Laurent Besacier
2022 arXiv   pre-print
These discretization models are trained using raw speech only, producing discrete speech units that can be applied for downstream (text-based) tasks.  ...  In this paper we compare five of these models: three Bayesian and two neural approaches, with regards to the exploitability of the produced units for UWS.  ...  recognition and speech translation.  ... 
arXiv:2106.04298v2 fatcat:rabgx7fvrfbtdccdae5osqzo6e

Textless Speech Emotion Conversion using Discrete and Decomposed Representations [article]

Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi
2022 arXiv   pre-print
First, we modify the speech content by translating the phonetic-content units to a target emotion, and then predict the prosodic features based on these units.  ...  We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.  ...  Unit Translation. To translate the speech content units, we use a sequence-to-sequence Transformer model (Vaswani et al., 2017) denoted by E s2s .  ... 
arXiv:2111.07402v2 fatcat:iss4f2nqufdlrdqxq4vzydp6zq
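The abstract above describes decomposing speech into phonetic-content units alongside prosodic, speaker, and emotion factors. A toy illustration of one such decomposition is run-length encoding a frame-level unit sequence into content units plus per-unit durations; this is a deliberate simplification (durations stand in for one crude prosodic feature, and speaker/emotion factors are out of scope), not the paper's actual pipeline.

```python
def decompose(frame_units):
    """Run-length encode frame-level units into (content units, durations).

    Deduplicated units approximate phonetic content; per-unit durations
    capture a coarse prosodic signal such as speaking rate.
    """
    content, durations = [], []
    for u in frame_units:
        if content and content[-1] == u:
            durations[-1] += 1
        else:
            content.append(u)
            durations.append(1)
    return content, durations

def recompose(content, durations):
    """Invert the decomposition by repeating each unit for its duration."""
    return [u for u, d in zip(content, durations) for _ in range(d)]

seq = [5, 5, 5, 9, 9, 2, 2, 2, 2]
content, durations = decompose(seq)
print(content, durations)        # -> [5, 9, 2] [3, 2, 4]
assert recompose(content, durations) == seq   # lossless round trip
```

Because the decomposition is invertible, content and duration streams can be modified independently (e.g. translating the content units while re-predicting durations) and then recombined.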

Automatic steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation

T. Nishiura, R. Gruhn, S. Nakamura
2001 IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.  
It is very important for multi-lingual tele-conferencing through speech-to-speech translation to capture distant-talking speech with high quality.  ...  We also confirmed that the translated speech and the speaker image can be shown immediately after accurately steering the microphone array and the video camera in the speaker direction and translating  ...  Seiichi Yamamoto, president of ATR Spoken Language Translation Research Laboratories, for giving us the opportunity to carry out this research.  ... 
doi:10.1109/icme.2001.1237753 dblp:conf/icmcs/NishiuraGN01 fatcat:yi5xtqiwmjblbkf7huhbqpyare

Speech To Speech Translation: Challenges and Future

Sandeep Dhawan
2022 International Journal of Computer Applications Technology and Research  
Speech-to-speech translation is one such system that can be useful in facilitating communication between people who speak different languages.  ...  This paper describes a significant international and inter-institutional effort in this direction, highlighting the current challenges being faced as well as the future of the technology of speech-to-speech  ...  The ATR-MATRIX architecture exemplifies the direct translation approach [10], as it employs a cascade of a speech recognizer with a direct translation algorithm, TDMT, whose produced text  ... 
doi:10.7753/ijcatr1103.1001 fatcat:a6bvo2xjjvhk7gobaxa6kptcyy

The reality of phonological forms: a rejoinder

Robert F. Port
2010 Language Sciences  
with traditional speech error data is that, as I read the literature, speech errors do not consistently support segments as the unit of analysis.  ...  We only need to hear language spoken to understand it. And as speech perceivers, we do not have direct access to the speaker's articulation.  ... 
doi:10.1016/j.langsci.2009.10.016 fatcat:rwisqa3ulneuplflc5r7m76ud4

Speech Translation and the End-to-End Promise: Taking Stock of Where We Are [article]

Matthias Sperber, Matthias Paulik
2020 arXiv   pre-print
Over its three decade history, speech translation has experienced several shifts in its primary research themes; moving from loosely coupled cascades of speech recognition and machine translation, to exploring  ...  This paper provides a brief survey of these developments, along with a discussion of the main challenges of traditional approaches which stem from committing to intermediate representations from the speech  ...  with non-discrete IRs, can be trained without resorting to any end-to-end data for the particular language pair of interest.  ... 
arXiv:2004.06358v1 fatcat:buutv3udv5bthjhe5lsnioq63i