Direct speech-to-speech translation with discrete units
[article]
2022
arXiv
pre-print
We tackle the problem by first applying a self-supervised discrete speech encoder on the target speech and then training a sequence-to-sequence speech-to-unit translation (S2UT) model to predict the discrete ...
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. ...
Acknowledgement We would like to thank Jade Copet, Emmanuel Dupoux, Evgeny Kharitonov, Kushal Lakhotia, Abdelrahman Mohamed, Tu Anh Nguyen and Morgane Rivière for helpful discussions on discrete representations ...
arXiv:2107.05604v2
fatcat:giy5e3srmnbp7hbhagcs5ep6wi
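The pipeline this entry describes — apply a self-supervised discrete speech encoder to target speech, then train a speech-to-unit translation (S2UT) model on the resulting unit sequences — rests on quantizing frame-level features against a learned codebook and collapsing consecutive repeats. A minimal sketch of that unit-extraction step, assuming the frame features and k-means centroids are already available (all names, shapes, and data here are illustrative stand-ins, not taken from the paper):

```python
import numpy as np

def speech_to_units(features, centroids):
    """Map frame-level features (T, D) to discrete unit IDs by
    nearest-centroid assignment against a k-means codebook (K, D)."""
    # squared distance from every frame to every centroid -> (T, K)
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def collapse_repeats(units):
    """Remove consecutive duplicate unit IDs, yielding the reduced
    unit sequence commonly used as the S2UT training target."""
    out = [units[0]]
    for u in units[1:]:
        if u != out[-1]:
            out.append(u)
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))      # stand-in for encoder frame features
codebook = rng.normal(size=(100, 8))  # stand-in for k-means centroids
units = speech_to_units(feats, codebook)
reduced = collapse_repeats(units.tolist())
```

In the actual systems, the features would come from a pre-trained HuBERT-style encoder and the codebook from k-means over unlabeled speech; the reduced unit sequence is what the sequence-to-sequence S2UT model is trained to predict.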
Direct Speech-to-Speech Translation With Discrete Units
2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
unpublished
We tackle the problem by first applying a self-supervised discrete speech encoder on the target speech and then training a sequence-to-sequence speech-to-unit translation (S2UT) model to predict the discrete ...
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. ...
Acknowledgement We would like to thank Jade Copet, Emmanuel Dupoux, Evgeny Kharitonov, Kushal Lakhotia, Abdelrahman Mohamed, Tu Anh Nguyen and Morgane Rivière for helpful discussions on discrete representations ...
doi:10.18653/v1/2022.acl-long.235
fatcat:fburxvr55fa6pjwkhg7pqgxxty
Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention
[article]
2022
arXiv
pre-print
Our approach leverages recent progress on direct speech-to-speech translation with discrete units, in which a sequence of discrete representations, instead of continuous spectrogram features, learned in ...
We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model. Furthermore, the translation is generated independently of intermediate text representations. ...
In this work, we propose the first direct simultaneous speech-to-speech translation model, based on recent progress on speech-to-units (S2U) translation (Lee et al., 2021) . ...
arXiv:2110.08250v2
fatcat:wcaduwxmc5h2bboh5yeaqqcena
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
[article]
2022
arXiv
pre-print
We take advantage of a recently proposed speech-to-unit translation (S2UT) framework that encodes target speech into discrete representations, and transfer pre-training and efficient partial finetuning techniques that work well for speech-to-text translation (S2T) to the S2UT domain by studying both speech encoder and discrete unit decoder pre-training. ...
Most recently, [4] proposes to apply a self-supervised speech encoder pre-trained on unlabeled speech to convert target speech into discrete units [9] and build a speech-to-unit translation (S2UT) ...
arXiv:2204.02967v1
fatcat:xql7kp3lyjfh3ffprbiy5g4dom
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
[article]
2022
arXiv
pre-print
Direct speech-to-speech translation (S2ST) systems leverage recent progress in speech representation learning, where a sequence of discrete representations (units) derived in a self-supervised manner, ...
In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation. ...
... model HuBERT [17] tuned with bilateral perturbation (BiP) for learning discrete representations (units) of target speech; 2) build the sequence-to-sequence model TranSpeech for speech-to-unit translation ...
arXiv:2205.12523v1
fatcat:akzsdagrubce7b7ie56yfmm6dm
UWSpeech: Speech to Speech Translation for Unwritten Languages
[article]
2020
arXiv
pre-print
... speech into target discrete tokens with a translator, and finally synthesizes target speech from target discrete tokens with an inverter. ...
... to target speech with target text for auxiliary training. ...
As can be seen, Direct Translation achieves a very low BLEU score, which is consistent with the findings in [15] and demonstrates the difficulty of direct speech-to-speech translation. ...
arXiv:2006.07926v2
fatcat:5q4flanbzzdwfjlvjyi5vqcrxu
Textless Speech-to-Speech Translation on Real Data
[article]
2022
arXiv
pre-print
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need of any text data. ...
The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker ...
Acknowledgements The authors would like to thank Adam Polyak and Felix Kreuk for initial discussions on accent normalization. ...
arXiv:2112.08352v2
fatcat:clu34adr7je45p5rwu5zhno7ci
Speech-to-speech Translation between Untranscribed Unknown Languages
[article]
2019
arXiv
pre-print
To the best of our knowledge, this is the first work that performed pure speech-to-speech translation between untranscribed unknown languages. ...
In this paper, we explore a method for training speech-to-speech translation tasks without any transcription or linguistic supervision. ...
Despite much progress in direct speech translation research, no completely direct speech-to-speech translation has been achieved without any text transcription in source and target languages, during training ...
arXiv:1910.00795v2
fatcat:hi57dybnjrfbjcbq2ejoick2eq
Continuity and Discreteness of Simultaneous Interpreting
2022
Zenodo
Simultaneous translation dialectically combines two opposite features: continuity and discreteness. ...
Typically, the interpreter's next step in orienting within the speaker's speech is to orient toward the next intonation-semantic unit, structural-syntactic block, or other segment. ...
The results of this stepwise orientation process allow the interpreter to proceed to the next step in searching for and making translation decisions, which in turn allows proceeding to the direct generation ...
doi:10.5281/zenodo.6600663
fatcat:joxlr65rnvh4vkgunrl6gly7ie
Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings
[article]
2022
arXiv
pre-print
These discretization models are trained using raw speech only, producing discrete speech units that can be applied for downstream (text-based) tasks. ...
In this paper we compare five of these models: three Bayesian and two neural approaches, with regards to the exploitability of the produced units for UWS. ...
... recognition and speech translation. ...
arXiv:2106.04298v2
fatcat:rabgx7fvrfbtdccdae5osqzo6e
Textless Speech Emotion Conversion using Discrete and Decomposed Representations
[article]
2022
arXiv
pre-print
First, we modify the speech content by translating the phonetic-content units to a target emotion, and then predict the prosodic features based on these units. ...
We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion. ...
Unit Translation. To translate the speech content units, we use a sequence-to-sequence Transformer model (Vaswani et al., 2017) denoted by E_s2s. ...
arXiv:2111.07402v2
fatcat:iss4f2nqufdlrdqxq4vzydp6zq
Automatic steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation
2001
IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.
Capturing distant-talking speech with high quality is very important for multi-lingual tele-conferencing through speech-to-speech translation. ...
We also confirmed that the translated speech and the speaker image can be shown immediately after accurately steering the microphone array and the video camera toward the speaker and translating ...
... Seiichi Yamamoto, president of ATR Spoken Language Translation Research Laboratories, for giving us the opportunity to carry out this research. ...
doi:10.1109/icme.2001.1237753
dblp:conf/icmcs/NishiuraGN01
fatcat:yi5xtqiwmjblbkf7huhbqpyare
Speech To Speech Translation: Challenges and Future
2022
International Journal of Computer Applications Technology and Research
Speech-to-speech translation is one such system that can be useful in facilitating communication between people who speak different languages. ...
This paper describes a significant international and inter-institutional effort in this direction, highlighting the current challenges being faced as well as the future of the technology of speech-to-speech ...
The ATR-MATRIX architecture is an exemplification of the direct translation approach [10], as it employs a cascade of a speech recognizer with a direct translation algorithm, TDMT, whose produced text ...
doi:10.7753/ijcatr1103.1001
fatcat:a6bvo2xjjvhk7gobaxa6kptcyy
The reality of phonological forms: a rejoinder
2010
Language Sciences
... with traditional speech error data is that, as I read the literature, speech errors do not consistently support segments as the unit of analysis. ...
We only need to hear language spoken to understand it. And as speech perceivers, we do not have direct access to the speaker's articulation. ...
doi:10.1016/j.langsci.2009.10.016
fatcat:rwisqa3ulneuplflc5r7m76ud4
Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
[article]
2020
arXiv
pre-print
Over its three-decade history, speech translation has experienced several shifts in its primary research themes, moving from loosely coupled cascades of speech recognition and machine translation to exploring ...
This paper provides a brief survey of these developments, along with a discussion of the main challenges of traditional approaches which stem from committing to intermediate representations from the speech ...
... with non-discrete IRs, can be trained without resorting to any end-to-end data for the particular language pair of interest. ...
arXiv:2004.06358v1
fatcat:buutv3udv5bthjhe5lsnioq63i