113,218 Hits in 4.7 sec

Self-Training for End-to-End Speech Translation [article]

Juan Pino and Qiantong Xu and Xutai Ma and Mohammad Javad Dousti and Yun Tang
2020 arXiv   pre-print
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model.  ...  Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.  ...  An end-to-end speech translation model is then trained on the resulting data. [11] demonstrates the effectiveness of self-training for machine translation and summarization.  ... 
arXiv:2006.02490v2 fatcat:opq3fnq6hfbs7ohvs66hekrwui
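The entry above describes a pseudo-labeling loop: a teacher (cascade or end-to-end ST) model labels unlabeled audio, and an end-to-end model is then trained on the combined real and pseudo-labeled data. A minimal illustrative sketch of that loop follows; all names (`self_train`, the toy `teacher`) are hypothetical placeholders, not the authors' implementation.

```python
# Sketch of self-training for ST: a teacher labels unlabeled audio,
# and the student is trained on real + pseudo-labeled pairs.
def self_train(labeled_pairs, unlabeled_audio, teacher):
    # 1. Generate pseudo-labels by running the teacher on unlabeled audio.
    pseudo_pairs = [(audio, teacher(audio)) for audio in unlabeled_audio]
    # 2. Combine gold and pseudo-labeled data.
    combined = list(labeled_pairs) + pseudo_pairs
    # 3. In practice a student end-to-end ST model is trained on `combined`;
    #    here we just return the assembled training set.
    return combined

# Toy usage with a dummy "teacher" that uppercases its input:
teacher = lambda audio: audio.upper()
data = self_train([("hallo", "hello")], ["welt", "guten tag"], teacher)
```

In the paper's setting the teacher is either a cascade (ASR followed by MT) or an already-trained end-to-end ST model; the sketch is agnostic to that choice.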

Self-Training for End-to-End Speech Translation

Juan Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang
2020 Interspeech 2020  
One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model.  ...  Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.  ...  An end-to-end speech translation model is then trained on the resulting data. [11] demonstrates the effectiveness of self-training for machine translation and summarization.  ... 
doi:10.21437/interspeech.2020-2938 dblp:conf/interspeech/PinoXMDT20 fatcat:yvxp2r2zsbceth76zv344yn5ay

Investigating Self-Supervised Pre-Training for End-to-End Speech Translation

Ha Nguyen, Fethi Bougares, N. Tomashenko, Yannick Estève, Laurent Besacier
2020 Interspeech 2020  
Index Terms: self-supervised learning from speech, automatic speech translation, end-to-end models, low resource settings.  ...  We investigate here its impact on end-to-end automatic speech translation (AST) performance.  ...  We investigate the possibility to leverage unlabeled speech for end-to-end automatic speech translation (AST).  ... 
doi:10.21437/interspeech.2020-1835 dblp:conf/interspeech/NguyenBTEB20 fatcat:c7v3pm4uqrd4nfhdfpzwz3ipdm

End-to-end Speech Translation System Description of LIT for IWSLT 2019

Mei Tu, Wei Liu, Lijie Wang, Xiao Chen, Xue Wen
2019 Zenodo  
We propose layer-tied self-attention for end-to-end speech translation. Our method takes advantage of sharing weights of speech encoder and text decoder.  ...  This paper describes our end-to-end speech translation system for the speech translation task of lectures and TED talks from English to German for IWSLT Evaluation 2019.  ...  Chen, "Sequence-to-sequence models can directly translate foreign speech," in Proc. Interspeech, 2017  ... 
doi:10.5281/zenodo.3525548 fatcat:cfztpy6khzbcjluvqc5wxppi5q

Cascaded Models With Cyclic Feedback For Direct Speech Translation [article]

Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler
2021 arXiv   pre-print
A comparison to end-to-end speech translation using components of identical architecture and the same data shows gains of up to 3.8 BLEU points on LibriVoxDeEn and up to 5.1 BLEU points on CoVoST for German-to-English  ...  After pre-training MT and ASR, we use a feedback cycle where the downstream performance of the MT system is used as a signal to improve the ASR system by self-training, and the MT component is fine-tuned  ...  outputs for self-training.  ... 
arXiv:2010.11153v2 fatcat:6shsrexnozbajkzrsqvtx2hati
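The snippet above outlines a feedback cycle: downstream MT quality scores ASR hypotheses, the best-scoring hypotheses become ASR self-training targets, and the MT component is fine-tuned on ASR outputs. A hedged sketch of one such cycle, with all functions as hypothetical stand-ins rather than the authors' code:

```python
# One round of cyclic feedback between ASR and MT (illustrative).
def feedback_cycle(audio_batch, references, asr_nbest, mt, bleu):
    asr_targets, mt_pairs = [], []
    for audio, ref in zip(audio_batch, references):
        # Score each ASR hypothesis by the quality of its MT translation.
        hyps = asr_nbest(audio)
        best = max(hyps, key=lambda h: bleu(mt(h), ref))
        asr_targets.append((audio, best))   # self-training data for ASR
        mt_pairs.append((best, ref))        # fine-tuning data for MT
    return asr_targets, mt_pairs

# Toy check with stand-in components (identity "MT", word-overlap "BLEU"):
asr_nbest = lambda audio: ["gut", "guten tag"]
mt = lambda text: text
bleu = lambda hyp, ref: len(set(hyp.split()) & set(ref.split()))
asr_targets, mt_pairs = feedback_cycle(["clip1"], ["guten tag"], asr_nbest, mt, bleu)
```

The design point is that neither component needs gold transcripts for the audio: the MT reference alone supplies the training signal for both sides of the cascade.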

Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding [article]

Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong
2019 arXiv   pre-print
Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR).  ...  Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years.  ...  The research work in this paper has also been supported by Beijing Advanced Innovation Center for Language Resources.  ... 
arXiv:1912.07240v1 fatcat:vj6567pobvhf5j2iyyneypxvya

Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding

Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR).  ...  Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years.  ...  It is more difficult for the end-to-end ST model to generate a correct translation and its output is totally wrong.  ... 
doi:10.1609/aaai.v34i05.6360 fatcat:ekymikvcp5g43bszq74b6aft4m

Data Augmentation for End-to-End Speech Translation: FBK@IWSLT '19

Mattia A. Di Gangi, Matteo Negri, Viet Nhat Nguyen, Amirhossein Tebbifakhr, Marco Turchi
2019 Zenodo  
This paper describes FBK's submission to the end-to-end speech translation (ST) task at IWSLT 2019.  ...  On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR).  ...  We thank Mauro Cettolo for the useful technical conversations. References  ... 
doi:10.5281/zenodo.3525492 fatcat:yvmfqs3gqrainc2gno4f5eoyhe

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021 [article]

Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai
2021 arXiv   pre-print
This paper describes USTC-NELSLIP's submissions to the IWSLT2021 Simultaneous Speech Translation task.  ...  Compared to last year's optimal systems, our S2T simultaneous translation system improves by an average of 11.3 BLEU for all latency regimes, and our T2T simultaneous translation system improves by an  ...  Speech-to-Text Simultaneous Translation End-to-End Systems The main system of End-to-End Speech-to-Text simultaneous Translation is based on the aforementioned CAAT structure.  ... 
arXiv:2107.00279v2 fatcat:dm754bg6nnhsxn5dirwoqbdrdm

End-to-end Speech Translation via Cross-modal Progressive Training [article]

Rong Ye, Mingxuan Wang, Lei Li
2021 arXiv   pre-print
In this paper, we propose Cross Speech-Text Network (XSTNet), an end-to-end model for speech-to-text translation.  ...  End-to-end speech translation models have become a new trend in research due to their potential of reducing error propagation. However, these models still suffer from the challenge of data scarcity.  ...  This motivates us to design a multi-task model. In this paper, we designed Cross Speech-Text Network (XSTNet) for end-to-end ST to joint train ST, ASR and MT tasks.  ... 
arXiv:2104.10380v2 fatcat:ewasab4d5nanpnlymgh3jhr274
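The XSTNet entry above describes jointly training one model on the ST, ASR, and MT tasks. A common way to realize this, and a minimal sketch of the idea (the weighting scheme and names here are illustrative assumptions, not XSTNet's actual configuration), is to optimize a weighted sum of the per-task losses:

```python
# Minimal multi-task objective: a weighted sum of per-task losses.
def multitask_loss(losses, weights=None):
    # losses: dict of already-computed per-task loss values,
    # e.g. {"st": 1.2, "asr": 0.8, "mt": 0.5}
    weights = weights or {task: 1.0 for task in losses}
    return sum(weights[task] * loss for task, loss in losses.items())

# With uniform weights the total is simply the sum of the task losses:
total = multitask_loss({"st": 1.2, "asr": 0.8, "mt": 0.5})
```

Sharing parameters across the three tasks is what lets the plentiful ASR and MT data compensate for scarce ST data.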

fairseq S2T: Fast Speech-to-Text Modeling with fairseq [article]

Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino
2020 arXiv   pre-print
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.  ...  It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing, model training to offline (online) inference.  ...  Tied multitask learning for neural speech translation.  ... 
arXiv:2010.05171v1 fatcat:tcdojkewtjfyhghjsbogbpgopq

End-to-End Speech Translation with Knowledge Distillation [article]

Yuchen Liu, Hao Xiong, Zhongjun He, Jiajun Zhang, Hua Wu, Haifeng Wang, Chengqing Zong
2019 arXiv   pre-print
End-to-end speech translation (ST), which directly translates from source language speech into target language text, has attracted intensive attention in recent years.  ...  Specifically, we first train a text translation model, regarded as a teacher model, and the ST model is then trained to learn output probabilities from the teacher model through knowledge distillation.  ...  [8] give the first proof of the potential for end-to-end speech-to-text translation without using source language.  ... 
arXiv:1904.08075v1 fatcat:2ljzutivgrdfzpemjrqzcebdh4
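The distillation described above trains the ST student toward the MT teacher's output distribution. At the token level this is typically a cross-entropy (equivalently, KL-divergence up to the teacher's entropy) between teacher and student probabilities; the sketch below illustrates that term for a single time step over a toy 3-word vocabulary, and is an assumption about the general technique, not the paper's exact loss.

```python
import math

# Token-level distillation term: cross-entropy of the student's
# distribution under the teacher's distribution, for one time step.
def kd_loss(teacher_probs, student_probs):
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))

# If the student matches the teacher exactly, the loss equals the
# teacher's entropy; any mismatch strictly increases it.
teacher = [0.7, 0.2, 0.1]
loss_match = kd_loss(teacher, teacher)
loss_off = kd_loss(teacher, [0.1, 0.2, 0.7])
```

In practice this term is computed per target token and usually interpolated with the standard cross-entropy against the gold translation.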

Self-Supervised Representations Improve End-to-End Speech Translation [article]

Anne Wu, Changhan Wang, Juan Pino, Jiatao Gu
2020 arXiv   pre-print
End-to-end speech-to-text translation can provide a simpler and smaller system but is facing the challenge of data scarcity.  ...  languages, and whether they can be effectively combined with other common methods that help improve low-resource end-to-end speech translation such as using a pre-trained high-resource speech recognition  ...  Introduction Recently, there has been much interest in end-to-end speech translation (ST) models [1, 2, 3, 4, 5, 6, 7] , which, compared to traditional cascaded models, are simpler and computationally  ... 
arXiv:2006.12124v2 fatcat:cs2d5q4cprfodi46s7hab5oroi

End-to-End Speech Translation with Knowledge Distillation

Yuchen Liu, Hao Xiong, Jiajun Zhang, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong
2019 Interspeech 2019  
End-to-end speech translation (ST), which directly translates from source language speech into target language text, has attracted intensive attention in recent years.  ...  Specifically, we first train a text translation model, regarded as the teacher model, and the ST model is then trained to learn the output probabilities of the teacher model through knowledge distillation.  ...  Acknowledgements We thank anonymous reviewers for helpful feedbacks.  ... 
doi:10.21437/interspeech.2019-2582 dblp:conf/interspeech/LiuXZHWWZ19 fatcat:vjsqn4tldverxlzegdllautdzi

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation [article]

Renjie Zheng and Junkun Chen and Mingbo Ma and Liang Huang
2021 arXiv   pre-print
Within this cross-modal representation learning framework, we further present an end-to-end model for Fused Acoustic and Text Speech Translation (FAT-ST).  ...  However, all existing methods suffer from two limitations: (a) they only learn from one input modality, while a unified representation for both speech and text is needed by tasks such as end-to-end speech  ...  Acknowledgements We thank Kenneth Church and Jiahong Yuan for discussions, and Juneki Hong for proofreading, and the anonymous reviewers for suggestions.  ... 
arXiv:2102.05766v2 fatcat:gc5aplktyjcwrj27rei4q7qnze
Showing results 1 — 15 out of 113,218 results