
Streaming Simultaneous Speech Translation with Augmented Memory Transformer [article]

Xutai Ma, Yongqiang Wang, Mohammad Javad Dousti, Philipp Koehn, Juan Pino
2020 arXiv   pre-print
We propose an end-to-end transformer-based sequence-to-sequence model, equipped with an augmented memory transformer encoder, which has shown great success on the streaming automatic speech recognition  ...  In this paper, we focus on the task of streaming simultaneous speech translation, where the systems are not only capable of translating with partial input but are also able to handle very long or continuous  ...  We first define the evaluation method for simultaneous speech translation. We then introduce the model based on the augmented memory transformer.  ... 
arXiv:2011.00033v1 fatcat:i5jzmqvhmnhpthl62u4tok4c4m
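The augmented-memory encoder described in the entry above processes speech in fixed-length segments and carries a bank of summary vectors across segments so that later segments can attend to compressed history. A minimal numpy sketch of that bookkeeping, where mean pooling is an illustrative stand-in for the paper's learned summarization and attention:

```python
import numpy as np

def augmented_memory_pass(frames, segment_len):
    """Process frames segment by segment; after each segment, append a
    summary vector to a memory bank that later segments can attend to.
    Mean pooling stands in for the learned summary/attention here."""
    memory_bank, outputs = [], []
    for start in range(0, len(frames), segment_len):
        seg = frames[start:start + segment_len]
        # context = all past summaries + current segment
        context = np.vstack(memory_bank + [seg])
        outputs.append(context.mean(axis=0))   # placeholder for attention output
        memory_bank.append(seg.mean(axis=0, keepdims=True))
    return np.stack(outputs), memory_bank
```

Because the memory bank holds one vector per past segment rather than all past frames, the per-segment cost stays bounded as the stream grows, which is what makes the approach suitable for very long or continuous input.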

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021 [article]

Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai
2021 arXiv   pre-print
This paper describes USTC-NELSLIP's submissions to the IWSLT2021 Simultaneous Speech Translation task.  ...  Based on the CAAT architecture and data augmentation, we build S2T and T2T simultaneous translation systems in this evaluation campaign.  ...  For the S2T task, input speech is simply segmented into utterances with a duration of 20 seconds, and each segmented piece is sent directly to our simultaneous translation systems to obtain the streaming results  ... 
arXiv:2107.00279v2 fatcat:dm754bg6nnhsxn5dirwoqbdrdm
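The fixed 20-second segmentation described for the S2T track above can be sketched as follows (the sample-rate handling and function name are assumptions for illustration; the submission's actual segmenter may differ):

```python
def segment_audio(samples, sample_rate, segment_seconds=20):
    """Split a waveform into consecutive fixed-duration utterances,
    each segment_seconds long (the last one may be shorter)."""
    step = sample_rate * segment_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]
```

For example, 50 seconds of 16 kHz audio yields two full 20-second segments plus a final 10-second remainder.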

Learning Shared Semantic Space for Speech-to-Text Translation [article]

Chi Han, Mingxuan Wang, Heng Ji, Lei Li
2021 arXiv   pre-print
Having numerous potential applications and great impact, end-to-end speech translation (ST) has long been treated as an independent task, failing to fully draw strength from the rapid advances of its sibling, text machine translation (MT).  ...  It has many real-world applications, including automatic video captioning, simultaneous translation for international conferences, etc.  ... 
arXiv:2105.03095v3 fatcat:f5tlgfrc3ng6dfjeby6ui6yd3i

Re-translation versus Streaming for Simultaneous Translation [article]

Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, George Foster
2020 arXiv   pre-print
There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available.  ...  We attribute much of this success to a previously proposed data-augmentation technique that adds prefix-pairs to the training data, which alongside wait-k inference forms a strong baseline for streaming  ...  Introduction In simultaneous machine translation, the goal is to translate an incoming stream of source words with as low latency as possible.  ... 
arXiv:2004.03643v3 fatcat:us7zafhhijal5nmdqaqmjuhnny
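The wait-k inference mentioned in the entry above first reads k source tokens, then alternates one write with one read until the source is exhausted. A sketch of the resulting READ/WRITE action schedule (sequence lengths assumed known up front purely for illustration):

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Return the READ/WRITE action sequence of a wait-k policy:
    read k source tokens first, then alternate WRITE and READ until
    the source is consumed, then write the remaining target tokens."""
    actions = []
    read, written = 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions
```

With `src_len=5, tgt_len=5, k=2` this yields `READ READ WRITE READ WRITE READ WRITE READ WRITE WRITE`: the output lags the input by a constant k tokens, which is what bounds the latency of the streaming baseline.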

ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020 [article]

Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier
2020 arXiv   pre-print
In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask.  ...  For speech-to-text simultaneous translation, we attach a wait-k MT system to a hybrid ASR system.  ...  Simultaneous Speech Translation Track: In this section, we describe our submission to the Simultaneous Speech Translation (SST) track.  ... 
arXiv:2005.11861v1 fatcat:a6l5yzfmyfghbik64ajjfu5w5e

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory [article]

Zhijie Lin, Zhou Zhao, Haoyuan Li, Jinglin Liu, Meng Zhang, Xingshan Zeng, Xiaofei He
2021 arXiv   pre-print
To break through this constraint, we study the task of simultaneous lip reading and devise SimulLR, a simultaneous lip reading transducer with attention-guided adaptive memory, from three aspects: (1) To  ...  The experiments show that SimulLR achieves a translation speedup of 9.10× compared with state-of-the-art non-simultaneous methods, and also obtains competitive results, which indicates the effectiveness  ...  recognition (ASR) [22, 31, 33, 40], speech-to-text translation [6, 11, 29, 32], speech-to-speech translation [35], and so on.  ... 
arXiv:2108.13630v1 fatcat:mvhi6fatanekfp4oqe52qphzyu

Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition [article]

Priyabrata Karmakar, Shyh Wei Teng, Guojun Lu
2021 arXiv   pre-print
The paper focuses on the development and evolution of attention models for offline and streaming speech recognition within recurrent neural network- and Transformer-based architectures.  ...  In this survey paper, a comprehensive review of the different attention models used in developing automatic speech recognition systems is provided.  ...  A similar method was proposed in the augmented memory Transformer [61], where an augmented memory bank is included apart from partitioning the input speech sequence.  ... 
arXiv:2102.07259v1 fatcat:gxtylzrfwzaofeibk7gvzq42le

Streaming cascade-based speech translation leveraged by a direct segmentation model

Javier Iranzo-Sánchez, Javier Jorge, Pau Baquero-Arnal, Joan Albert Silvestre-Cerdà, Adrià Giménez, Jorge Civera, Albert Sanchis, Alfons Juan
2021 Neural Networks  
The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system.  ...  Nowadays, state-of-the-art ST systems are populated with deep neural networks that are conceived to work in an offline setup in which the audio input to be translated is fully available in advance.  ...  Simultaneous Machine Translation Offline and simultaneous MT systems were trained for each of the translation directions using the Transformer BASE configuration [8] implemented with the Fairseq toolkit  ... 
doi:10.1016/j.neunet.2021.05.013 pmid:34082286 fatcat:ajjdmz4sqrgwngg5jsse4mtuma
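The cascade approach described in the entry above chains an ASR system and an MT system on streaming input. A toy sketch of the control flow, with the ASR and MT steps passed in as hypothetical stand-ins for the real systems:

```python
def cascade_translate(audio_chunks, asr_step, mt_step):
    """Run each incoming audio chunk through ASR, then feed the
    partial transcript to MT, emitting translation incrementally.
    asr_step and mt_step are placeholders for real ASR/MT systems."""
    translation = []
    for chunk in audio_chunks:
        words = asr_step(chunk)          # partial source transcript
        translation.extend(mt_step(words))
    return translation
```

The streaming difficulty the paper targets lives inside the two steps: a real `asr_step` must emit stable partial hypotheses, and the segmentation model decides where to cut the transcript before `mt_step` sees it.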

Progress in Machine Translation

Haifeng Wang, Hua Wu, Zhongjun He, Liang Huang, Kenneth Ward Church
2021 Engineering  
We then introduce NMT in more detail, including the basic framework and the current dominant framework, Transformer, as well as multilingual translation models to deal with the data sparseness problem.  ...  In addition, we introduce cutting-edge simultaneous translation methods that achieve a balance between translation quality and latency.  ...  Simultaneous S2S translation pipeline: A typical cascaded ST system consists of an ASR system that transcribes the source speech into source streaming text, an MT system that performs the translation from  ... 
doi:10.1016/j.eng.2021.03.023 fatcat:opodsmef4jff7idanwkxtte7ca

DuTongChuan: Context-aware Translation Model for Simultaneous Interpreting [article]

Hao Xiong, Ruiqing Zhang, Chuanqiang Zhang, Zhongjun He, Hua Wu and Haifeng Wang
2019 arXiv   pre-print
This model allows to constantly read streaming text from the Automatic Speech Recognition (ASR) model and simultaneously determine the boundaries of Information Units (IUs) one after another.  ...  The detected IU is then translated into a fluent translation with two simple yet effective decoding strategies: partial decoding and context-aware decoding.  ...  We also thank 16 and 17 for contributing their speech corpora.  ... 
arXiv:1907.12984v2 fatcat:27qbcuuhrzftdo6yvj2lhgp7xq
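The entry above describes reading streaming ASR text, detecting Information Unit (IU) boundaries one after another, and translating each detected IU. A sketch of that loop, where the boundary detector and translator are hypothetical stand-ins for the learned components in the paper:

```python
def translate_by_ius(stream_tokens, is_boundary, translate):
    """Buffer streaming ASR tokens, cut an Information Unit (IU) at
    each detected boundary, and translate the units one after another.
    is_boundary and translate are illustrative placeholders."""
    unit, output = [], []
    for tok in stream_tokens:
        unit.append(tok)
        if is_boundary(tok):
            output.append(translate(unit))
            unit = []
    if unit:                     # flush a trailing partial unit
        output.append(translate(unit))
    return output
```

Translating at IU boundaries rather than full sentences is what lets the model start producing output before the speaker finishes, trading a little context for latency.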

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans [article]

Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li (+3 others)
2020 arXiv   pre-print
Also, ESPnet provides reproducible all-in-one recipes for these applications with state-of-the-art performance in various benchmarks by incorporating transformer, advanced data augmentation, and conformer  ...  Now ESPnet also includes text to speech (TTS), voice conversion (VC), speech translation (ST), and speech enhancement (SE) with support for beamforming, speech separation, denoising, and dereverberation  ...  For example, we will focus on enhancing online/streaming ASR functions from our current RNN-T implementation and further broadening the speech applications by realizing end-to-end speech-to-speech translation  ... 
arXiv:2012.13006v1 fatcat:zvvoohohdzesbh2t4b626xj3dq

NeurST: Neural Speech Translation Toolkit [article]

Chengqi Zhao and Mingxuan Wang and Qianqian Dong and Rong Ye and Lei Li
2021 arXiv   pre-print
NeurST is an open-source toolkit for neural speech translation.  ...  The toolkit mainly focuses on end-to-end speech translation, which is easy to use, modify, and extend to advanced speech translation research and products.  ...  Simultaneous Translation NeurST keeps up with the recent progress of simultaneous translation. The models are extended to train with streaming audio or text input.  ... 
arXiv:2012.10018v3 fatcat:awpgt22zqffnrowfay5dapmw3e

Better Morphology Prediction for Better Speech Systems

Dravyansh Sharma, Melissa Wilson, Antoine Bruguier
2019 Interspeech 2019  
We further augment our models with pronunciation information, which is typically available in speech systems, to further improve the accuracies of the same tasks.  ...  with improved natural language generation and understanding.  ...  Introduction: Text-to-speech and speech recognition systems convert a stream of 'words' to an audio stream and vice versa.  ... 
doi:10.21437/interspeech.2019-3207 dblp:conf/interspeech/SharmaWB19 fatcat:qp6jmg2lc5bollpwi6rbdvvefe

Simultaneous and fast 3D tracking of multiple faces in video by GPU-based stream processing

Oscar Mateo Lozano, Kazuhiro Otsuka
2008 Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing  
Stream processing is a relatively new computing paradigm that permits the expression and execution of data-parallel algorithms with great efficiency and minimum effort.  ...  Using Stream Processors for performing the computations as well as efficient Sparse-Template-based particle filtering allows us to achieve real-time processing even when tracking multiple objects simultaneously  ...  Results of the simultaneous tracking of four faces.  ... 
doi:10.1109/icassp.2008.4517709 dblp:conf/icassp/LozanoO08 fatcat:dsanckps5feljprjuvraydc44m

A multi-modal architecture for human robot communication

Arjun K. Arumbakkam, Taizo Yoshikawa, Behzad Dariush, Kikuo Fujimura
2010 2010 10th IEEE-RAS International Conference on Humanoid Robots  
The architecture also enables fine motion control through human speech commands processed by a dedicated speech processing system.  ...  The compliant and low gain tracking performed by this framework renders the system physically safe and therefore friendly to humans interacting with the robot.  ...  based servo level controller, a torque-to-position transformer and an underlying socket and shared memory based software framework for relaying data to and from the robot.  ... 
doi:10.1109/ichr.2010.5686337 dblp:conf/humanoids/ArumbakkamYDF10 fatcat:viw55v7345egfm752yyck3omqy
Showing results 1 — 15 out of 7,563 results