Data Augmentation with Unsupervised Machine Translation Improves the Structural Similarity of Cross-lingual Word Embeddings [article]

Sosuke Nishikawa, Ryokan Ri, Yoshimasa Tsuruoka
2021 arXiv   pre-print
Unsupervised cross-lingual word embedding (CLWE) methods learn a linear transformation matrix that maps two monolingual embedding spaces that are separately trained with monolingual corpora.  ...  We show that our approach outperforms other alternative approaches given the same amount of data, and, through detailed analysis, we show that data augmentation with the pseudo data from unsupervised machine  ...  Having done that, we simply concatenate the machine-translated corpus with the original training corpus, and learn monolingual word embeddings independently for each language.  ... 
arXiv:2006.00262v3 fatcat:u56cphwbkvccvmrngl56lhiwoq

Unsupervised Cross-Lingual Representation Learning

Sebastian Ruder, Anders Søgaard, Ivan Vulić
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts  
Such approaches to cross-lingual learning learn mapping functions between pretrained monolingual word embedding spaces; this is in contrast with approaches based on joint learning, data augmentation, or  ...  Familiarity with standard NLP tasks such as machine translation.  ... 
doi:10.18653/v1/p19-4007 dblp:conf/acl/RuderSV19 fatcat:khz7rqq3kzaojjssfdvkiqv3ma

Spanish-Turkish Low-Resource Machine Translation: Unsupervised Learning vs Round-Tripping

Tianyi Xu, Ozge Ilkim Ozbek, Shannon Marks, Sri Korrapati, Benyamin Ahmadnia
2020 American Journal of Artificial Intelligence  
These results confirm that the Unsupervised Learning approach is still a reliable learning-based translation technique for Spanish-Turkish low-resource NMT.  ...  The quality of data-driven Machine Translation (MT) strongly depends on the quantity as well as the quality of the training dataset.  ...  Mislove (Tulane University of Louisiana, USA) for all his unconditional support.  ... 
doi:10.11648/j.ajai.20200402.11 fatcat:zigslsun4rdd5cx7ksfjscxueq

Handling Syntactic Divergence in Low-resource Machine Translation [article]

Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig
2019 arXiv   pre-print
Data augmentation methods such as back-translation make it possible to use monolingual data to help alleviate these issues, but back-translation itself fails in extreme low-resource scenarios, especially  ...  Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs.  ...  The authors would like to thank Shruti Rijhwani and Hiroaki Hayashi for their help when preparing the data sets.  ... 
arXiv:1909.00040v1 fatcat:6khmxw2itbhgviyvdnjo22gfd4

Handling Syntactic Divergence in Low-resource Machine Translation

Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
Data augmentation methods such as back-translation make it possible to use monolingual data to help alleviate these issues, but back-translation itself fails in extreme low-resource scenarios, especially  ...  Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs.  ...  The authors would like to thank Shruti Rijhwani and Hiroaki Hayashi for their help when preparing the data sets.  ... 
doi:10.18653/v1/d19-1143 dblp:conf/emnlp/ZhouMHN19 fatcat:7izfiyflaje5vjwqrzem364c5m

Generalized Data Augmentation for Low-Resource Translation

Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, Graham Neubig
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
First, we inject LRL words into HRL sentences through an induced bilingual dictionary. Second, we further edit these modified sentences using a modified unsupervised machine translation framework.  ...  In this paper, we propose a general framework for data augmentation in low-resource machine translation that not only uses target-side monolingual data, but also pivots through a related high-resource language  ...  Acknowledgements The authors thank Junjie Hu and Xinyi Wang for discussions on the paper.  ... 
doi:10.18653/v1/p19-1579 dblp:conf/acl/XiaKAN19 fatcat:pmfp5gvf7fajvht2qdn5kgpd3m

Machine Translation With Weakly Paired Documents

Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
Recent works explore the possibility of unsupervised machine translation with monolingual data only, leading to much lower accuracy compared with the supervised one.  ...  the topic consistency of two weakly paired documents and learn the sentence translation model by constraining the word distribution-level alignments.  ...  This embedding further becomes one of the key components for unsupervised machine translation.  ... 
doi:10.18653/v1/d19-1446 dblp:conf/emnlp/WuZHGQLL19 fatcat:c6qi6d2ovbbhrnd7klcb6yld74

Improving Multilingual Neural Machine Translation For Low-Resource Languages: French,English - Vietnamese [article]

Thi-Vinh Ngo, Phuong-Thai Nguyen, Thanh-Le Ha, Khac-Quy Dinh, Le-Minh Nguyen
2021 arXiv   pre-print
The first strategy is about dynamical learning word similarity of tokens in the shared space among source languages while another one attempts to augment the translation ability of rare words through updating  ...  Besides, we leverage monolingual data for multilingual MT systems to increase the amount of synthetic parallel corpora while dealing with the data sparsity problem.  ...  Augmenting Rare Word Translation Learning multilingual word similarity We assume that a rare word or rare token (which has a low frequency in the training data) from one source language may be similar  ... 
arXiv:2012.08743v2 fatcat:un6v4f72evb2nlodruvotawyeq

Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages [article]

Dana Ruiter, Dietrich Klakow, Josef van Genabith, Cristina España-Bonet
2021 arXiv   pre-print
We further show that the combination of multilingual denoising autoencoding, SSNMT with backtranslation and bilingual finetuning enables us to learn machine translation even for distant language pairs  ...  To address this, unsupervised machine translation (UMT) exploits large amounts of monolingual data by using synthetic data generation techniques such as back-translation and noising, while self-supervised  ...  The authors are responsible for the content of this publication. 16 https://github.com/facebookresearch/LASER 17 https://github.com/ruitedk6/comparableNMT  ... 
arXiv:2107.08772v1 fatcat:offywtggwrb6dgy5eh3o7p642u

Unsupervised Clinical Language Translation [article]

Wei-Hung Weng, Yu-An Chung, Peter Szolovits
2019 arXiv   pre-print
We show that a framework using representation learning, bilingual dictionary induction and statistical machine translation yields the best precision at 10 of 0.827 on professional-to-consumer word translation  ...  In contrast, we approach the clinical word and sentence translation problem in a completely unsupervised manner.  ...  Bilingual Dictionary Induction for Word Translation Unsupervised BDI algorithms can be applied to learn a mapping dictionary for alignment of embedding spaces.  ... 
arXiv:1902.01177v2 fatcat:3iky3v4eejfjto3licidzebolu

Unsupervised Clinical Language Translation

Wei-Hung Weng, Yu-An Chung, Peter Szolovits
2019 Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD '19  
We show that a framework using representation learning, bilingual dictionary induction and statistical machine translation yields the best precision at 10 of 0.827 on professional-to-consumer word translation  ...  CCS CONCEPTS • Computing methodologies → Machine translation; Unsupervised learning; Learning latent representations; • Applied computing → Consumer health; Health informatics.  ...  Tristan Naumann, Matthew McDermott, the MIT Clinical Decision Making Group, and all evaluators for their helpful discussions.  ... 
doi:10.1145/3292500.3330710 dblp:conf/kdd/WengCS19 fatcat:lxknatnjebeybdildsg6kerfd4

Neural Machine Translation for Low-Resource Languages: A Survey [article]

Surangika Ranathunga, En-Shiun Annie Lee, Marjana Prifti Skenduli, Ravi Shekhar, Mehreen Alam, Rishemjit Kaur
2021 arXiv   pre-print
While considered as the most widely used solution for Machine Translation, its performance on low-resource language pairs still remains sub-optimal compared to the high-resource counterparts, due to the  ...  Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase.  ...  One such technique is to leverage the available bilingual dictionaries by initialising the model with bilingual word embedding [9, 42, 100].  ... 
arXiv:2106.15115v1 fatcat:4w3jtdd4q5fnjbfznrqq7glxdu

Vector Space Models for Phrase-based Machine Translation

Tamer Alkhouli, Andreas Guta, Hermann Ney
2014 Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation  
VSMs are models based on continuous word representations embedded in a vector space. We exploit word vectors to augment the phrase table with new inferred phrase pairs.  ...  This helps reduce out-of-vocabulary (OOV) words. In addition, we present a simple way to learn bilingually-constrained phrase vectors.  ...  Zou et al. (2013) learn bilingual word embeddings by designing an objective function that combines unsupervised training with bilingual constraints based on word alignments.  ... 
doi:10.3115/v1/w14-4001 dblp:conf/ssst/AlkhouliGN14 fatcat:dp2h4ist5vg2ho2p3mei2m2tpm

Unsupervised Sentiment Analysis for Code-mixed Data [article]

Siddharth Yadav, Tanmoy Chakraborty
2020 arXiv   pre-print
In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis  ...  Our methods can handle code-mixed text through a zero-shot learning. Our methods beat state-of-the-art on English-Spanish code-mixed sentiment analysis by absolute 3% F1-score.  ...  ACKNOWLEDGMENTS We're grateful to Divam Gupta for all his guidance throughout this project.  ... 
arXiv:2001.11384v1 fatcat:2ec2z2ghn5bidblkpyvnwjyps4

Towards Unsupervised Speech-to-Text Translation [article]

Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
2018 arXiv   pre-print
For unseen source speech utterances, the system first performs word-by-word translation on each speech segment in the utterance.  ...  training, making it especially applicable to language pairs with very few or even zero bilingual resources.  ...  The learned cross-modal bilingual dictionary, as we will show in this paper, is capable of performing word-by-word translation, with the difference being that the input, instead of text, is a speech segment  ... 
arXiv:1811.01307v1 fatcat:67wrfk45tjbavlv5dt2kjyxvoi