4,386 Hits in 8.6 sec

Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data

Koel Dutta Chowdhury, Mohammed Hasanuzzaman, Qun Liu
2018 Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP  
We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models.  ...  A three-way parallel corpus, which contains bilingual texts and corresponding images, is required to train an MNMT system with image features.  ...  The authors would like to thank Longyue Wang and Meghan Dowling for providing many good suggestions of improvements, as well as our anonymous reviewers for their valuable comments and feedback.  ... 
doi:10.18653/v1/w18-3405 dblp:conf/acl-deeplo/ChowdhuryHL18 fatcat:hmav4zatmzgazjx3wzq2nox74i

Domain Adaptation for Medical Text Translation using Web Resources

Yi Lu, Longyue Wang, Derek F. Wong, Lidia S. Chao, Yiming Wang
2014 Proceedings of the Ninth Workshop on Statistical Machine Translation  
This paper describes adapting statistical machine translation (SMT) systems to the medical domain using in-domain and general-domain data as well as web-crawled in-domain resources.  ...  Our systems achieve the second-best BLEU scores for Czech-English, the fourth-best for French-English and English-French, and the third-best results for the remaining pairs.  ...  Acknowledgments The authors are grateful to the Science and Technology Development Fund of Macau and the Research Committee of the University of Macau for the funding support for their research, under  ... 
doi:10.3115/v1/w14-3328 dblp:conf/wmt/0005WWCW14 fatcat:dlokxt6okjaytj3l4hokyhstzu

Neural Machine Translation for Low-Resource Languages: A Survey [article]

Surangika Ranathunga, En-Shiun Annie Lee, Marjana Prifti Skenduli, Ravi Shekhar, Mehreen Alam, Rishemjit Kaur
2021 arXiv pre-print
unavailability of large parallel corpora.  ...  While considered the most widely used solution for Machine Translation, its performance on low-resource language pairs still remains sub-optimal compared to the high-resource counterparts, due to the  ...  Parallel Data Mining (bitext mining) from comparable corpora: Comparable corpora refer to text on the same topic that is not direct translations of each other but may contain fragments that are translation  ... 
arXiv:2106.15115v1 fatcat:4w3jtdd4q5fnjbfznrqq7glxdu

Lynx D2.6 Report on Lynx acquired corpora

Ēriks Ajausks, Christian Sageder, Andis Lagzdiņš, Víctor Rodríguez-Doncel
2020 Zenodo  
Furthermore, the document describes the corpora preparation workflow to be used in the training of Neural MT engines for specific languages and domains.  ...  Finally, this document reports on the term extraction process on the compiled corpora.  ...  Furthermore, MT systems are source-to-target systems that in typical scenarios (i.e., if we ignore multi-way NMT systems and other multi-task neural network-based systems) translate from one source language  ... 
doi:10.5281/zenodo.3692591 fatcat:b2canplljncmjnyqgr6ywjndm4

Finding Translation Examples for Under-Resourced Language Pairs or for Narrow Domains; the Case for Machine Translation

Dan Tufis
2012 Computer Science Journal of Moldova  
The most adequate type of cross-lingual data is represented by parallel corpora, collections of reciprocal translations.  ...  When required parallel data refers to specialized (narrow) domains, the scarcity of data becomes even more acute.  ...  comparable corpora in collecting parallel sentences meant for improving translation quality for under-resourced languages and/or narrow domains.  ... 
doaj:05a53f2716f24a9b87820f9505c4adb6 fatcat:nymorko3vze73gyaukjk7sou64

Lynx D2.3 Intermediate report on Lynx acquired corpora

Ēriks Ajausks, Víctor Mireles-Chaves, Christian Sageder, Andis Lagzdiņš, Elena Montiel-Ponsoda
2019 Zenodo  
Finally, this document reports on the term extraction process performed so far on the compiled corpora and briefly outlines its further use in the Lynx MT systems.  ...  Furthermore, the document describes the corpora preparation workflow to be used in the training of Neural MT engines for specific languages and domains.  ...  Furthermore, MT systems are source-to-target systems that in typical scenarios (i.e., if we ignore multi-way NMT systems and other multi-task neural network-based systems) translate from one source language  ... 
doi:10.5281/zenodo.2655047 fatcat:gr4k7lypwvbu7je4kdhkia7ymu

D5.1 Report on Vocabularies for Interoperable Language Resources and Services

Christian Chiarcos, Philipp Cimiano, Julia Bosque-Gil, Thierry Declerck, Christian Fäth, Jorge Gracia, Maxim Ionov, John P. McCrae, Elena Montiel-Ponsoda, Maria Pia di Buono, Roser Saurí, Fernando Bobillo (+1 others)
2020 Zenodo  
We focus on three main aspects of linguistically analyzed data: 1. lexical-conceptual resources, i.e., repositories of terminology, lexical data, translation, and semantics, 2. linguistically annotated  ...  This document provides a survey of vocabularies for language resources and services and sketches necessary extensions and the expected contribution of the Prêt-à-LLOD project to their further development  ...  information, collocations), (2) pointers from lexical resources to corpora and other collections of text (attestations), (3) the annotation of corpora and other language resources with lexical information  ... 
doi:10.5281/zenodo.5744205 fatcat:xfrpsie7zjgjboxi4j265husjy

ASR Domain Adaptation Methods for Low-Resourced Languages: Application to Romanian Language

L. Besacier, Corneliu Burileanu, Andi Buzo, Horia Cucu
2012 Zenodo  
The second semi-supervised adaptation method regards partB of the in-domain French text and the Romanian partB_GoMTpp text as parallel corpora and uses them to train a domain-specific SMT system.  ...  General text corpora acquisition: Romanian is a low-resourced language from the point of view of plain text corpora.  ... 
doi:10.5281/zenodo.52416 fatcat:zhzuhgsmajc75eqrnaxw7sysgm

Multilingual Projection for Parsing Truly Low-Resource Languages

Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, Anders Søgaard
2016 Transactions of the Association for Computational Linguistics  
All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages.  ...  We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages.  ...  Acknowledgements We thank the editors and the anonymous reviewers for their valuable comments. This research is funded by the ERC Starting Grant LOWLANDS (#313695).  ... 
doi:10.1162/tacl_a_00100 fatcat:zulajoluuzci3foa5zzmv6w45y

Survey of Low-Resource Machine Translation [article]

Barry Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, Alexandra Birch
2022 arXiv pre-print
There are currently around 7000 languages spoken in the world and almost all language pairs lack significant resources for training machine translation models.  ...  We present a survey covering the state of the art in low-resource machine translation research.  ...  Not only is such data likely to be in a very different domain from the text that we would like to translate, but such large-scale multilingual automatically extracted corpora are often of poor quality  ... 
arXiv:2109.00486v3 fatcat:5wof74vjy5gptcl5ornkd5j4ku

Resources for Turkish Natural Language Processing: A critical survey [article]

Çağrı Çöltekin, A. Seza Doğruöz, Özlem Çetinoğlu
2022 arXiv pre-print
This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available.  ...  In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications  ...  for translation between Turkic languages were released. Except for small samples in Apertium repositories (Forcada et al., 2011), the corpora built with large-scale parallel text collections (e.g.  ... 
arXiv:2204.05042v1 fatcat:ei2oz3nwofa63orub6xyqnkcta

Efficient Use of Resources for Statistical Machine Translation

Karunesh Kumar Arora, Shyam Sunder Agrawal
2017 DESIDOC Journal of Library & Information Technology  
The success of data-driven machine translation systems is governed by the volume of parallel data on which these systems are modelled.  ...  The results achieved are promising and set an example for other morphologically rich languages to optimise their resources to improve the performance of the translation system.  ...  INTRODUCTION Machine translation technology can play a vital role in any domain where multilingual content is used. Library and information science also deals with multilingual content.  ... 
doi:10.14429/djlit.37.11420 fatcat:jdvkoy67tzfnbneifyivodgara

Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora

Sree Harsha Ramesh, Krishna Prasad Sankaranarayanan
2018 Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop  
Resources for the non-English languages are scarce and this paper addresses this problem in the context of machine translation, by automatically extracting parallel sentence pairs from the multilingual  ...  Subsequently, we showed that using the harvested dataset improved BLEU scores on both NMT and phrase-based SMT systems for the low-resource language pairs English–Hindi and English–Tamil, when  ...  Recent crowd-sourcing efforts and workshops on machine translation have resulted in small amounts of parallel texts for building viable machine translation systems for low-resource pairs [16].  ... 
doi:10.18653/v1/n18-4016 dblp:conf/naacl/RameshS18 fatcat:emmblyhcwbahvczghp4tm4cy3a

Enabling Medical Translation for Low-Resource Languages [article]

Ahmad Musleh, Nadir Durrani, Irina Temnikova, Preslav Nakov, Stephan Vogel, Osama Alsaad
2016 arXiv pre-print
As this is a low-resource language pair, especially for speech and for the medical domain, our initial focus has been on gathering suitable training data from various sources.  ...  In particular, we present the first steps towards the development of a real-world Hindi-English machine translation system for doctor-patient communication.  ...  Acknowledgments The authors would like to thank Naila Khalisha and Manisha Bansal for their contributions towards the project.  ... 
arXiv:1610.02633v1 fatcat:b37f46m4mvdn5cf5bavn2qranq

SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian

Horia Cucu, Andi Buzo, Laurent Besacier, Corneliu Burileanu
2014 Speech Communication  
We propose a methodology that aims to create a domain-specific automatic speech recognition (ASR) system for a low-resourced language when in-domain text corpora are available only in a high-resourced  ...  An in-depth analysis, to explain why and how the machine translated text improves the performance of the domain-specific ASR, is also made at the end of this paper.  ...  This is basically the problem we are trying to solve in this paper: is there an inexpensive way to create domain-specific text corpora and eventually domain-specific ASR systems for under-resourced languages  ... 
doi:10.1016/j.specom.2013.05.003 fatcat:dyhcpkrhh5bxjmlipklgx6dayy
Showing results 1-15 out of 4,386.