Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data
2018
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. ...
A three-way parallel corpus which contains bilingual texts and corresponding images is required to train an MNMT system with image features. ...
The authors would like to thank Longyue Wang and Meghan Dowling for providing many good suggestions of improvements, as well as our anonymous reviewers for their valuable comments and feedback. ...
doi:10.18653/v1/w18-3405
dblp:conf/acl-deeplo/ChowdhuryHL18
fatcat:hmav4zatmzgazjx3wzq2nox74i
Domain Adaptation for Medical Text Translation using Web Resources
2014
Proceedings of the Ninth Workshop on Statistical Machine Translation
This paper describes adapting statistical machine translation (SMT) systems to the medical domain using in-domain and general-domain data as well as web-crawled in-domain resources. ...
Our systems achieve the second best BLEU scores for Czech-English, fourth for French-English, English-French language pairs and the third best results for remaining pairs. ...
Acknowledgments The authors are grateful to the Science and Technology Development Fund of Macau and the Research Committee of the University of Macau for the funding support for their research, under ...
doi:10.3115/v1/w14-3328
dblp:conf/wmt/0005WWCW14
fatcat:dlokxt6okjaytj3l4hokyhstzu
Neural Machine Translation for Low-Resource Languages: A Survey
[article]
2021
arXiv
pre-print
unavailability of large parallel corpora. ...
While considered as the most widely used solution for Machine Translation, its performance on low-resource language pairs still remains sub-optimal compared to the high-resource counterparts, due to the ...
Parallel Data Mining (bitext mining) from comparable corpora: Comparable corpora refer to text on the same topic that is not direct translations of each other but may contain fragments that are translation ...
arXiv:2106.15115v1
fatcat:4w3jtdd4q5fnjbfznrqq7glxdu
Lynx D2.6 Report on Lynx acquired corpora
2020
Zenodo
Furthermore, the document describes the corpora preparation workflow to be used in the training of Neural MT engines for specific languages and domains. ...
Finally, this document reports on the term extraction process on the compiled corpora. ...
Furthermore, MT systems are source-to-target systems that in typical scenarios (i.e., if we ignore multi-way NMT systems and other multi-task neural network-based systems) translate from one source language ...
doi:10.5281/zenodo.3692591
fatcat:b2canplljncmjnyqgr6ywjndm4
Finding Translation Examples for Under-Resourced Language Pairs or for Narrow Domains; the Case for Machine Translation
2012
Computer Science Journal of Moldova
The most adequate type of cross-lingual data is represented by parallel corpora, collections of reciprocal translations. ...
When required parallel data refers to specialized (narrow) domains, the scarcity of data becomes even more acute. ...
comparable corpora in collecting parallel sentences meant for improving translation quality for under resourced languages and/or narrow domains. ...
doaj:05a53f2716f24a9b87820f9505c4adb6
fatcat:nymorko3vze73gyaukjk7sou64
Lynx D2.3 Intermediate report on Lynx acquired corpora
2019
Zenodo
Finally, this document reports on the term extraction process performed so far on the compiled corpora and briefly outlines its further use in the Lynx MT systems. ...
Furthermore, the document describes the corpora preparation workflow to be used in the training of Neural MT engines for specific languages and domains. ...
Furthermore, MT systems are source-to-target systems that in typical scenarios (i.e., if we ignore multi-way NMT systems and other multi-task neural network-based systems) translate from one source language ...
doi:10.5281/zenodo.2655047
fatcat:gr4k7lypwvbu7je4kdhkia7ymu
D5.1 Report on Vocabularies for Interoperable Language Resources and Services
2020
Zenodo
We focus on three main aspects of linguistically analyzed data 1. lexical-conceptual resources, i.e., repositories of terminology, lexical data, translation, and semantics, 2. linguistically annotated ...
This document provides a survey of vocabularies for language resources and services and sketches necessary extensions and the expected contribution of the Prêt-à-LLOD project to their further development ...
information, collocations), (2) pointers from lexical resources to corpora and other collections of text (attestations), (3) the annotation of corpora and other language resources with lexical information ...
doi:10.5281/zenodo.5744205
fatcat:xfrpsie7zjgjboxi4j265husjy
ASR Domain Adaptation Methods For Low-Resourced Languages: Application To Romanian Language
2012
Zenodo
The second semi-supervised adaptation method regards partB of the in-domain French text and the Romanian partB_GoMTpp text as parallel corpora and uses them to train a domain-specific SMT system. ...
General text corpora acquisition Romanian is a low-resourced language from the point of view of plain text corpora. ...
doi:10.5281/zenodo.52416
fatcat:zhzuhgsmajc75eqrnaxw7sysgm
Multilingual Projection for Parsing Truly Low-Resource Languages
2016
Transactions of the Association for Computational Linguistics
All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. ...
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. ...
Acknowledgements We thank the editors and the anonymous reviewers for their valuable comments. This research is funded by the ERC Starting Grant LOWLANDS (#313695). ...
doi:10.1162/tacl_a_00100
fatcat:zulajoluuzci3foa5zzmv6w45y
Survey of Low-Resource Machine Translation
[article]
2022
arXiv
pre-print
There are currently around 7000 languages spoken in the world and almost all language pairs lack significant resources for training machine translation models. ...
We present a survey covering the state of the art in low-resource machine translation research. ...
Not only is such data likely to be in a very different domain from the text that we would like to translate, but such large-scale multilingual automatically extracted corpora are often of poor quality ...
arXiv:2109.00486v3
fatcat:5wof74vjy5gptcl5ornkd5j4ku
Resources for Turkish Natural Language Processing: A critical survey
[article]
2022
arXiv
pre-print
This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. ...
In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications ...
for translation between Turkic languages were released. Except for small samples in Apertium repositories (Forcada et al., 2011), the corpora built with large-scale parallel text collections (e.g. ...
arXiv:2204.05042v1
fatcat:ei2oz3nwofa63orub6xyqnkcta
Efficient Use of Resources for Statistical Machine Translation
2017
DESIDOC Journal of Library & Information Technology
Success of data-driven machine translation systems is governed by the volume of parallel data on which these systems are being modelled. ...
The results achieved are promising and set an example for other morphologically rich languages to optimise the resources to improve the performance of the translation system. ...
INTRODUCTION Machine translation technology can play a vital role in any domain where multi-lingual content is used. Library and information science also deal with multi-lingual contents. ...
doi:10.14429/djlit.37.11420
fatcat:jdvkoy67tzfnbneifyivodgara
Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora
2018
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Resources for the non-English languages are scarce and this paper addresses this problem in the context of machine translation, by automatically extracting parallel sentence pairs from the multilingual ...
Subsequently, we have shown that using the harvested dataset improved BLEU scores on both NMT and phrase-based SMT systems for the low-resource language pairs: English--Hindi and English--Tamil, when ...
Recent crowd-sourcing efforts and workshops on machine translation have resulted in small amounts of parallel texts for building viable machine translation systems for low-resource pairs [16]. ...
doi:10.18653/v1/n18-4016
dblp:conf/naacl/RameshS18
fatcat:emmblyhcwbahvczghp4tm4cy3a
Enabling Medical Translation for Low-Resource Languages
[article]
2016
arXiv
pre-print
As this is a low-resource language pair, especially for speech and for the medical domain, our initial focus has been on gathering suitable training data from various sources. ...
In particular, we present the first steps towards the development of a real-world Hindi-English machine translation system for doctor-patient communication. ...
Acknowledgments The authors would like to thank Naila Khalisha and Manisha Bansal for their contributions towards the project. ...
arXiv:1610.02633v1
fatcat:b37f46m4mvdn5cf5bavn2qranq
SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian
2014
Speech Communication
We propose a methodology that aims to create a domain-specific automatic speech recognition (ASR) system for a low-resourced language when in-domain text corpora are available only in a high-resourced ...
An in-depth analysis, to explain why and how the machine translated text improves the performance of the domain-specific ASR, is also made at the end of this paper. ...
This is basically the problem we are trying to solve in this paper: is there an inexpensive way to create domain-specific text corpora and eventually domain-specific ASR systems for under-resourced languages ...
doi:10.1016/j.specom.2013.05.003
fatcat:dyhcpkrhh5bxjmlipklgx6dayy