
Bilingual Terminology Extraction from Comparable E-Commerce Corpora [article]

Hao Jia, Shuqin Gu, Yuqi Zhang, Xiangyu Duan
2022 arXiv   pre-print
In this paper, we propose a novel framework of extracting e-commercial bilingual terminologies from comparable data.  ...  Bilingual terminologies are important machine translation resources in the field of e-commerce, which are usually either manually translated or automatically extracted from parallel data.  ...  Bilingual Terminology Extraction from Parallel or Comparable Corpora Several influential approaches [7], [9], [10], [17] have been proposed to extract bilingual terminology from parallel corpus  ... 
arXiv:2104.07398v2 fatcat:ieydoa72arfz7cazt53bzw5zoe

Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora

Amir Hazem, Emmanuel Morin
2017 International Joint Conference on Natural Language Processing  
Bilingual lexicon extraction from comparable corpora is constrained by the small amount of available data when dealing with specialized domains.  ...  This aspect penalizes the performance of distributional-based approaches, which is closely related to the reliability of word's cooccurrence counts extracted from comparable corpora.  ...  Acknowledgments The research leading to these results has received funding from the French National Research Agency under grant ANR-17-CE23-0001 ADDICTE (Distributional analysis in specialized domain).  ... 
dblp:conf/ijcnlp/HazemM17 fatcat:zvvoz7fexrhq5gulraqecddoau

Disambiguation of single noun translations extracted from bilingual comparable corpora

Hirosi Nakagawa
2001 Terminology  
In this paper, we describe a bilingual dictionary acquisition system which extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations  ...  At the first stage, candidate terms are extracted from a Japanese and English corpus, respectively, and ranked according to their importance as terms.  ...  Basic dictionary for translations It is almost impossible to acquire lexical translation from bilingual comparable corpora from scratch.  ... 
doi:10.1075/term.7.1.06nak fatcat:6r2du3a6ovgync7lppicovyely

Bilingual terminology acquisition from comparable corpora and phrasal translation to cross-language information retrieval

Fatiha Sadat, Masatoshi Yoshikawa, Shunsuke Uemura
2003 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - ACL '03  
The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, phrasal translation as well as evaluations on Cross-Language Information Retrieval  ...  A two-stages translation model is proposed for the acquisition of bilingual terminology from comparable corpora, disambiguation and selection of best translation alternatives according to their linguistics-based  ...  Conclusion We investigated the approach of extracting bilingual terminology from comparable corpora in order to enrich existing bilingual lexicons and enhance CLIR.  ... 
doi:10.3115/1075178.1075201 dblp:conf/acl/SadatYU03 fatcat:3ffo3vcownhq7kb2qw3x22e5r4

Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora

Fatiha Sadat, Masatoshi Yoshikawa, Shunsuke Uemura
2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval - SIGIR '03  
We explore a bi-directional extraction of bilingual terminology primarily from comparable corpora.  ...  This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluations on Cross-Language Information Retrieval.  ...  We propose a novel approach to learning from comparable corpora and extracting a bilingual lexicon.  ... 
doi:10.1145/860435.860519 dblp:conf/sigir/SadatYU03 fatcat:xydcdi2i5nhtxogt5qyvtx5muq

Termhood-Based Comparability Metrics of Comparable Corpus in Special Domain [chapter]

Sa Liu, Chengzhi Zhang
2013 Lecture Notes in Computer Science  
A new comparability, namely, termhood-based metrics, oriented to the task of bilingual terminology extraction, is proposed in this paper.  ...  Comparable corpora, that the subcorpora are not translations of each other, can be easily obtained from web.  ...  Therefore, our experiment is designed to extract bilingual terminology from three corpora with different comparability.  ... 
doi:10.1007/978-3-642-36337-5_15 fatcat:wcflfhqp4rdd3bftsu6chzmhny

Methodological Framework for the Development of an English-Lithuanian Cybersecurity Termbase

Sigita Rackevičienė, Liudmila Mockienė, Andrius Utka, Aivaras Rokas
2021 Studies About Languages  
The paper touches upon the methods and problems of dataset (corpora) compilation, terminology annotation, automatic bilingual term extraction (BiTE) and alignment, knowledge-rich context extraction, and  ...  standard corpora) allow effective automatization of extraction of terminological data and metadata, which enables to regularly update termbases with minimised manual input; 3) LLOD technologies enable  ...  Besides, BiTE from comparable corpora, used in addition to BiTE from parallel corpora, allows extracting and comparing terminology formed and used in various settings.  ... 
doi:10.5755/j01.sal.1.39.29156 fatcat:mrmv4n4qnfbk3kswehn7b34jyq

Looking at Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction

Emmanuel Morin, Amir Hazem
2014 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced.  ...  Within this context, we have carried out a study on the influence of unbalanced specialized comparable corpora on the quality of bilingual terminology extraction through different experiments.  ...  The bilingual lexicon extraction task from comparable corpora inherits this filiation.  ... 
doi:10.3115/v1/p14-1121 dblp:conf/acl/MorinH14 fatcat:nojrs7ztgfhc5lmal7lltglmpa

French-English Terminology Extraction from Comparable Corpora [chapter]

Béatrice Daille, Emmanuel Morin
2005 Lecture Notes in Computer Science  
This article presents a method of extracting bilingual lexica composed of single-word terms (SWTs) and multi-word terms (MWTs) from comparable corpora of a technical domain.  ...  After explaining the difficulties involved in aligning MWTs and specifying our approach, we show the adopted process for bilingual terminology extraction and the resources used in our experiments.  ...  This work has also benefited from his comments.  ... 
doi:10.1007/11562214_62 fatcat:cmbdysk43nfz7bb6lh33po7rme

Using ParaConc to extract bilingual terminology from parallel corpora: A case of English and Ndebele

Ketiwe Ndhlovu
2016 Literator  
This article explores how parallel corpora can be interrogated using a bilingual concordancer (ParaConc) to extract bilingual terminology that can be used to create specialised bilingual dictionaries.  ...  An English–Ndebele Parallel Corpus was used as a resource and through ParaConc, an alphabetic list was compiled from which headwords and possible translations were sought.  ...  Morin and Prochasson (2011) extracted bilingual lexicon from comparable corpora enhanced with parallel corpora.  ... 
doi:10.4102/lit.v37i2.1278 fatcat:2o37ieflwzbszjrvwzpwznicti

QAlign: A New Method for Bilingual Lexicon Extraction from Comparable Corpora [chapter]

Amir Hazem, Emmanuel Morin
2012 Lecture Notes in Computer Science  
Comparable corpora are the main alternative to the use of parallel corpora to extract bilingual lexicons.  ...  This idea, which has been already used in machine translation task for more than a decade, is not straightforward for the task of bilingual lexicon extraction from specific-domain comparable corpora.  ...  Concordance bilinguE libre pour l'Aide la traductioN) of the General Delegation for the French Language and in languages of France.  ... 
doi:10.1007/978-3-642-28601-8_8 fatcat:5aead6vghfgu5mkwqovv76hmsy

Building a terminological tool as implementation instrument of the sustainable built environment

Oana Tatu, Transilvania University of Brasov, Romania
2022 Bulletin of the Transilvania University of Brașov. Series IV, Philology & Cultural Studies  
As for the Romanian-specific terminology employed by decision-makers, there are critical inconsistencies pertaining to mistranslations or fluctuating translations of standardized English terms.  ...  This article outlines the purpose and objectives of the envisaged project, highlights its interdisciplinarity, and displays the steps to be taken in developing a terminological tool that is meant to be  ...  Regarding corpora as terminological resources, starting from the 80s, there have been various attempts to extract terms for bilingual glossaries from parallel corpora, (Chen 1993, Kay and Röscheisen  ... 
doi:10.31926/but.pcs.2021. fatcat:7fvhhvfjvbcnfbote5vwqkjoiu

Acquisition of Medical Terminology for Ukrainian from Parallel Corpora and Wikipedia

Thierry Hamon, Natalia Grabar
2015 International Conference on Terminology and Artificial Intelligence  
We propose to exploit various corpora available in several languages in order to build bilingual and trilingual terminologies.  ...  The increasing availability of parallel bilingual corpora and of automatic methods and tools for their processing makes it possible to build linguistic and terminological resources for low-resourced languages  ...  Extraction of bilingual terminology from the MedlinePlus corpus The use of the first method of transfer (Transfer 1) allows to extract 436 Ukrainian terms with a high precision unsurprisingly (0.966).  ... 
dblp:conf/tia/HamonG15 fatcat:7xdcmgofzjgjfhmtjxofxelrxi

Extraction of Bilingual Cognates from Wikipedia [chapter]

Pablo Gamallo, Marcos Garcia
2012 Lecture Notes in Computer Science  
In this article, we propose a method to extract translation equivalents with similar spelling from comparable corpora.  ...  The method was applied on Wikipedia to extract a large amount of Portuguese-Spanish bilingual terminological pairs that were not found in existing dictionaries.  ...  Conclusions We have proposed a method to extract new bilingual terminology from Wikipedia-based comparable corpora, achieving more than 90% accuracy.  ... 
doi:10.1007/978-3-642-28885-2_7 fatcat:svlpqkzwlvclrlfz4qy752vjl4

Extraction of Bilingual Terminology from a Multilingual Web-based Encyclopedia

Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
2008 Journal of Information Processing  
bilingual terminology from parallel corpora.  ...  With the demand for bilingual dictionaries covering domain-specific terminology, research in the field of automatic dictionary extraction has become popular.  ...  A lot of research has been conducted on the extraction of bilingual terminology from parallel corpora, bilingual text collections consisting of texts in one language and their translations into another  ... 
doi:10.2197/ipsjjip.16.68 fatcat:b3cqlwumfzabhgj2ybj4sjn7t4