
Construction of Medical Academic English Translation Model Driven by Bilingual Corpus-Based Data

Jinping Liu, Hong Liu, Sheng Bin
2022 Scientific Programming  
The scale of corpus and related technology have a large research space; and how to obtain effective data and knowledge from massive resources, in order to better serve the basic and applied research, is  ...  With the rapid development of information collection technology and natural language processing technology, the construction of English-Chinese bilingual parallel corpus has developed rapidly.  ...  Acknowledgments This study was supported by the Department of Foreign Languages, Xinyang College.  ... 
doi:10.1155/2022/2264235 fatcat:tkptxhgnubbjfcetazzkvvrkhq

Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research

Phillippe Langlais
2017 Proceedings of the 10th Workshop on Building and Using Comparable Corpora  
Despite numerous studies devoted to mining parallel material from bilingual data, we have yet to see the resulting technologies wholeheartedly adopted by professional translators and terminologists alike  ...  I argue that this state of affairs is mainly due to two factors: the emphasis published authors put on models (even though data is as important), and the conspicuous lack of concern for actual end-users  ...  Data Organization Large-scale acquisition efforts conducted over the Web involve at some point an effort to distinguish parallel data from comparable or even unrelated data.  ... 
doi:10.18653/v1/w17-2501 dblp:conf/acl-bucc/Langlais17 fatcat:r6z7ztddfrh4jjyhkr7u7eb7ei

The Construction and Application of an English-Chinese Parallel Corpus of Hydrogen Energy

Zheng Li
2021 Converter  
This paper describes the construction and application of the first English-Chinese Parallel Corpus of Hydrogen Energy (PCOHE) in China, including design framework, collecting and processing of corpus materials  ...  , and its applications. The paper adopts statistical analysis with the help of corpus software to portray the linguistic characteristics of the hydrogen energy texts and study the translation strategy of  ...  Acknowledgements This research was supported by Cooperative Education Program of Higher Education Department of Ministry of Education (Grant No. 202002242018).  ... 
doi:10.17762/converter.320 fatcat:ketbphbvojagtc5uce22jyvtju

Automatic extraction of bilingual word pairs using inductive chain learning in various languages

Hiroshi Echizen-ya, Kenji Araki, Yoshio Momouchi
2006 Information Processing & Management  
Therefore, automatic extraction of bilingual word pairs from parallel corpora with various languages is important.  ...  Our learning method automatically acquires rules, which are effective to solve the sparse data problem, only from parallel corpora without any prior preparation of a bilingual resource (e.g., a bilingual  ...  Acknowledgement This work was partially supported by Grants from the High-Tech Research Center of Hokkai-Gakuen University, and an academic research grant of Hokkai-Gakuen University.  ... 
doi:10.1016/j.ipm.2005.11.004 fatcat:lg4icrccp5bchopeqhli4l7nby

An overview of NexusLinguarum use cases: Current status and challenges

Sara Carvalho, Ilan Kernerman
2021 Zenodo  
Working Group 4 (WG4) of the NexusLinguarum COST Action – European network for Web-centred linguistic data science (CA18209) – is dedicated to applying and validating the Action's methodologies and technologies  ...  In addition, it describes the cooperation with the other WGs of NexusLinguarum. URL:  ...  Acknowledgement The use case is based on the project "Bilingual automatic terminology extraction" funded by the Research Council of Lithuania (LMTLT, agreement No. P-MIP-20-282).  ... 
doi:10.5281/zenodo.5729078 fatcat:t3z3chvtmjbofdyk72wxzofmku

The case of InterCorp, a multilingual parallel corpus

František Čermák, Alexandr Rosen
2012 International Journal of Corpus Linguistics  
of the language from the perspective of other languages.  ...  This paper introduces InterCorp, a parallel corpus including texts in Czech and 27 other languages, available for online searches via a web interface.  ...  Significant developments also concern the process of acquiring a parallel corpus from the web (Razavian & Vogel 2009).  ... 
doi:10.1075/ijcl.17.3.05cer fatcat:75dnbzcfgjaatcehcxrze5ifaa

Tibetan-Chinese Cross Language Text Similarity Calculation Based on LDA Topic Model

Sun Yuan, Zhao Qian
2015 Open Cybernetics and Systemics Journal  
from Wikipedia. (2) Using topic model to make the texts mapped to the feature space of topics. (3) Calculating the similarity of two texts in different language according to the characteristics of the  ...  The method for text similarity calculation based on LDA model reduces the dimensions of text space vector, and enhances the understanding of the text's semantics.  ...  The disadvantage of this method is that the quality of comparable corpora depends on topic model largely. Bilingual topic model generated from parallel data or multilingual aligned documents.  ... 
doi:10.2174/1874110x01509012911 fatcat:uls2svodbrbargd6efninkamzy

Cómo los corpus pueden asistir a los estudiantes de traducción jurídica: la plataforma GENTT TransTools Corpora y Sketch Engine

Anabel Borja Albi
2019 Quaderns de Filologia: Estudis Lingüístics  
First, we review the literature on previous experiences of applying text corpora to the teaching of legal translation.  ...  This contribution analyses the application of corpora to the teaching of legal translation in higher education using the GENTT TransTools Corpora platform and Sketch Engine.  ...  we can state that the emergence of digital systems, networked communications, machine learning and large-scale data analysis, and the increasing integration of these technologies into translation businesses  ... 
doi:10.7203/qf.24.16297 fatcat:6q62k3svubbhjlhybkizwbfehy

Web-based technical term translation pairs mining for patent document translation

Feiliang Ren, Jingbo Zhu, Huizhen Wang
2010 Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)  
This paper proposes a simple but powerful approach for obtaining technical term translation pairs in patent domain from Web automatically.  ...  Secondly, an extraction algorithm is proposed to extract some key word translation pairs from the returned web pages.  ...  Acknowledgements This paper is supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR), and also is supported by the "the Fundamental Research Funds for the Central  ... 
doi:10.1109/nlpke.2010.5587775 dblp:conf/nlpke/RenZW10a fatcat:pdsk7mdkjbhj5hcyixmgt6uuoa

Forty years of working with corpora: from Ibsen to Twitter, and beyond

Knut Hofland, Paul Meurer, Andrew Salway
2013 Bergen Language and Linguistics Studies  
We provide an overview of forty years of work with language corpora by the research group that started in 1972 as the Norwegian Computing Centre for the Humanities.  ...  A brief history highlights major corpora and tools that have been developed in numerous collaborations, including corpora of literature, dialect recordings, learner language, parallel texts, newspaper  ...  Acknowledgements It is with great pleasure that we acknowledge the people, including some long-standing collaborators, who made significant contributions to the work described in this paper: Gisle Andersen  ... 
doi:10.15845/bells.v3i1.371 fatcat:b4273ihrdfdu3b3fce2ln2vob4

The IJS-ELAN Slovene-English Parallel Corpus

Tomaž Erjavec
2002 International Journal of Corpus Linguistics  
The paper presents an annotated parallel Slovene-English corpus developed in the scope of the EU ELAN project.  ...  The corpus contains 1 million words from fifteen recent terminology-rich texts. The corpus is sentence aligned and word-tagged with context disambiguated morphosyntactic descriptions and lemmas.  ...  texts for the corpus.  ... 
doi:10.1075/ijcl.7.1.01erj fatcat:yq5zcecxejbqfjmgsbcqa2ifi4

Resources for Turkish Natural Language Processing: A critical survey [article]

Çağrı Çöltekin, A. Seza Doğruöz, Özlem Çetinoğlu
2022 arXiv   pre-print
This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available.  ...  In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications  ...  Large-scale (unannotated) linguistic data collections Although well-balanced, representative corpora have been at the focus of building corpora in corpus linguistics, opportunistic large collections of  ... 
arXiv:2204.05042v1 fatcat:ei2oz3nwofa63orub6xyqnkcta

Bootstrapping dictionaries for cross-language information retrieval

Kornél Markó, Stefan Schulz, Olena Medelyan, Udo Hahn
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
The bottleneck for dictionary-based cross-language information retrieval is the lack of comprehensive dictionaries, in particular for many different languages.  ...  Lexical and semantic hypotheses are then validated and new ones iteratively generated by making use of co-occurrence patterns of hypothesized translation synonyms in parallel corpora.  ...  Validation Using Parallel Corpora We took advantage of the availability of large parallel corpora in the biomedical domain in order to identify false friends, i.e., similar words in different languages  ... 
doi:10.1145/1076034.1076124 dblp:conf/sigir/MarkoSMH05 fatcat:35aws52ez5hd7di7xl4ik6p35y

Working with corpora in the translation classroom

Ralph Krüger
2012 Studies in Second Language Learning and Teaching  
Starting with a survey of corpus use within corpus-based translation studies, the didactic value of corpora in the translation classroom and their epistemic value in translation teaching and practice will  ...  After a brief discussion of possible Internet research techniques for targeted and quality-focused corpus compilation, the possible use of the Internet itself as a macro-corpus will be elaborated.  ...  However, the dominant research focus of corpus-based translation studies is not on large-scale, methodologically sound ST-TT comparisons, but rather on the investigation of "the nature of translated text  ... 
doi:10.14746/ssllt.2012.2.4.4 fatcat:ldr3mdnrsbdaljdepn665xg2ai

Introduction [chapter]

Peter Spyns
2012 Essential Speech and Language Technology for Dutch  
The thematic priorities for each call were determined in line with the overall STEVIN priorities and the state of their realisation before each call.  ...  A BLARK is defined as the set of basic HLT resources that should be available for both academia and industry [13].  ...  This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original  ... 
doi:10.1007/978-3-642-30910-6_1 dblp:series/tanlp/Spyns13 fatcat:x3hadalrirbvliitkmzj74xtqi