Interlingual annotation of parallel text corpora: a new framework for annotation and evaluation

2010 Natural Language Engineering  
The resulting annotated, multilingually-induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems  ...  This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor.  ... 
doi:10.1017/s1351324910000070 fatcat:nepwusm6rjdqdglutqntk6fnyq

Word Sense Disambiguation Using English-Spanish Aligned Phrases over Comparable Corpora [article]

David Fernandez-Amoros
2009 arXiv   pre-print
The evaluation of the experiment has been carried out against SemCor.  ...  In this paper we describe a WSD experiment based on bilingual English-Spanish comparable corpora in which individual noun phrases have been identified and aligned with their respective counterparts in  ...  Acknowledgements We are indebted to Julio Gonzalo for coming up with the idea of applying the noun phrases plus the ILI to WSD and for his advice, and to Fernando Lopez Ostenero for his willing assistance  ... 
arXiv:0910.5682v1 fatcat:jgq77vzs6verjawed5g55ysbea

Scalable Construction of High-Quality Web Corpora

Chris Biemann, Felix Bildhauer, Stefan Evert, Dirk Goldhahn, Uwe Quasthoff, Roland Schäfer, Johannes Simon, Leonard Swiezinski, Torsten Zesch
2013 Journal for Language Technology and Computational Linguistics  
Finally, we show how the availability of extremely large, high-quality corpora opens up new directions for research in various fields of linguistics, computational linguistics, and natural language processing  ...  As we are working with web data, controlling the quality of the resulting corpus is an important issue, which we address by showing how corpus statistics and a linguistic evaluation can be used to assess  ...  Acknowledgments The second evaluation study reported in Section 4.2 is based on joint work with Sabine Bartsch.  ... 
dblp:journals/ldvf/BiemannBEGQSSSZ13 fatcat:eciovvcvazewnfuhk7shiksuiy

Building wordnets with multi-word expressions from parallel corpora

Alberto Simões, Xavier Gómez Guinovart
2020 Revista de Procesamiento de Lenguaje Natural (SEPLN)  
In this paper we present a method for enlarging wordnets focusing on multi-word terms and utilising data from parallel corpora. Our approach is validated using the Galician and Portuguese wordnets.  ...  The multi-word candidates obtained in this experiment were manually validated, obtaining a 73.2% accuracy for the Galician language and a 75.5% for the Portuguese language.  ...  and the European Fund for Regional Development (MCIU/AEI/FEDER), and was partially funded by Portuguese National funds (PIDDAC), through the FCT - Fundação para a Ciência e Tecnologia and FCT/MCTES under  ... 
dblp:journals/pdln/SimoesG20 fatcat:o7tkhfap2jcunnzcqludf3ltti

A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora

Mikel Iruskieta, Iria da Cunha, Maite Taboada
2014 Language Resources and Evaluation  
Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for the comparison of rhetorical structures in different languages and to specify why translated texts  ...  To achieve these aims we have carried out a contrastive analysis, comparing a corpus of parallel English, Spanish and Basque texts, using Rhetorical Structure Theory.  ...  Maite Taboada We would like to thank the anonymous reviewers for their comments and suggestions, Nynke van der Vliet for her feedback on the evaluation method, Esther Miranda for designing the website  ... 
doi:10.1007/s10579-014-9271-6 fatcat:3svtkso6krcuhfwvwtbx7ouq64

Contrastive Pragmatics and Corpora

Karin Aijmer
2020 Contrastive Pragmatics  
Contrastive pragmatics is closely associated with the use of parallel and comparable corpora for studying the similarities and differences between languages.  ...  Parallel corpora have now been extended to more than two languages making them more relevant for typological research, and they can be used to investigate whether there are (discourse) universals across  ...  The particular advantage of parallel corpora is that they allow the researcher to make specific and fine-grained comparisons on the basis of texts which are interlingually comparable.  ... 
doi:10.1163/26660393-12340004 fatcat:7sbhpmloirby5fz3puma4phifq


Наталія Євгенівна Леміш, Ольга Миколаївна Алексєєва, Світлана Павлівна Денисова, Світлана Анатоліївна Матвєєва, Алла Анатоліївна Зернецька
2020 Ìnformacìjnì Tehnologì ì Zasobi Navčannâ  
Linguistic corpora are a state-of-the-art technology that can solve the outlined problem perfectly, for it opens a broad variety of practical and theoretical research options, and at the same time it is  ...  Moreover, parallel corpora provide ready solutions for the choice of translation models in certain conditions.  ...  Construction of such a corpus involves several stages, including the compulsory markup and annotation / tagging of parallel texts: meta-textual, structural, and linguistic.  ... 
doi:10.33407/itlt.v79i5.3626 fatcat:ci24dtonqzdolkkv2hmlylzu74

Translation Studies and Representative Corpora: Establishing Links between Translation Corpora, Theoretical/Descriptive Categories and a Conception of the Object of Study

Sandra Halverson
1998 Meta : Journal des traducteurs  
As a result, the question of comparable versus parallel corpora of various types will not be addressed (see Granger 1994 and Baker 1995 for discussion of varying corpus types and differences in terminology  ...  Achieving such an end would imply the enrichment of the discipline as a whole in the direction of new means of testing existing theories, and new types of theoretical questions.  ... 
doi:10.7202/003000ar fatcat:q72iptmnynbjvlgcqpatl5je3i

Genre and Register in Comparable Corpora: An English/Spanish Contrastive Analysis

Belén López Arroyo, Roda P. Roberts
2017 Meta : Journal des traducteurs  
traducción inglés-español (ACTRES): Aplicaciones lingüísticas para la internacionalización de la industria de transformación agroalimentaria (LE-227413), supported financially by Junta de Castilla y León and  ...  This methodological adjustment also allowed for more parallelism between the English and Spanish texts and hence for more accurate interlingual comparison of the wine tasting notes.  ...  The need for manual annotation of the corpora explains why the corpora are not very large.  ... 
doi:10.7202/1040469ar fatcat:k2m6bpvh4zhtjj5wxs5m5gpyty

Word Sense Disambiguation Using Wikipedia [chapter]

Bharath Dandala, Rada Mihalcea, Razvan Bunescu
2013 The People's Web Meets NLP  
We present three approaches to word sense disambiguation that use Wikipedia as a source of sense annotations.  ...  Experiments on four languages confirm that the Wikipedia sense annotations are reliable and can be used to construct accurate monolingual sense classifiers.  ...  Acknowledgments This material is based in part upon work supported by the National Science Foundation IIS awards #1018613 and #1018590 and CAREER award #0747340.  ... 
doi:10.1007/978-3-642-35085-6_9 dblp:series/tanlp/DandalaMB13 fatcat:pchga2qz3vgg5jskyhkjplnlvi

Multilingual language resources and interoperability

Andreas Witt, Ulrich Heid, Felix Sasaki, Gilles Sérasset
2009 Language Resources and Evaluation  
This article introduces the topic of "Multilingual Language Resources and Interoperability". We start with a taxonomy and parameters for classifying language resources.  ...  Later we provide examples and issues of interoperability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability.  ...  European attempts to design a representation and query system for multiply annotated text and speech corpora in parallel.  ... 
doi:10.1007/s10579-009-9088-x fatcat:473iagvv4nbq3ehoifeupwwcaa

MATAr: Morphology-based Tagger for Arabic

Fadi A. Zaraket, Ameen Jaber
2013 2013 ACS International Conference on Computer Systems and Applications (AICCSA)  
The annotation tags serve for training, validation, and evaluation.  ...  In this paper, we present an open source tagging tool with visual interface that enables the construction of annotated Arabic text corpora with automatic morphology-based tags.  ...  The work in [4] presents a framework for interlingual annotation of parallel text corpora with multi-level representations.  ... 
doi:10.1109/aiccsa.2013.6616418 dblp:conf/aiccsa/ZaraketJ13 fatcat:tzzj5gwyvzff7pys4fwo5qk6wu

Trainable Coarse Bilingual Grammars for Parallel Text Bracketing [chapter]

D. Wu
1997 Text, Speech and Language Technology  
We describe two new strategies to automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised  ...  Preliminary experiments on parallel English-Chinese text are supportive of these strategies.  ...  The problem of bracketing such corpora is the focus of two new strategies described in this paper.  ... 
doi:10.1007/978-94-017-2390-9_15 fatcat:diw3mpt5yvaqrdw3pp6tvx7v2m

Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation

Alberto Simões, Xavier Gómez Guinovart, Michael Wagner
2018 Symposium on Languages, Applications and Technologies  
The resulting synsets were aligned, and new variants for the Galician language were extracted. After manual evaluation the approach presented a 96.8% accuracy.  ...  In this paper we describe the methodology and evaluation of the expansion of Galnet, the Galician wordnet, using a multilingual Bible through lexical alignment and semantic annotation.  ...  Another line of research on automatic extension of ontologies is carried out for Portuguese in the framework of Onto.PT 10 [11], where new synsets are obtained from lexicographical, encyclopedic and textual  ... 
doi:10.4230/oasics.slate.2018.14 dblp:conf/slate/SimoesG18 fatcat:fb2exkqmgvbtbkoogzvgkmzddq

Lump at SemEval-2017 Task 1: Towards an Interlingua Semantic Similarity

Cristina España-Bonet, Alberto Barrón-Cedeño
2017 Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)  
We include lexical similarities, cross-language explicit semantic analysis, internal representations of multilingual neural networks and interlingual word embeddings.  ...  Our representations allow to use large datasets in language pairs with many instances to better classify instances in smaller language pairs avoiding the necessity of translating into a single language  ...  The research work of the second author is carried out in the framework of the Interactive sYstems for Answer Search project (IYAS), at the Qatar Computing Research Institute, HBKU.  ... 
doi:10.18653/v1/s17-2019 dblp:conf/semeval/Espana-BonetB17 fatcat:ne6rifwvpnac7n72e4rty4pwv4