Interlingual annotation of parallel text corpora: a new framework for annotation and evaluation
2010
Natural Language Engineering
The resulting annotated, multilingually-induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems ...
This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation ...
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor. ...
doi:10.1017/s1351324910000070
fatcat:nepwusm6rjdqdglutqntk6fnyq
Word Sense Disambiguation Using English-Spanish Aligned Phrases over Comparable Corpora
[article]
2009
arXiv
pre-print
The evaluation of the experiment has been carried out against SemCor. ...
In this paper we describe a WSD experiment based on bilingual English-Spanish comparable corpora in which individual noun phrases have been identified and aligned with their respective counterparts in ...
Acknowledgements We are indebted to Julio Gonzalo for coming up with the idea of applying the noun phrases plus the ILI to WSD and for his advice, and to Fernando Lopez Ostenero for his willing assistance ...
arXiv:0910.5682v1
fatcat:jgq77vzs6verjawed5g55ysbea
Scalable Construction of High-Quality Web Corpora
2013
Journal for Language Technology and Computational Linguistics
Finally, we show how the availability of extremely large, high-quality corpora opens up new directions for research in various fields of linguistics, computational linguistics, and natural language processing ...
As we are working with web data, controlling the quality of the resulting corpus is an important issue, which we address by showing how corpus statistics and a linguistic evaluation can be used to assess ...
Acknowledgments The second evaluation study reported in Section 4.2 is based on joint work with Sabine Bartsch. ...
dblp:journals/ldvf/BiemannBEGQSSSZ13
fatcat:eciovvcvazewnfuhk7shiksuiy
Building wordnets with multi-word expressions from parallel corpora
2020
Revista de Procesamiento de Lenguaje Natural (SEPLN)
In this paper we present a method for enlarging wordnets focusing on multi-word terms and utilising data from parallel corpora. Our approach is validated using the Galician and Portuguese wordnets. ...
The multi-word candidates obtained in this experiment were manually validated, obtaining a 73.2% accuracy for the Galician language and a 75.5% for the Portuguese language. ...
and the European Fund for Regional Development (MCIU/AEI/FEDER), and was partially funded by Portuguese National funds (PIDDAC), through the FCT - Fundação para a Ciência e Tecnologia and FCT/MCTES under ...
dblp:journals/pdln/SimoesG20
fatcat:o7tkhfap2jcunnzcqludf3ltti
A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora
2014
Language Resources and Evaluation
Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for the comparison of rhetorical structures in different languages and to specify why translated texts ...
To achieve these aims we have carried out a contrastive analysis, comparing a corpus of parallel English, Spanish and Basque texts, using Rhetorical Structure Theory. ...
Maite Taboada We would like to thank the anonymous reviewers for their comments and suggestions, Nynke van der Vliet for her feedback on the evaluation method, Esther Miranda for designing the website ...
doi:10.1007/s10579-014-9271-6
fatcat:3svtkso6krcuhfwvwtbx7ouq64
Contrastive Pragmatics and Corpora
2020
Contrastive Pragmatics
Contrastive pragmatics is closely associated with the use of parallel and comparable corpora for studying the similarities and differences between languages. ...
Parallel corpora have now been extended to more than two languages making them more relevant for typological research, and they can be used to investigate whether there are (discourse) universals across ...
The particular advantage of parallel corpora is that they allow the researcher to make specific and fine-grained comparisons on the basis of texts which are interlingually comparable. ...
doi:10.1163/26660393-12340004
fatcat:7sbhpmloirby5fz3puma4phifq
LINGUISTIC CORPORA TECHNOLOGY AS A DIDACTIC TOOL IN TRAINING FUTURE TRANSLATORS
2020
Ìnformacìjnì Tehnologì ì Zasobi Navčannâ
Linguistic corpora are a state-of-the-art technology that can solve the outlined problem perfectly, for it opens a broad variety of practical and theoretical research options, and at the same time it is ...
Moreover, parallel corpora provide ready solutions for the choice of translation models in certain conditions. ...
Construction of such a corpus involves several stages, including the compulsory markup and annotation / tagging of parallel texts: meta-textual, structural, and linguistic. ...
doi:10.33407/itlt.v79i5.3626
fatcat:ci24dtonqzdolkkv2hmlylzu74
Translation Studies and Representative Corpora: Establishing Links between Translation Corpora, Theoretical/Descriptive Categories and a Conception of the Object of Study
1998
Meta : Journal des traducteurs
As a result, the question of comparable versus parallel corpora of various types will not be addressed (see Granger 1994 and Baker 1995 for discussion of varying corpus types and differences in terminology ...
Achieving such an end would imply the enrichment of the discipline as a whole in the direction of new means of testing existing theories, and new types of theoretical questions. ...
doi:10.7202/003000ar
fatcat:q72iptmnynbjvlgcqpatl5je3i
Genre and Register in Comparable Corpora: An English/Spanish Contrastive Analysis
2017
Meta : Journal des traducteurs
traducción inglés-español (ACTRES): Aplicaciones lingüísticas para la internacionalización de la industria de transformación agroalimentaria (LE-227413), supported financially by Junta de Castilla y León and ...
This methodological adjustment also allowed for more parallelism between the English and Spanish texts and hence for more accurate interlingual comparison of the wine tasting notes. ...
The need for manual annotation of the corpora explains why the corpora are not very large. ...
doi:10.7202/1040469ar
fatcat:k2m6bpvh4zhtjj5wxs5m5gpyty
Word Sense Disambiguation Using Wikipedia
[chapter]
2013
The People's Web Meets NLP
We present three approaches to word sense disambiguation that use Wikipedia as a source of sense annotations. ...
Experiments on four languages confirm that the Wikipedia sense annotations are reliable and can be used to construct accurate monolingual sense classifiers. ...
Acknowledgments This material is based in part upon work supported by the National Science Foundation IIS awards #1018613 and #1018590 and CAREER award #0747340. ...
doi:10.1007/978-3-642-35085-6_9
dblp:series/tanlp/DandalaMB13
fatcat:pchga2qz3vgg5jskyhkjplnlvi
Multilingual language resources and interoperability
2009
Language Resources and Evaluation
This article introduces the topic of "Multilingual Language Resources and Interoperability". We start with a taxonomy and parameters for classifying language resources. ...
Later we provide examples and issues of interoperability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability. ...
European attempts to design a representation and query system for multiply annotated text and speech corpora in parallel. ...
doi:10.1007/s10579-009-9088-x
fatcat:473iagvv4nbq3ehoifeupwwcaa
MATAr: Morphology-based Tagger for Arabic
2013
2013 ACS International Conference on Computer Systems and Applications (AICCSA)
The annotation tags serve for training, validation, and evaluation. ...
In this paper, we present an open source tagging tool with visual interface that enables the construction of annotated Arabic text corpora with automatic morphology-based tags. ...
The work in [4] presents a framework for interlingual annotation of parallel text corpora with multi-level representations. ...
doi:10.1109/aiccsa.2013.6616418
dblp:conf/aiccsa/ZaraketJ13
fatcat:tzzj5gwyvzff7pys4fwo5qk6wu
Trainable Coarse Bilingual Grammars for Parallel Text Bracketing
[chapter]
1997
Text, Speech and Language Technology
We describe two new strategies to automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised ...
Preliminary experiments on parallel English-Chinese text are supportive of these strategies. ...
The problem of bracketing such corpora is the focus of two new strategies described in this paper. ...
doi:10.1007/978-94-017-2390-9_15
fatcat:diw3mpt5yvaqrdw3pp6tvx7v2m
Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation
2018
Symposium on Languages, Applications and Technologies
The resulting synsets were aligned, and new variants for the Galician language were extracted. After manual evaluation the approach presented a 96.8% accuracy. ...
In this paper we describe the methodology and evaluation of the expansion of Galnet, the Galician wordnet, using a multilingual Bible through lexical alignment and semantic annotation. ...
Another line of research on automatic extension of ontologies is carried out for Portuguese in the framework of Onto.PT 10 [11], where new synsets are obtained from lexicographical, encyclopedic and textual ...
doi:10.4230/oasics.slate.2018.14
dblp:conf/slate/SimoesG18
fatcat:fb2exkqmgvbtbkoogzvgkmzddq
Lump at SemEval-2017 Task 1: Towards an Interlingua Semantic Similarity
2017
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
We include lexical similarities, cross-language explicit semantic analysis, internal representations of multilingual neural networks and interlingual word embeddings. ...
Our representations make it possible to use large datasets in language pairs with many instances to better classify instances in smaller language pairs, avoiding the necessity of translating into a single language ...
The research work of the second author is carried out in the framework of the Interactive sYstems for Answer Search project (IYAS), at the Qatar Computing Research Institute, HBKU. ...
doi:10.18653/v1/s17-2019
dblp:conf/semeval/Espana-BonetB17
fatcat:ne6rifwvpnac7n72e4rty4pwv4