433 Hits in 4.1 sec

XML-based phrase alignment in parallel treebanks

Martin Volk, Sofia Gustafson-Capková, Joakim Lundborg, Torsten Marek, Yvonne Samuelsson, Frida Tidström
2006 Proceedings of the 5th Workshop on NLP and XML Multi-Dimensional Markup in Natural Language Processing - NLPXML '06   unpublished
This paper describes the usage of XML for representing cross-language phrase alignments in parallel treebanks.  ...  We have developed a TreeAligner as a tool for interactively inserting and correcting such alignments as an independent level of treebank annotation.  ...  Conclusion We have shown a straightforward way to tie in XML-based phrase alignment information with syntax trees represented in TIGER-XML.  ... 
doi:10.3115/1621034.1621053 fatcat:k27wtbm2rrbpxmgwiqnu4nhbxi

Building And Querying Parallel Treebanks [chapter]

Martin Volk, Torsten Marek, Yvonne Samuelsson
2017 Zenodo  
Our parallel treebank includes word and phrase alignments.  ...  This paper describes our work on building a trilingual parallel treebank.  ...  The author describes his system for automatic phrase alignment over parallel trees which is based on word alignment probabilities provided by GIZA.  ... 
doi:10.5281/zenodo.283438 fatcat:d2d3zk2fnnhljewmbybtknejhi

Parallel TreeBanks: Observations for Implication of Equivalent Alignments

Oleg Kapanadze
2016 International Journal of Computational Linguistics and Applications  
Building a parallel Treebank anticipates alignment of linguistic information represented by diverse structures on different layers of a bilingual text.  ...  as equivalent units in the bilingual text alignment issue.  ...  The Stockholm TreeAligner uses monolingual graph structures in the TIGER-XML format as representations and handles in parallel treebanks alignment of tree structures in addition to the token alignment.  ... 
dblp:journals/ijcla/Kapanadze16 fatcat:ll3aizyowbhjngdlq5c43xnw74

Extending the TIGER query language with universal quantification [chapter]

Torsten Marek, Joakim Lundborg, Martin Volk
2008 Text Resources and Lexical Knowledge  
We have implemented this extension to the query language in a tool that allows querying parallel treebanks, while including their alignment constraints.  ...  The query language in TIGERSearch is limited due to its lack of universal quantification.  ...  We have aligned the two treebanks first on the sentence level to get corresponding tree pairs and then on the word level and phrase level. Figure 4 shows a tree pair from our parallel treebank.  ... 
doi:10.1515/9783110211818.1.3 fatcat:yqpsobqshfgsblanzlau6eqedi

LinES: An English-Swedish Parallel Treebank

Lars Ahrenberg
2007 Nordic Conference of Computational Linguistics  
This paper presents an English-Swedish Parallel Treebank, LinES, that is currently under development.  ...  Another aim of LinES is to support queries made in terms of types of translation shifts.  ...  Dependency analysis has an advantage for parallel treebanks in that phrase alignment to a large extent is given for free from the word alignment.  ... 
dblp:conf/nodalida/Ahrenberg07 fatcat:7fah3krtr5eyxbhfvafbmxx3zq

Syntactic Translation Patterns from a Parallel Treebank

Mihaela Colhon
2012 Balkan Conference in Informatics  
To make this approach feasible, we consider the phrase-to-phrase alignments of a bilingual treebank annotated with syntactic constituents.  ...  The goal of the presented parallel phrase extraction algorithm is to provide rich and robust set of translation syntactic patterns.  ...  The treebank was built upon a parallel English-Romanian corpus word-aligned and annotated at the morphological and syntactic level.  ... 
dblp:conf/bci/Colhon12 fatcat:3ejab3izlrgn7keu4boeqeaapq

Language engineering for syntactic knowledge transfer

Mihaela Colhon
2012 Computer Science and Information Systems  
The treebank is built upon a parallel English-Romanian corpus word-aligned and annotated at the morphological and syntactic level.  ...  In this paper we present a method for an English-Romanian treebank construction, together with the obtained evaluation results.  ...  Cuza University of Iaşi, Romania, for providing the English-Romanian corpus upon which the presented treebank generation mechanism was developed and also evaluated.  ... 
doi:10.2298/csis120130032c fatcat:ujg7stzndvccdgzims2oj6c3xm

Unsupervised Generation of Parallel Treebanks through Sub-Tree Alignment

Ventsislav Zhechev
2009 Prague Bulletin of Mathematical Linguistics  
In this paper we introduce an open-source system for fast and robust automatic generation of parallel treebanks.  ...  The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist.  ...  Acknowledgments We would like to thank Mary Hearne, John Tinsley, Andy Way and Khalil Sima'an for their participation in the development of the algorithms.  ... 
doi:10.2478/v10108-009-0019-1 fatcat:73se6yd7xzdhvfedqyczwg2j7e

Seeding Statistical Machine Translation with Translation Memory Output through Tree-Based Structural Alignment

Ventsislav Zhechev, Josef van Genabith
2010 Workshop on Syntax, Semantics and Structure in Statistical Translation  
In this paper we present a novel modular approach that utilises state-of-the-art sub-tree alignment to pick out pre-translated segments from a TM match and seed with them an SMT system to produce a final  ...  It can also be used in a Computer-Aided Translation (CAT) environment to present almost perfect translations to the human user with markup highlighting the segments of the translation that need to be checked  ...  word order, as implied by the data in the parallel treebank.  ... 
dblp:conf/ssst/ZhechevG10 fatcat:nwdp2f2qqrfhtbyqurwc4njm2y

GrETEL. A Tool for Example-Based Treebank Mining [chapter]

2017 CLARIN in the Low Countries  
GrETEL is a linguistic search tool that enables users to look up constructions in syntactically annotated corpora or treebanks.  ...  A major asset of GrETEL is that it enables non-technical users to consult treebanks in a user-friendly way, which is also in line with the main CLARIN goal of applying the results of speech and language  ...  Acknowledgements The work on GrETEL was carried out in the framework of the following projects:  ... 
doi:10.5334/bbi.22 fatcat:3kcm77r3hvee5icmayzrcjpwli

Parse and Corpus-Based Machine Translation [chapter]

Vincent Vandeghinste, Scott Martens, Gideon Kotzé, Jörg Tiedemann, Joachim Van den Bogaert, Koen De Smet, Frank Van Eynde, Gertjan van Noord
2012 Essential Speech and Language Technology for Dutch  
The current state-of-the-art in machine translation consists of phrase-based statistical machine translation (PB-SMT) [23], an approach which has been used since the late 1990s, evolving from word-based  ...  To overcome these limitations efforts have been made to introduce syntactic knowledge into the statistical paradigm, usually in the form of syntax trees, either  ...  An example grammar rule is shown in Fig. 17.5. In order to induce such a grammar a node aligned parallel treebank is required. Section 17.3.1 describes how to build such a treebank.  ... 
doi:10.1007/978-3-642-30910-6_17 dblp:series/tanlp/VandeghinsteMKTBSEN13 fatcat:n4nmp7om75h4bjymvjya5g3jni

Large Scale Syntactic Annotation of Written Dutch: Lassy [chapter]

Gertjan van Noord, Gosse Bouma, Frank Van Eynde, Daniël de Kok, Jelmer van der Linde, Ineke Schuurman, Erik Tjong Kim Sang, Vincent Vandeghinste
2012 Essential Speech and Language Technology for Dutch  
Querying the Treebanks As the annotations are represented in XML, there is a variety of tools available to work with the annotations.  ...  As one case in point, we note that a preliminary version of the Lassy Large treebank was used as gold standard training data to train a memory-based parser for Dutch [12].  ... 
doi:10.1007/978-3-642-30910-6_9 dblp:series/tanlp/NoordBEKLSSV13 fatcat:oouwpqiwvrdfzjbvbyf6pkcmry

The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue

Sasha Calhoun, Jean Carletta, Jason M. Brenier, Neil Mayo, Dan Jurafsky, Mark Steedman, David Beaver
2010 Language Resources and Evaluation  
The combined corpus uses the format of the NITE XML Toolkit, which allows these annotations to be browsed and searched as a coherent set (Carletta et al. in Lang Resour Eval J 39(4):313-334, 2005).  ...  In Proceedings of ICASSP-92, pp. 517-520, 1992).  ...  to Joanna Keating, Joseph Arko and Hannele Nicholson for their hard work in annotating.  ... 
doi:10.1007/s10579-010-9120-1 fatcat:rqj5kdrfzbgjzjmyzl7j4l5hbm

Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus

Zdenka Uresova, Ondřej Dušek, Eva Fucikova, Jan Hajic, Jana Sindlerova
2015 Proceedings of The 9th Linguistic Annotation Workshop  
This paper presents a resource and the associated annotation process used in a project of interlinking Czech and English verbal translational equivalents based on a parallel, richly annotated dependency  ...  treebank containing also valency and semantic roles, namely the Prague Czech-English Dependency Treebank.  ...  As this mapping is based on the parallel Prague Czech-English Dependency Treebank (PCEDT), which also contains monolingual valency annotation on each side, we are getting a powerful, real-text-based complex  ... 
doi:10.3115/v1/w15-1613 dblp:conf/acllaw/UresovaDFHS15 fatcat:krzabpvbl5bddkmj5a5pmupvcu

CzEng: Czech-English Parallel Corpus release version 0.5

Ondrej Bojar, Zdenek Zabokrtský
2006 Prague Bulletin of Mathematical Linguistics  
We introduce CzEng 0.5, a new Czech-English sentence-aligned parallel corpus consisting of around 20 million tokens in either language.  ...  Besides the description of the corpus, also preliminary results concerning statistical machine translation experiments based on CzEng 0.5 are presented.  ...  quality, we report BLEU (Papineni et al., 2002) scores of a state-of-the-art phrase-based MT system Moses. For this experiment, we selected 1-1 aligned sentences up to 50 words from CzEng 0.5.  ... 
dblp:journals/pbml/BojarZ06 fatcat:2pbd4glcnrd4ph42ka54bbavjy
Showing results 1–15 out of 433 results