Tectogrammatical Annotation of the Wall Street Journal

Silvie Cinková, Josef Toman, Jan Hajič, Kristýna Čermáková, Václav Klimeš, Lucie Mladová, Jana Šindlerová, Kristýna Tomšů, Zdeněk Žabokrtský
2009 Prague Bulletin of Mathematical Linguistics  
This paper gives an overview of the current state of the Prague English Dependency Treebank project. It is an updated version of a draft text that was released along with a CD presenting the first 25% of the PDT-like version of the Penn Treebank WSJ section (PEDT 1.0). Before the January 2009 release, the conversion of the original phrase-structure trees into dependency trees, as well as the consistency checks, was substantially enhanced to save manual work. The conversion is performed partly by scripted rules and partly by a statistical parser. To make the rules more powerful, the phrase-based Penn Treebank WSJ was enriched with other publicly available language resources: the manual annotation of flat noun phrases and the named-entity and coreference tagging. At the moment, 50% of the 1-million-word corpus has been manually annotated and consistency-checked on the tectogrammatical layer.
doi:10.2478/v10108-009-0023-5 fatcat:mcczoylq4bcvtewtfosrc4n56i
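The abstract states that phrase-structure trees are converted into dependency trees partly by scripted rules. The record does not describe those rules, so the following is only a generic sketch of one common rule-based approach, head percolation, and not the PEDT conversion itself (which targets the tectogrammatical layer and draws on the additional resources listed above). The head table `HEAD_RULES`, the `Node` class, and the functions `find_head_child`, `convert`, and `to_dependencies` are all hypothetical and drastically simplified. Each constituent selects a head child according to a per-label rule, and the lexical heads of the remaining children become dependents of that head.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, heavily simplified head table (not the PEDT rules):
# phrase label -> (search direction, child labels in priority order).
HEAD_RULES = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminal (leaf) nodes
    index: int = -1             # 1-based token position for leaves


def find_head_child(node: Node) -> Node:
    """Pick the head child using the (hypothetical) head table."""
    direction, priorities = HEAD_RULES.get(node.label, ("left", []))
    kids = node.children if direction == "left" else list(reversed(node.children))
    for wanted in priorities:
        for child in kids:
            if child.label == wanted:
                return child
    return kids[0]  # fallback: first child in the search direction


def convert(node: Node, arcs: list[tuple[int, int]]) -> int:
    """Return the lexical head of `node`; append (dependent, head) arcs."""
    if node.word is not None:
        return node.index
    head_child = find_head_child(node)
    head = convert(head_child, arcs)
    for child in node.children:
        if child is not head_child:
            arcs.append((convert(child, arcs), head))
    return head


def to_dependencies(tree: Node) -> tuple[int, list[tuple[int, int]]]:
    """Return the root token index and the sorted (dependent, head) arcs."""
    arcs: list[tuple[int, int]] = []
    root = convert(tree, arcs)
    return root, sorted(arcs)


def leaf(tag: str, word: str, i: int) -> Node:
    return Node(tag, word=word, index=i)


# Toy example: "Vinken joined the board"
tree = Node("S", [
    Node("NP", [leaf("NNP", "Vinken", 1)]),
    Node("VP", [
        leaf("VBD", "joined", 2),
        Node("NP", [leaf("DT", "the", 3), leaf("NN", "board", 4)]),
    ]),
])

root, arcs = to_dependencies(tree)
print(root, arcs)  # 2 [(1, 2), (3, 4), (4, 2)]
```

On this toy sentence the verb "joined" (token 2) becomes the root, and "Vinken" and "board" attach to it while "the" attaches to "board". A realistic converter needs far richer head tables and special handling for coordination, traces, and empty elements, which is presumably where the statistical parser and the added resources mentioned in the abstract come in.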