CROATPAS: A Resource of Corpus-derived Typed Predicate Argument Structures for Croatian

Costanza Marini, Elisabetta Jezek
2019 Italian Conference on Computational Linguistics  
The goal of this paper is to introduce CROATPAS, the Croatian sister project of the Italian Typed-Predicate Argument Structure resource (TPAS¹, Ježek et al. 2014). CROATPAS is a corpus-based digital collection of verb valency structures in which each argument slot is annotated with a semantic type specification (SemType); it is currently being developed at the University of Pavia. Salient verbal patterns are discovered following a lexicographical methodology called Corpus Pattern Analysis (Hanks 2004 & 2012; Hanks & Pustejovsky 2005; Hanks et al. 2015), whereas SemTypes, such as [HUMAN], [ENTITY] or [ANIMAL], are taken from a shallow ontology shared by both TPAS and the Pattern Dictionary of English Verbs (PDEV², Hanks & Pustejovsky 2005; El Maarouf et al. 2014). The theoretical framework the resource relies on is Pustejovsky's Generative Lexicon theory (1995 & 1998; Pustejovsky & Ježek 2008), in light of which verbal polysemy and metonymic argument shifts can be traced back to compositional operations involving the variation of the SemTypes associated with the valency structure of each verb. The corpus used to identify verb patterns in CROATPAS is the Croatian Web as Corpus (hrWaC 2.2, ReLDI PoS-tagged) (Ljubešić & Erjavec 2011), which contains 1.2 billion types and is available on the Sketch Engine³ (Kilgarriff et al. 2014). The potential uses and purposes of the resource range from multilingual pattern linking between compatible resources to computer-assisted language learning (CALL).

¹ http://tpas.fbk.eu (last visited on July 12th, 2019)
² http://pdev.org.uk (last visited on July 12th, 2019)
³ https://www.sketchengine.eu/ (last visited on July 12th, 2019)
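
To make the notion of a typed predicate-argument structure more concrete, the following is a minimal Python sketch of how a verb pattern with SemType-annotated argument slots might be represented. It is an assumption made for illustration, not the CROATPAS data model: the class names `ArgumentSlot` and `VerbPattern`, the role labels, and the example entry for "vidjeti" are all hypothetical and are not taken from the resource. Only the SemType labels [HUMAN], [ANIMAL] and [ENTITY] come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgumentSlot:
    """One argument position of a verb pattern and the SemTypes it accepts."""
    role: str             # hypothetical role label, e.g. "subject" or "object"
    sem_types: frozenset  # allowed SemTypes, e.g. frozenset({"HUMAN", "ANIMAL"})


@dataclass(frozen=True)
class VerbPattern:
    """A typed predicate-argument structure for one sense of a verb."""
    lemma: str
    slots: tuple
    gloss: str = ""

    def accepts(self, filled):
        """Return True if every slot is filled with one of its allowed SemTypes.

        `filled` maps role labels to a single SemType; a missing role counts
        as a mismatch, and roles not named in the pattern are ignored.
        """
        return all(filled.get(slot.role) in slot.sem_types for slot in self.slots)


# Invented entry for illustration only; NOT taken from CROATPAS.
see = VerbPattern(
    lemma="vidjeti",
    slots=(
        ArgumentSlot("subject", frozenset({"HUMAN", "ANIMAL"})),
        ArgumentSlot("object", frozenset({"ENTITY"})),
    ),
    gloss="to see",
)

print(see.accepts({"subject": "HUMAN", "object": "ENTITY"}))   # True
print(see.accepts({"subject": "ENTITY", "object": "ENTITY"}))  # False
```

In a representation of this kind, distinct senses of the same verb would be separate `VerbPattern` entries that share a lemma but differ in the SemTypes of their slots, which loosely mirrors the idea that verbal polysemy is tied to SemType variation in the valency structure.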
dblp:conf/clic-it/MariniJ19 fatcat:36zf3dia2zdntiykahkvo54dha