Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories

Marco Passarotti, Adam Przepiórkowski, Savina Raynaud, Frank Van Eynde
2012 Largo Gemelli   unpublished
distribution) web: www.unicatt.it/librario
ISBN: 978-88-8311-712-1
Cover illustration: a cloister of the Catholic University, Milan. Photo by Marco Passarotti.

Preface

The Eighth International Workshop on Treebanks and Linguistic Theories (TLT8) was held at the Catholic University of the Sacred Heart in Milan (Italy) on 4-5 December 2009 (see http://tlt8.unicatt.it). This was the first time the workshop had been held in Italy. Dates and locations of the previous workshops are provided in a separate section.

Since its first edition in 2002, TLT has provided a forum for discussing methods and tools for the design, creation and exploitation of treebanks, as well as the linguistic theories underlying them. Today, treebanks are essential resources both for data-driven approaches to natural language processing and for linguistic research. While treebank data are frequently exploited for computational linguistics tasks such as grammar induction and the training of NLP tools, in linguistic research they can be used to refine and improve pre-corpus linguistic theories. Furthermore, large-scale data annotation makes it possible to evaluate the accuracy of a grammar empirically and to revise it on the basis of evidence.

Recently, many treebank projects for less-resourced languages have begun. Their spread benefits from tools and methods developed over the years in similar projects for other languages: because these methods and tools are language-independent, they can be reused (or easily adapted) for many different languages. This has eased and accelerated the creation and dissemination of treebanks for less-resourced languages. Another growing research direction is the development of parallel treebanks, which are vital resources for machine translation and comparative studies.

The call for papers for TLT8 requested unpublished, completed work. Thirty submissions were received: 25 full papers and 5 poster presentations. The submissions were authored by researchers from 19 different countries in America, Asia and Europe. Each submission was evaluated by three reviewers. The Programme Committee consisted of 24 members (including the 4 co-chairs) from 14 different countries, all of whom served as reviewers.
Based on their scores and their comments on the content and quality of the papers, 15 papers and 4 posters were accepted for presentation and publication, an acceptance rate of 63.3% (19 of 30 submissions). The accepted submissions cover a wide range of topics related to both long-standing and new treebanks, reporting on aspects of their construction, querying, exploitation and evaluation. As requested in the call for papers, this edition places particular emphasis on projects aiming to compile representative treebanks for less-resourced, ancient and/or dead languages.

The programme is completed by invited lectures from Roberto Busa SJ (Catholic University of the Sacred Heart, Milan) and Eva Hajičová (Charles University, Prague, Czech Republic). The research of the two invited speakers is connected through the ongoing Index Thomisticus Treebank project at the Catholic University in Milan, whose annotation guidelines were designed according to those of the Prague Dependency Treebank.

Following the tradition of recent TLT editions, a co-located event was also organised (see http://tlt8.unicatt.it/framenet.htm). This one-day event, preceding TLT8, was devoted to the FrameNet project: a masterclass and talk by Charles J. Fillmore in the morning was followed in the afternoon by a workshop with seven peer-reviewed oral presentations on research concerning FrameNet and related linguistic and corpus topics. The event was motivated by two observations: although the FrameNet team has begun to annotate some texts to demonstrate how frame semantics can contribute to text understanding, no large FrameNet-annotated corpus is currently available; and FrameNet data are systematically biased by the criteria used to select the examples that describe the frame semantics of target words.
Therefore, closer collaboration between FrameNet and annotated corpora (especially treebanks) is now needed, for at least two reasons: (a) during syntactic annotation, FrameNet data can help ensure consistency and can provide motivation for annotation choices; (b) annotated corpora provide further evidence for FrameNet data, allowing lexicographers to ground their decisions in a wider variety of examples. The aim of the workshop was thus to bring people involved in treebank development, management and exploitation into contact with the FrameNet project.