Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage, Hissar, Bulgaria

Cristina Vertan, Milena Slavcheva, Petya Osenova, Stelios Piperidis, Galia Angelova, David Baumann, António Branco, Nicoletta Calzolari, Günther Görz (+10 others)
2011 The 8th International Conference on Recent Advances in Natural Language Processing   unpublished
Designed and Printed by INCOMA Ltd. Shoumen, BULGARIA

Foreword

Following several digitization campaigns in recent years, a large number of printed books, manuscripts and archaeological digital objects have become available to a broader public through web portals and associated infrastructures. These infrastructures not only enable virtual research and easier access to materials independent of their physical location, but also play a major role in the long-term preservation and
... However, access to digital materials opens new possibilities for textual research, such as synchronous browsing of several materials, extraction of passages relevant to a certain event from different sources, rapid search through thousands of pages, categorisation of sources, and multilingual retrieval and support. Methods from Language Technology are therefore in high demand for extracting content-related semantic metadata and analysing textual materials.

There are several initiatives in Europe aiming to foster the application of language technology in the humanities (CLARIN, DARIAH). Through such initiatives, as well as many other research projects, awareness of these methods in the humanities has risen considerably. However, there is still untapped potential on both sides:

• on the one hand, there are still research tracks in the humanities which do not sufficiently and effectively exploit language technology solutions;
• on the other hand, there are many languages, especially historical variants of languages, for which the available tools and resources still have to be developed or adapted to successfully serve humanities applications.

The current workshop brings together researchers from the Humanities, as well as from Language and Information Technologies, and thus fosters the above-mentioned directions. Confirming the interest generated by the topic of our workshop, we received a large number of very good submissions. This allowed us to provide a programme covering the most important aspects of digital humanities and cultural heritage. Following the workshop programme, the Proceedings are thematically structured as follows: Electronic Archives, Language Technology and Resources, Computational Methods for Literary Analysis, Multimodal Aspects in Digital Humanities.
The workshop papers address a multitude of problems and suggest a wealth of developments and solutions related to the digital humanities and the preservation of cultural heritage. The papers represent a whole spectrum of relevant topics: utilizing interlinked semantic technologies for managing and accessing museum data; exploiting topic models in a query classification system for an art image archive; metadata and content-oriented search methods for a multilingual audio-and-video archive; maintaining a digital library of Polish and Poland-related old ephemeral prints; normalization of historical wordforms in German; developing a Bulgarian-Polish on-line dictionary as a technological tool for applications in the digital humanities; semantic annotation models based on ontological representation of knowledge concerning Bulgarian iconography; preparation of an electronic edition of the largest Old Church Slavonic manuscript, the Codex Suprasliensis; literary research support by creating and visualizing profiles of sentimental content in texts; profiling of literary characters in 19th century Swedish prose fiction by interpersonal relation extraction; investigation of diachronic stylistic changes in British and American varieties of 20th century written English; speeding up the process of creating annotations of audio-visual data for humanities research; automatic transcription of ancient handwritten documents; OCR processing of Gothic-script documents.

We would like to thank the Organisers of the RANLP events, especially Galia Angelova and Kiril Simov, for their unceasing help in the organisation of the workshop. We are indebted to the Programme Committee members, who provided very detailed reviews in an extremely short time.