The Taming of the Shrew - non-standard text processing in the Digital Humanities
Natural language processing (NLP) has focused on the automatic processing of newspaper texts for many years. With the growing importance of text analysis in various areas such as spoken language understanding, social media processing and the interpretation of text material from the humanities, techniques and methodologies have to be reviewed and redefined since so called non-standard texts pose challenges on the lexical and syntactic level especially for machine-learning-based approaches.
... tic processing tools developed on the basis of newspaper texts show a decreased performance for texts with divergent characteristics. Digital Humanities (DH) as a field that has risen to prominence in the last decades, holds a variety of examples for this kind of texts. Thus, the computational analysis of the relationships of Shakespeare's dramatic characters requires the adjustment of processing tools to English texts from the 16th-century in dramatic form. Likewise, the investigation of narrative perspective in Goethe's ballads calls for methods that can handle German verse from the 18th century. In this dissertation, we put forward a methodology for NLP in a DH environment. We investigate how an interdisciplinary context in combination with specific goals within projects influences the general NLP approach. We suggest thoughtful collaboration and increased attention to the easy applicability of resulting tools as a solution for differences in the store of knowledge between project partners. Projects in DH are not only constituted by the automatic processing of texts but are usually framed by the investigation of a research question from the humanities. As a consequence, time limitations complicate the successful implementation of analysis techniques especially since the diversity of texts impairs the transferability and reusability of tools beyond a specific project. We answer to this with modular and thus easily adjustable project workflows and system architectures. Several instances serve as examples for our methodolo [...]