Shallow discourse parsing for German release_ai7yjywh35e5thecdg5ihfra6q

by Peter Bourgonje

Published by Universität Potsdam.

2021  

Abstract

While the last few decades have seen impressive improvements in several areas in Natural Language Processing, asking a computer to make sense of the discourse of utterances in a text remains challenging. There are several different theories that aim to describe and analyse the coherent structure that a well-written text inhibits. These theories have varying degrees of applicability and feasibility for practical use. Presumably the most data-driven of these theories is the paradigm that comes with the Penn Discourse TreeBank, a corpus annotated for discourse relations containing over 1 million words. Any language other than English however, can be considered a low-resource language when it comes to discourse processing. This dissertation is about shallow discourse parsing (discourse parsing following the paradigm of the Penn Discourse TreeBank) for German. The limited availability of annotated data for German means the potential of modern, deep-learning based methods relying on such data is also limited. This dissertation explores to what extent machine-learning and more recent deep-learning based methods can be combined with traditional, linguistic feature engineering to improve performance for the discourse parsing task. A pivotal role is played by connective lexicons that exhaustively list the discourse connectives of a particular language along with some of their core properties. To facilitate training and evaluation of the methods proposed in this dissertation, an existing corpus (the Potsdam Commentary Corpus) has been extended and additional data has been annotated from scratch. The approach to end-to-end shallow discourse parsing for German adopts a pipeline architecture and either presents the first results or improves over state-of-the-art for German for the individual sub-tasks of the discourse parsing task, which are, in processing order, connective identification, argument extraction and sense classification. The end-to-end shallow discourse parser for German that has been developed for the purpose of [...]
In text/plain format

Archived Files and Locations

application/pdf   5.8 MB
file_7t4b773lcfe77mzn6ri275yyu4
publishup.uni-potsdam.de (publisher)
web.archive.org (webarchive)
Read Archived PDF
Preserved and Accessible
Type  thesis
Stage   published
Date   2021-05-06
Language   en ?
Work Entity
access all versions, variants, and formats of this works (eg, pre-prints)
Catalog Record
Revision: e4d85aee-dc1d-4a4f-a598-14899dc51f94
API URL: JSON