Task-based parser output combination: workflow and infrastructure

Kerstin Eckart, Universität Stuttgart
2018
This dissertation introduces the method of task-based parser output combination as a means of enhancing the reliability of automatically generated syntactic information for further processing tasks. Parsers, i.e. tools that generate syntactic analyses, are usually trained or developed on reference data, typically modern news text. However, the data relevant for applications or tasks beyond parsing often differs from this standard domain, or only specific phenomena from the syntactic analysis are relevant for further processing. In these cases, the reliability of the parser output may deviate substantially from what is expected on standard news text.

Studies on several levels of analysis in natural language processing have shown that combining systems that operate on the same level of analysis outperforms the best single system involved. This is because the systems involved have different error distributions, which can be exploited, e.g. by majority voting. In other words, for an effective combination, the systems involved have to be sufficiently different. These combination studies usually combine and evaluate the complete analyses. However, to combine complete analyses, a full mapping between their structures and tagsets has to be found. The need for a full mapping either restricts the degree to which the participating systems may differ or results in information loss. Moreover, evaluating the combined complete analyses does not reflect the reliability achieved for the specific aspects needed to resolve a given task.

This work presents an abstract workflow that can be instantiated according to the respective task and the available parsers. The approach focuses on the task-relevant aspects and aims to increase the reliability of their analysis. Moreover, this focus allows more divergent systems to be combined, since no full mapping between the structures and tagsets of the individual systems is needed. The usability of this method is also increased by focusing [...]
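The abstract describes the combination step only at a high level. As a minimal sketch, assuming a hypothetical task whose only relevant aspect is the subject of a sentence, the idea could be instantiated as follows: each parser gets its own extractor that reads just that aspect from the parser's native labels (a TIGER-style `SB` or a UD-style `nsubj`), and the extracted values are combined by majority vote. The names `first_with_label`, `tiger_subject`, `ud_subject`, `combine`, the `min_votes` threshold, the simplified `ROOT` label and the example parses are all illustrative assumptions, not the dissertation's actual workflow or infrastructure.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Set, Tuple

# Simplified dependency token: (token_id, form, head_id, label).
Token = Tuple[int, str, int, str]
Parse = List[Token]
# An extractor reads ONE task-relevant aspect from one parser's native output.
Extractor = Callable[[Parse], Optional[int]]


def first_with_label(parse: Parse, labels: Set[str]) -> Optional[int]:
    """Return the id of the first token whose label is in `labels`, else None."""
    for token_id, _form, _head, label in parse:
        if label in labels:
            return token_id
    return None


# Hypothetical extractors: only the subject label of each tagset is mapped,
# not the full label inventory.
def tiger_subject(parse: Parse) -> Optional[int]:
    return first_with_label(parse, {"SB"})


def ud_subject(parse: Parse) -> Optional[int]:
    return first_with_label(parse, {"nsubj", "nsubj:pass"})


def combine(outputs: Dict[str, Parse],
            extractors: Dict[str, Extractor],
            min_votes: int = 2) -> Tuple[Optional[int], float]:
    """Majority vote over the aspect extracted from each parser's output.

    Returns (decision, support). The decision is None if the top value is
    tied or has fewer than `min_votes` votes; support is the share of all
    parsers that voted for the decision (0.0 if there is none).
    """
    votes: Counter = Counter()
    for name, parse in outputs.items():
        value = extractors[name](parse)
        if value is not None:  # a parser may not find the aspect at all
            votes[value] += 1
    if not votes:
        return None, 0.0
    ranked = votes.most_common()
    best, count = ranked[0]
    if (len(ranked) > 1 and ranked[1][1] == count) or count < min_votes:
        return None, 0.0
    return best, count / len(outputs)


# Usage: three hypothetical parsers analyse "Peter sieht den Hund"
# ("Peter sees the dog"); parser_b wrongly swaps subject and object.
outputs = {
    "parser_a": [(1, "Peter", 2, "SB"), (2, "sieht", 0, "ROOT"),
                 (3, "den", 4, "NK"), (4, "Hund", 2, "OA")],
    "parser_b": [(1, "Peter", 2, "OA"), (2, "sieht", 0, "ROOT"),
                 (3, "den", 4, "NK"), (4, "Hund", 2, "SB")],
    "parser_c": [(1, "Peter", 2, "nsubj"), (2, "sieht", 0, "root"),
                 (3, "den", 4, "det"), (4, "Hund", 2, "obj")],
}
extractors = {"parser_a": tiger_subject, "parser_b": tiger_subject,
              "parser_c": ud_subject}

decision, support = combine(outputs, extractors)
print(decision, round(support, 2))  # 1 0.67
```

Because each extractor reads only the subject label of its own parser's tagset, the TIGER-style and UD-style outputs can be combined without mapping their full label inventories onto each other, which is the motivation given in the abstract. Returning `None` on ties or insufficient support is just one possible policy; a real instantiation would choose it according to the task.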
doi:10.18419/opus-9853 fatcat:uqcylmhbqbgz5atv3q3egf4zma