Towards Neural Information Extraction without Manual Annotated Data

Peng Xu
2018
Information extraction (IE) is one of the most important technologies in the information age. Applied to text, information extraction is linked to the problem of text simplification, with the goal of creating a structured view of the information present in free text. However, information extraction is a very challenging task, owing to the inherent difficulty of understanding natural language and the high cost of obtaining large manually annotated training sets. In this thesis, we build on the premise of
... ing automatic information extraction without manually annotated data, following the distant supervision paradigm, and present novel neural models for different IE tasks that are particularly suited to this setting. In the first part of the thesis, we focus on one IE task, fine-grained entity type classification (FETC), and propose the NFETC model, a single, much simpler and more elegant neural network model that attempts FETC "end-to-end" without post-processing or ad hoc features. We study two kinds of noise in noisy type labels, namely out-of-context noise and overly-specific noise, and investigate their effects on FETC systems. We propose a neural-network-based model that jointly learns representations for entity mentions and their context. A variant of the cross-entropy loss function is used to handle out-of-context noise. Hierarchical loss normalization is introduced into our model to alleviate the effect of overly-specific noise. In the second part of the thesis, we focus on another IE task: relation
doi:10.7939/r3g44j63v fatcat:olkj4cbd5vecblax2onbkewpcm