Text parsing with Markov logic network [post]

Nan Wang
2017 unpublished
This document describes a novel way to extract structured information from plain text using a Markov Decision Process. In the age of big data, unstructured information such as text, photos, and videos has become abundant. However, a data warehouse requires structured data with a well-defined schema. It has been a challenge for the computer science community to extract useful data that conforms to a strict schema from unstructured data. Here we propose an automated system that is able to understand and infer the most likely counterpart in a text stream that corresponds to a field under the requested schema. The designed algorithm formulates the plain text using a context-dependent grammar with various weights, which are used to decide which field of the structured schema a particular piece of unstructured data belongs to. A machine-learning algorithm is used to learn the weights from training data. We implemented this automated system and applied it to extract schema data from plain US bankruptcy petition forms.
doi:10.7287/peerj.preprints.2774 fatcat:m6bl5f7uurecpng3kbxjdmt5zq
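
The abstract's central idea, weighted rules that decide which schema field a text fragment belongs to, can be illustrated with a minimal sketch. This is not the preprint's implementation: the field names (`debtor_name`, `case_number`), the features, and the hand-set weights below are all hypothetical, and a plain log-linear score stands in for the weighted context-dependent grammar. In the described system the weights would be learned from training data rather than written by hand.

```python
import math
from typing import Dict

# Hypothetical schema fields with hand-set feature weights (assumption:
# the real system learns such weights from labeled training data).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "debtor_name": {
        "context_mentions_name": 2.0,
        "has_digit": -1.5,
        "all_words_capitalized": 1.0,
    },
    "case_number": {
        "context_mentions_case": 2.0,
        "has_digit": 1.5,
        "all_words_capitalized": -0.5,
    },
}


def extract_features(fragment: str, context: str) -> Dict[str, float]:
    """Turn a text fragment and its preceding context into binary features."""
    words = fragment.split()
    ctx = context.lower()
    return {
        "context_mentions_name": float("name" in ctx),
        "context_mentions_case": float("case" in ctx),
        "has_digit": float(any(ch.isdigit() for ch in fragment)),
        "all_words_capitalized": float(
            bool(words) and all(w[:1].isupper() for w in words)
        ),
    }


def score(fragment: str, context: str, field: str) -> float:
    """Weighted sum of the features that fire for this fragment and field."""
    feats = extract_features(fragment, context)
    return sum(w * feats.get(name, 0.0) for name, w in WEIGHTS[field].items())


def field_probabilities(fragment: str, context: str) -> Dict[str, float]:
    """Normalize the per-field scores with a softmax."""
    scores = {f: score(fragment, context, f) for f in WEIGHTS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {f: math.exp(s - top) for f, s in scores.items()}
    total = sum(exps.values())
    return {f: e / total for f, e in exps.items()}


def best_field(fragment: str, context: str) -> str:
    """Return the schema field with the highest probability."""
    probs = field_probabilities(fragment, context)
    return max(probs, key=probs.get)
```

For example, `best_field("John Smith", "Name of Debtor:")` selects the hypothetical `debtor_name` field, because the context feature and the capitalization feature both carry positive weight for it, while a fragment containing digits after a "Case number:" label is assigned to `case_number`.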