Structured Web Data Extraction: University Domain [thesis]

Yifeng Li
In the Semantic Web [1], information is structured and thus processable by machines. However, the Semantic Web is still largely unrealized: the current web is simply a collection of unstructured documents. To find information on the web, we use search engines such as Google to retrieve relevant documents, and users often must then search through the retrieved documents themselves to find the information they need. With the explosive growth of web information, finding information has become increasingly difficult. While Google tries to provide the most relevant results, our goal is to provide precise results that answer structured queries. To achieve this goal, we adopt an information extraction approach: we extract structured data from the unstructured web and organize the extracted data in a database that supports search functions. This thesis focuses on the implementation of a web information extraction system for the university domain.
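
The contrast between retrieving documents and answering a structured query can be illustrated with a minimal sketch. It is not the system described in the thesis: the page fragment, the regular expression, the faculty table, and the function names extract, build_db, and professors_in are all hypothetical, chosen only to show extracted records being stored in a database and queried by field.

```python
import re
import sqlite3

# Hypothetical page fragment; the markup and the people are invented for illustration.
PAGE = """
<li>Dr. Alice Smith, Professor, Computer Science</li>
<li>Dr. Bob Jones, Associate Professor, Mathematics</li>
<li>Dr. Carol Wu, Professor, Mathematics</li>
"""

# One hand-written pattern: "Dr. <name>, <title>, <department>" inside a list item.
RECORD = re.compile(r"<li>Dr\. ([^,<]+), ([^,<]+), ([^,<]+)</li>")


def extract(html):
    """Return (name, title, department) tuples found in the HTML text."""
    return [tuple(part.strip() for part in m.groups())
            for m in RECORD.finditer(html)]


def build_db(records):
    """Load the extracted tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faculty (name TEXT, title TEXT, department TEXT)")
    conn.executemany("INSERT INTO faculty VALUES (?, ?, ?)", records)
    return conn


def professors_in(conn, department):
    """Structured query: names of full professors in one department."""
    rows = conn.execute(
        "SELECT name FROM faculty WHERE title = ? AND department = ? ORDER BY name",
        ("Professor", department),
    )
    return [name for (name,) in rows]


records = extract(PAGE)
conn = build_db(records)
print(professors_in(conn, "Mathematics"))
```

A keyword search for "professor mathematics" would return whole pages, including the one listing an associate professor; the query over the extracted table returns only the matching names.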
doi:10.22215/etd/2014-10139 fatcat:jctaapcub5d7rpz3wjpg43g7gi