Structured Querying of Web Text Data: A Technical Challenge

Michael J. Cafarella, Christopher Ré, Dan Suciu, Oren Etzioni
2007 Conference on Innovative Data Systems Research  
The Web contains a huge amount of text that is currently beyond the reach of structured access tools. This unstructured data often contains a substantial amount of implicit structure, much of which can be captured using information extraction (IE) algorithms. By combining an IE system with an appropriate data model and query language, we could enable structured access to all of the Web's unstructured data. We propose a general-purpose query system called the extraction database, or ExDB, which supports SQL-like structured queries over Web text. We also describe the technical challenges involved, motivated in part by our experiences with an early 90M-page prototype.
dblp:conf/cidr/CafarellaRSE07 fatcat:unbtublzdzcqzhen5sai32rn3y