Text and structured data fusion in Data Tamer at scale

Michael Gubanov, Michael Stonebraker, Daniel Bruckner
2014 IEEE 30th International Conference on Data Engineering
Large-scale text data research has recently started to regain momentum [1]-[10] because of the wealth of up-to-date information communicated in unstructured form. For example, new information in online media (e.g., Web blogs, Twitter, Facebook, and news feeds) becomes available instantly, is refreshed regularly, has very broad coverage, and has other valuable properties that other data sources and formats rarely offer. Therefore, many enterprises and individuals are interested in integrating and ...ing unstructured text in addition to their structured data. DATA TAMER, introduced in [11], is a new data integration system for structured data sources. Its features include a schema integration facility, an entity consolidation module, and a unique expert-sourcing mechanism for obtaining human guidance. It also includes a capability for data cleaning and transformations. Here we describe a new scalable architecture and extensions that enable DATA TAMER to integrate text with structured data.
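To make "integrating text with structured data" concrete, here is a minimal sketch, assuming a deliberately naive technique: each text snippet is linked to the structured record whose name tokens it contains most completely (token containment), and linked snippets are grouped under that record. This is not DATA TAMER's method, which the abstract does not specify; the record layout (`id`, `name` fields), the helpers `link_snippet` and `fuse`, and the 0.5 threshold are all hypothetical choices for illustration.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def containment(name: set[str], snippet: set[str]) -> float:
    """Fraction of the record's name tokens that appear in the snippet."""
    if not name:
        return 0.0
    return len(name & snippet) / len(name)


def link_snippet(snippet: str, records: list[dict], key: str = "name",
                 threshold: float = 0.5) -> Optional[dict]:
    """Hypothetical helper: return the record whose `key` field is best
    contained in the snippet, or None if no score reaches `threshold`.
    On ties, the earliest record in `records` wins."""
    snip = tokens(snippet)
    best, best_score = None, 0.0
    for rec in records:
        score = containment(tokens(str(rec[key])), snip)
        if score > best_score:
            best, best_score = rec, score
    return best if best_score >= threshold else None


def fuse(records: list[dict], snippets: list[str], key: str = "name",
         threshold: float = 0.5) -> dict:
    """Hypothetical helper: group snippets under the `id` of the record
    each one links to; unlinked snippets are dropped."""
    linked = {rec["id"]: [] for rec in records}
    for s in snippets:
        rec = link_snippet(s, records, key, threshold)
        if rec is not None:
            linked[rec["id"]].append(s)
    return linked
```

For example, with `records = [{"id": 1, "name": "Acme Corp"}, {"id": 2, "name": "Globex Inc"}]`, the snippet "Acme Corp opened a new office in Boston" links to record 1, while "nothing relevant here" links to nothing. A production system would replace the containment score with proper entity resolution and, as the abstract suggests, route uncertain matches to human experts.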
doi:10.1109/icde.2014.6816755 dblp:conf/icde/GubanovSB14