DataXFormer: Leveraging the Web for Semantic Transformations

Ziawasch Abedjan, John Morcos, Michael N. Gubanov, Ihab F. Ilyas, Michael Stonebraker, Paolo Papotti, Mourad Ouzzani
2015 Conference on Innovative Data Systems Research  
Data transformation is a crucial step in data integration. While some transformations, such as liters to gallons, can be easily performed by applying a formula or a program on the input values, others, such as zip code to city, require sifting through a repository containing explicit value mappings. There are already powerful systems that provide formulae and algorithms for transformations. However, the automated identification of reference datasets to support value mapping remains largely unsolved. The Web is home to millions of tables with many containing explicit value mappings. This is in addition to value mappings hidden behind Web forms. In this paper, we present DataXFormer, a transformation engine that leverages Web tables and Web forms to perform transformation tasks. In particular, we describe an inductive, filter-refine approach for identifying explicit transformations in a corpus of Web tables and an approach to dynamically retrieve and wrap Web forms. Experiments show that the combination of both resource types covers more than 80% of transformation queries formulated by real-world users.
dblp:conf/cidr/AbedjanMGISPO15 fatcat:kd6wxg56ufd5pfdkjtwlb3ncau