Auctus: A Dataset Search Engine for Data Augmentation [article]

Sonia Castelo, Rémi Rampin, Aécio Santos, Aline Bessa, Fernando Chirigati, Juliana Freire
2021 arXiv   pre-print
We demonstrate how the Auctus dataset search engine addresses some of these challenges. We describe the system architecture and how users can explore datasets through a rich set of queries.  ...  However, finding relevant data is difficult. While search engines have addressed this problem for Web documents, there are many new challenges involved in supporting the discovery of structured data.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF and DARPA.  ... 
arXiv:2102.05716v2 fatcat:juyw3tujmjcdffbni3z5yls4pm

ARDA: Automatic Relational Data Augmentation for Machine Learning [article]

Nadiia Chepurko, Ryan Marcus, Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska, David Karger
2020 arXiv   pre-print
We present ARDA, an end-to-end system that takes as input a dataset and a data repository, and outputs an augmented data set such that training a predictive model on this augmented dataset results in  ...  Our system has two distinct components: (1) a framework to search and join data with the input data, based on various attributes of the input, and (2) an efficient feature selection algorithm that prunes  ...  Real World Datasets Real World datasets are such that given a base table you search open sourced datasets for joinable tables using Join Discovery systems such as Aurum or NYU Auctus.  ... 
arXiv:2003.09758v1 fatcat:4glagrcvkzc67ege3u7hsccduq

Editorial: AI for Data Discovery and Reuse [article]

Huajin Wang, Keith Webster
2019
AIDR 2019 (Artificial Intelligence for Data Discovery and Reuse) is a new conference that brings together researchers across a broad range of disciplines, computer scientists, tool developers, data providers  ...  There is great value embedded in reusing scientific data for secondary discoveries.  ...  Fernando Chirigati described Auctus, a dataset search engine that targets the problem of incomplete or insufficient data, finds datasets that can be joined or unioned from the web, and uses these datasets  ... 
doi:10.1184/r1/10093568 fatcat:luf2coraszhilps2qwuir753rm