Benchmarking Multimodal AutoML for Tabular Data with Text Fields [article]

Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola
2021 arXiv   pre-print
Compared with human data science teams, the fully automated methodology that performed best on our benchmark (stack ensembling a multimodal Transformer with various tree models) also manages to rank 1st  ...  We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well.  ...  On these datasets, modeling the tabular features brings clear improvements over the text alone given the performance of Text-Net (our best text Transformer model that ignores tabular features) is only:  ... 
arXiv:2111.02705v1 fatcat:kvnyjxgkqbdbpedbgat433v5uu

Tabular Data Anomaly Patterns

Dina Sukhobok, Nikolay Nikolov, Dumitru Roman
2017 2017 International Conference on Big Data Innovations and Applications (Innovate-Data)  
This work is partly funded by the EC H2020 projects proDataMarket (Grant number: 644497), euBusinessGraph (Grant number: 732003), and EW-Shopp (Grant number: 732590).  ...  The proposed set of data anomalies is classified based on the scope of a data anomaly in a tabular dataset. Tabular datasets are composed of rows and columns [9].  ...  We aim to provide an insight into basic data anomalies independent of the data domain, data acquisition technique, or the purpose of data cleaning.  ... 
doi:10.1109/innovate-data.2017.10 dblp:conf/obd/SukhobokNR17 fatcat:fn66dx47xjbpdahnqpvrzdrtfm

Semi-Automated Formalization and Representation of Engineering Knowledge Extracted from Spreadsheet Data

Aleksandr Yu. Yurin, Nikita O. Dorodnykh, Alexey O. Shigarov
2021 IEEE Access  
via the extraction and aggregation of conceptual model fragments from canonicalized tables, (III) model-driven synthesizing knowledge base and source codes from a domain model.  ...  Our case study on the industrial safety inspection (ISI) demonstrates the applicability of the approach for prototyping knowledge bases containing decision-making rules.  ...  These approaches focus on automated extracting knowledge from tabular data and, as a rule, they focus on a specific structure (model) of the table.  ... 
doi:10.1109/access.2021.3130172 fatcat:e4vfqkfm6reeloggkb5xlwzr7a

Developing integrated crop knowledge networks to advance candidate gene discovery

Keywan Hassani-Pak, Martin Castellote, Maria Esch, Matthew Hindle, Artem Lysenko, Jan Taubert, Christopher Rawlings
2016 Applied and Translational Genomics  
We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley.  ...  the value of integrated data in biological knowledge discovery.  ...  Acknowledgements We acknowledge all the past members of the Ondex team and collaborators who contributed to the development of the Ondex software (see http://www.ondex.org/people.shtml).  ... 
doi:10.1016/j.atg.2016.10.003 pmid:28018846 pmcid:PMC5167366 fatcat:hwtjgjbmsrhoxlisyryp2ppy2i

Evaluation of Representation Models for Text Classification with AutoML Tools [article]

Sebastian Brändle, Marc Hanussek, Matthias Blohm, Maximilien Kintz
2021 arXiv   pre-print
Automated Machine Learning (AutoML) has gained increasing success on tabular data in recent years.  ...  Our benchmark includes four popular open-source AutoML tools and eight datasets for text classification purposes.  ...  The text representations for the AutoML programs were generated using a transformer tool based on the BERT model.  ... 
arXiv:2106.12798v2 fatcat:trghh26xjzgzlinm3i3x73w2k4

Discovering Fair Representations in the Data Domain [article]

Novi Quadrianto, Viktoriia Sharmanska, Oliver Thomas
2019 arXiv   pre-print
On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions.  ...  When applied to the CelebA dataset of face images with gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eyes and lips regions.  ...  Acknowledgments NQ is supported by the UK EPSRC project EP/P03442X/1 and the Russian Academic Excellence Project '5-100'. VS is supported by the Imperial College Research Fellowship.  ... 
arXiv:1810.06755v2 fatcat:nbttkxm4hnekvjy46gt7jfwzfa

Data2Services: enabling automated conversion of data to services

Vincent Emonet, Alexander Malic, Amrapali Zaveri, Andreea Grigoriu, Michel Dumontier
2018 Figshare  
While data are becoming increasingly easy to find and access on the Web, significant effort and skill is still required to process the amount and diversity of data into convenient formats that are friendly  ...  The data can be loaded in a number of databases and are made accessible through native and autogenerated APIs.  ...  Transform RDF to target model Finally, SPARQL insert are run to transform generic RDF representation of the XML data structure into the target data model.  ... 
doi:10.6084/m9.figshare.7345868.v1 fatcat:2fhhjnk2fjfophz3k6zbm6crty

Answer-Aware Question Generation from Tabular and Textual Data using T5

Saichandra Pandraju, Sakthi Ganesh Mahalingam
2021 International Journal of Emerging Technologies in Learning (iJET)  
In this paper, we propose a single model architecture for question generation from tables along with text using "Text-to-Text Transfer Transformer" (T5) - a fully end-to-end model which does not rely on  ...  We also present our systematic approach in modifying the ToTTo dataset, release the augmented dataset as TabQGen along with the scores achieved using T5 as a baseline to aid further research.  ...  Table 2 shows the performance of the T5 models on the hold-out test dataset of TabQGen.  ... 
doi:10.3991/ijet.v16i18.25121 fatcat:t772vvgakndlhkrlvu7if2dkv4

TabGNN: Multiplex Graph Neural Network for Tabular Data Prediction [article]

Xiawei Guo, Yuhan Quan, Huan Zhao, Quanming Yao, Yong Li, Weiwei Tu
2021 arXiv   pre-print
Experiments on eleven TDP datasets from various domains, including classification and regression ones, show that TabGNN can consistently improve the performance compared to the tabular solution AutoFE  ...  Tabular data prediction (TDP) is one of the most popular industrial applications, and various methods have been designed to improve the prediction performance.  ...  The former ones design automated methods to generate cross product of features, while the latter ones design complex ways to model feature interactions in different orders.  ... 
arXiv:2108.09127v1 fatcat:7zojv4oayvfehhtcxiycvwmzfq

ModelWizard: Toward Interactive Model Construction [article]

Dylan Hutchison
2016 arXiv   pre-print
Data scientists engage in model construction to discover machine learning models that well explain a dataset, in terms of predictiveness, understandability and generalization across domains.  ...  We prototype our envisioned framework in ModelWizard, a domain-specific language embedded in F# to construct Tabular models.  ...  These operations include both transformations on the dataset, like the conversion of string gender values to links in a new table, and construction of a Tabular model.  ... 
arXiv:1604.04639v1 fatcat:vrxmp5i6gnbw5lhg2gqupo5td4

Discovering Fair Representations in the Data Domain

Novi Quadrianto, Viktoriia Sharmanska, Oliver Thomas
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions.  ...  When applied to the CelebA dataset of face images with gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eyes and lips regions.  ...  Acknowledgments NQ is supported by the UK EPSRC project EP/P03442X/1 and the Russian Academic Excellence Project '5-100'. VS is supported by the Imperial College Research Fellowship.  ... 
doi:10.1109/cvpr.2019.00842 dblp:conf/cvpr/QuadriantoST19 fatcat:ykc6zanhqzdljfofspk2glvuuy

Not All Datasets Are Born Equal: On Heterogeneous Data and Adversarial Examples [article]

Yael Mathov, Eden Levy, Ziv Katzir, Asaf Shabtai, Yuval Elovici
2021 arXiv   pre-print
The data in these domains is typically homogeneous, whereas heterogeneous tabular datasets domains remain underexplored despite their prevalence.  ...  We demonstrate the effectiveness of our approach using three datasets from different content domains.  ...  We believe that our work has identified an important research direction in the field of adversarial learning and broadens its scope beyond the main applications of computer vision.  ... 
arXiv:2010.03180v2 fatcat:om7o5emfibcwpox5gv2tkwgnxi

Reusable Transformations Of Data Cube Vocabulary Datasets From The Fiscal Domain

Jindřich Mynarz, Jakub Klímek, Marek Dudáš, Petr Škoda, Christiane Engels, Fathoni A Musyaffa, Vojtěch Svátek
2016 Zenodo  
The applicability of these transformations is shown on concrete use cases serving the goals of the OBEU project.  ...  Shared data models provide leverage for reusable data transformations. Common modelling patterns and data structures can make data transformations applicable to diverse datasets.  ...  Acknowledgements: The presented research has been supported by the H2020 project no. 645833 (OpenBudgets.eu).  ... 
doi:10.5281/zenodo.168589 fatcat:2jwydae3gnel3owhwq23u5fpwe

Towards Machine Learning Interpretability for Tabular Data with Mixed Data Types

Prativa Pokhrel, Alina Lazar
2022 Proceedings of the ... International Florida Artificial Intelligence Research Society Conference  
In this work, we train multiple GB models using several tabular datasets and compare the result in terms of speed, performance, and the global and local models' interpretability.  ...  Gradient Boosting (GB) algorithms have been proposed for a variety of automated predictions and classification tasks with applications in many domains.  ...  The problem has shifted from collecting large amounts of data to mine and understand them, transforming them into knowledge, decisions, and actions.  ... 
doi:10.32473/flairs.v35i.130611 fatcat:kjsqgljy3ngp3i5onbu72vxhcu

Augmented Data Science: Towards Industrialization and Democratization of Data Science [article]

Huseyin Uzunalioglu, Jin Cao, Chitra Phadke, Gerald Lehmann, Ahmet Akyamac, Ran He, Jeongran Lee, Maria Able
2019 arXiv   pre-print
ADS is a data-driven approach and relies on statistics and ML to extract insights from any data set in a domain-agnostic way to facilitate the data science process.  ...  Key features of ADS are the replacement of rudimentary data exploration and processing steps with automation and the augmentation of data scientist judgment with automatically-generated insights.  ...  We also would like to thank the anonymous reviewers for their constructive feedback on an earlier version of this paper.  ... 
arXiv:1909.05682v1 fatcat:gpbwzdmjpzcjpokwarhxuuhd7y