Benchmarking Multimodal AutoML for Tabular Data with Text Fields
[article]
2021
arXiv
pre-print
Compared with human data science teams, the fully automated methodology that performed best on our benchmark (stack ensembling a multimodal Transformer with various tree models) also manages to rank 1st ...
We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well. ...
On these datasets, modeling the tabular features brings clear improvements over the text alone given the performance of Text-Net (our best text Transformer model that ignores tabular features) is only: ...
arXiv:2111.02705v1
fatcat:kvnyjxgkqbdbpedbgat433v5uu
Tabular Data Anomaly Patterns
2017
2017 International Conference on Big Data Innovations and Applications (Innovate-Data)
This work is partly funded by the EC H2020 projects proDataMarket (Grant number: 644497), euBusinessGraph (Grant number: 732003), and EW-Shopp (Grant number: 732590). ...
The proposed set of data anomalies is classified based on the scope of a data anomaly in a tabular dataset. Tabular datasets are composed of rows and columns [9]. ...
We aim to provide an insight into basic data anomalies independent of the data domain, data acquisition technique, or the purpose of data cleaning. ...
doi:10.1109/innovate-data.2017.10
dblp:conf/obd/SukhobokNR17
fatcat:fn66dx47xjbpdahnqpvrzdrtfm
Semi-Automated Formalization and Representation of Engineering Knowledge Extracted from Spreadsheet Data
2021
IEEE Access
via the extraction and aggregation of conceptual model fragments from canonicalized tables, (III) model-driven synthesizing knowledge base and source codes from a domain model. ...
Our case study on the industrial safety inspection (ISI) demonstrates the applicability of the approach for prototyping knowledge bases containing decision-making rules. ...
These approaches focus on the automated extraction of knowledge from tabular data and, as a rule, they target a specific structure (model) of the table. ...
doi:10.1109/access.2021.3130172
fatcat:e4vfqkfm6reeloggkb5xlwzr7a
Developing integrated crop knowledge networks to advance candidate gene discovery
2016
Applied and Translational Genomics
We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. ...
the value of integrated data in biological knowledge discovery. ...
Acknowledgements: We acknowledge all the past members of the Ondex team and collaborators who contributed to the development of the Ondex software (see http://www.ondex.org/people.shtml). ...
doi:10.1016/j.atg.2016.10.003
pmid:28018846
pmcid:PMC5167366
fatcat:hwtjgjbmsrhoxlisyryp2ppy2i
Evaluation of Representation Models for Text Classification with AutoML Tools
[article]
2021
arXiv
pre-print
Automated Machine Learning (AutoML) has gained increasing success on tabular data in recent years. ...
Our benchmark includes four popular open-source AutoML tools and eight datasets for text classification purposes. ...
The text representations for the AutoML programs were generated using a transformer tool based on the BERT model. ...
arXiv:2106.12798v2
fatcat:trghh26xjzgzlinm3i3x73w2k4
Discovering Fair Representations in the Data Domain
[article]
2019
arXiv
pre-print
On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions. ...
When applied to the CelebA dataset of face images with gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eyes and lips regions. ...
Acknowledgments: NQ is supported by the UK EPSRC project EP/P03442X/1 and the Russian Academic Excellence Project '5-100'. VS is supported by the Imperial College Research Fellowship. ...
arXiv:1810.06755v2
fatcat:nbttkxm4hnekvjy46gt7jfwzfa
Data2Services: enabling automated conversion of data to services
2018
Figshare
While data are becoming increasingly easy to find and access on the Web, significant effort and skill is still required to process the amount and diversity of data into convenient formats that are friendly ...
The data can be loaded in a number of databases and are made accessible through native and autogenerated APIs. ...
Transform RDF to target model: Finally, SPARQL inserts are run to transform the generic RDF representation of the XML data structure into the target data model. ...
doi:10.6084/m9.figshare.7345868.v1
fatcat:2fhhjnk2fjfophz3k6zbm6crty
Answer-Aware Question Generation from Tabular and Textual Data using T5
2021
International Journal of Emerging Technologies in Learning (iJET)
In this paper, we propose a single model architecture for question generation from tables along with text using "Text-to-Text Transfer Transformer" (T5) - a fully end-to-end model which does not rely on ...
We also present our systematic approach in modifying the ToTTo dataset, release the augmented dataset as TabQGen along with the scores achieved using T5 as a baseline to aid further research. ...
Table 2 shows the performance of the T5 models on the hold-out test dataset of TabQGen. ...
doi:10.3991/ijet.v16i18.25121
fatcat:t772vvgakndlhkrlvu7if2dkv4
TabGNN: Multiplex Graph Neural Network for Tabular Data Prediction
[article]
2021
arXiv
pre-print
Experiments on eleven TDP datasets from various domains, including classification and regression ones, show that TabGNN can consistently improve the performance compared to the tabular solution AutoFE ...
Tabular data prediction (TDP) is one of the most popular industrial applications, and various methods have been designed to improve the prediction performance. ...
The former ones design automated methods to generate cross product of features, while the latter ones design complex ways to model feature interactions in different orders. ...
arXiv:2108.09127v1
fatcat:7zojv4oayvfehhtcxiycvwmzfq
ModelWizard: Toward Interactive Model Construction
[article]
2016
arXiv
pre-print
Data scientists engage in model construction to discover machine learning models that well explain a dataset, in terms of predictiveness, understandability and generalization across domains. ...
We prototype our envisioned framework in ModelWizard, a domain-specific language embedded in F# to construct Tabular models. ...
These operations include both transformations on the dataset, like the conversion of string gender values to links in a new table, and construction of a Tabular model. ...
arXiv:1604.04639v1
fatcat:vrxmp5i6gnbw5lhg2gqupo5td4
Discovering Fair Representations in the Data Domain
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions. ...
When applied to the CelebA dataset of face images with gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eyes and lips regions. ...
Acknowledgments: NQ is supported by the UK EPSRC project EP/P03442X/1 and the Russian Academic Excellence Project '5-100'. VS is supported by the Imperial College Research Fellowship. ...
doi:10.1109/cvpr.2019.00842
dblp:conf/cvpr/QuadriantoST19
fatcat:ykc6zanhqzdljfofspk2glvuuy
Not All Datasets Are Born Equal: On Heterogeneous Data and Adversarial Examples
[article]
2021
arXiv
pre-print
The data in these domains is typically homogeneous, whereas heterogeneous tabular datasets domains remain underexplored despite their prevalence. ...
We demonstrate the effectiveness of our approach using three datasets from different content domains. ...
We believe that our work has identified an important research direction in the field of adversarial learning and broadens its scope beyond the main applications of computer vision. ...
arXiv:2010.03180v2
fatcat:om7o5emfibcwpox5gv2tkwgnxi
Reusable Transformations Of Data Cube Vocabulary Datasets From The Fiscal Domain
2016
Zenodo
The applicability of these transformations is shown on concrete use cases serving the goals of the OBEU project. ...
Shared data models provide leverage for reusable data transformations. Common modelling patterns and data structures can make data transformations applicable to diverse datasets. ...
Acknowledgements: The presented research has been supported by the H2020 project no. 645833 (OpenBudgets.eu). ...
doi:10.5281/zenodo.168589
fatcat:2jwydae3gnel3owhwq23u5fpwe
Towards Machine Learning Interpretability for Tabular Data with Mixed Data Types
2022
Proceedings of the ... International Florida Artificial Intelligence Research Society Conference
In this work, we train multiple GB models using several tabular datasets and compare the result in terms of speed, performance, and the global and local models' interpretability. ...
Gradient Boosting (GB) algorithms have been proposed for a variety of automated predictions and classification tasks with applications in many domains. ...
The problem has shifted from collecting large amounts of data to mining and understanding them, transforming them into knowledge, decisions, and actions. ...
doi:10.32473/flairs.v35i.130611
fatcat:kjsqgljy3ngp3i5onbu72vxhcu
Augmented Data Science: Towards Industrialization and Democratization of Data Science
[article]
2019
arXiv
pre-print
ADS is a data-driven approach and relies on statistics and ML to extract insights from any data set in a domain-agnostic way to facilitate the data science process. ...
Key features of ADS are the replacement of rudimentary data exploration and processing steps with automation and the augmentation of data scientist judgment with automatically-generated insights. ...
We also would like to thank the anonymous reviewers for their constructive feedback on an earlier version of this paper. ...
arXiv:1909.05682v1
fatcat:gpbwzdmjpzcjpokwarhxuuhd7y