GitTables: A Large-Scale Corpus of Relational Tables [article]

Madelon Hulsebos, Çağatay Demiralp, Paul Groth
2021 arXiv pre-print
Here we introduce GitTables, a corpus of currently 1.7M relational tables extracted from GitHub. Our continuing curation aims at growing the corpus to at least 20M tables. ... The practical success of deep learning has sparked interest in improving relational table tasks, like data search, with models trained on large table corpora. ... GitTables, a new large-scale table corpus. ...
arXiv:2106.07258v3 fatcat:aga7v2pm7zhxfgs6mkmz7mvs3m

Making Table Understanding Work in Practice [article]

Madelon Hulsebos, Sneha Gathani, James Gale, Isil Dillig, Paul Groth, Çağatay Demiralp
2021 arXiv pre-print
Understanding the semantics of tables at scale is crucial for tasks like data integration, preparation, and search. ... Table understanding methods aim at detecting a table's topic, semantic column types, column relations, or entities. ... Since SIGMATYPER is intended to operate on enterprise tables, we use the GitTables [18] corpus to train it. ...
arXiv:2109.05173v1 fatcat:huoeikenzbesbco3ymnd5pxcry