2,608 Hits in 5.7 sec

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks [article]

Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang
2022 arXiv   pre-print
To fully use the supervision signals in unlabeled tables, a variety of pre-training objectives have been designed and evaluated, for example, denoising cell values, predicting numerical relationships,  ...  Since tables usually appear and interact with free-form text, table pre-training usually takes the form of table-text joint pre-training, which attracts significant research interest from multiple domains  ...  Input Featurization and Embedding. Cell Text Encoding: Most table pre-training methods tokenized cell text using WordPiece and learned token embeddings [Devlin et al., 2018], such as TaBERT, TaPas, MATE  ... 
arXiv:2201.09745v4 fatcat:fckxlk6przhsthnyhozehw3dz4

Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach [article]

Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir
2022 arXiv   pre-print
Recognition (TSR) task and a Table Cell Type Classification (CTC) task.  ...  We use a graph to represent complex table structures for the TSR task.  ...  In order to apply a language model to the CTC task, the pre-trained language model is firstly used to generate the feature embeddings and fine-tune a cell type classification model.  ... 
arXiv:2208.06031v1 fatcat:i5io4yadh5df7m7s62exktdnga

Hybrid Metadata Classification in Large-scale Structured Datasets

Sophie Pavia, Nick Piraino, Kazi Islam, Anna Pyayt, Michael Gubanov
2022 Journal of Data Intelligence  
Metadata location and classification is an important problem for large-scale structured datasets.  ...  We observed superiority of this two-layer ensemble, compared to the recent previous approaches and report an impressive 95.73% accuracy at scale with our ensemble model using regular LSTM.  ...  We did not use any pre-trained word embedding, rather we have trained our own keras-embedding layer using our vocabulary and datasets.  ... 
doi:10.26421/jdi3.4-4 fatcat:6ox544h7lbaozbkz4o526u2crq

ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization [article]

Lingbo Li, Tianle Li, Xinyi He, Mengyu Zhou, Shi Han, Dongmei Zhang
2022 arXiv   pre-print
ASTA framework extracts data features by designing signatures based on expert knowledge, and enables data referencing at field- (chart) or cell-level (conditional formatting) with pre-trained models.  ...  Intelligent analysis and visualization of tables use techniques to automatically recommend useful knowledge from data, thus freeing users from tedious multi-dimension data mining.  ...  We extract the field embeddings from the pre-trained tabular model after feeding the serialised tabular data T_i into the pre-trained model, where i indicates the index of target field.  ... 
arXiv:2208.01043v2 fatcat:awi6kdpjwff2xoaq5sf2ietdoq

Inferring Tabular Analysis Metadata by Infusing Distribution and Knowledge Information [article]

Xinyi He, Mengyu Zhou, Jialiang Xu, Xiao Lv, Tianle Li, Yijia Shao, Shi Han, Zejian Yuan, Dongmei Zhang
2022 arXiv   pre-print
It outperforms a series of baselines that are based on rules, traditional machine learning methods, and pre-trained tabular models.  ...  To infer these metadata for a raw table, we propose our multi-tasking Metadata model which fuses field distribution and knowledge graph information into pre-trained tabular models.  ...  CONCLUSION: In this paper, we propose the novel analysis metadata for tabular data analysis and collected a large corpus with supervision by using smart supervisions from downstream tasks, public datasets  ... 
arXiv:2209.00946v1 fatcat:uvb5gheo35ehffbp5gzzmwgtcm

A Hybrid Probabilistic Approach for Table Understanding

Kexuan Sun, Harsha Rayudu, Jay Pujara
2021 AAAI Conference on Artificial Intelligence  
The evaluation results show that our system achieves the state-of-the-art performance on cell type classification, block identification, and relationship prediction, improving over prior efforts by up  ...  Tables of data are used to record vast amounts of socioeconomic, scientific, and governmental information.  ...  Accordingly, in our system, we use their pre-trained cell embedding model learned from thousands of tables.  ... 
dblp:conf/aaai/0002RP21 fatcat:bzmjw25lmfbn3iohxpfpggp3di

TableCNN: deep learning framework for learning tabular data

Pranav Sankhe, Elham Khabiri, Bhavna Agrawal, Yingjie Li
2020 International Semantic Web Conference  
Cell embedding is generated using Word2Vec[4]; each row across the tokenized table is treated as a sentence for Word2Vec model learning.  ...  Databases and tabular data are among the most common and rapidly growing resources.  ...  The arbitrary alphanumeric nature of data prevents us from using pre-trained language models. Fig. 1: TableCNN Architecture. Fig. 2: Confusion Matrix.  ... 
dblp:conf/semweb/SankheKAL20 fatcat:2abzxkoobvdpvdtpbniuvkrafq
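
The row-as-sentence idea in the TableCNN snippet above can be illustrated with a minimal sketch. The function name `rows_to_sentences`, the whitespace tokenization, and the sample table are all assumptions made for illustration; the snippet does not say how the paper tokenizes cells.

```python
def rows_to_sentences(table):
    """Turn each table row into one 'sentence' (a list of lowercase tokens).

    Hypothetical helper: whitespace tokenization is an assumption,
    not the tokenizer used by the TableCNN authors.
    """
    sentences = []
    for row in table:
        tokens = []
        for cell in row:
            # Split multi-word cells on whitespace; empty cells add nothing.
            tokens.extend(str(cell).lower().split())
        if tokens:
            sentences.append(tokens)
    return sentences


# Made-up example table (header row plus two data rows).
table = [
    ["Part ID", "Supplier", "Qty"],
    ["A-113", "Acme Corp", 40],
    ["B-207", "", 12],
]
sentences = rows_to_sentences(table)
for sentence in sentences:
    print(sentence)
```

A corpus in this list-of-token-lists shape is what Word2Vec implementations typically expect as input, for instance the `sentences` argument of gensim's `Word2Vec`, if that library is the one in use.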

Numerical Tuple Extraction from Tables with Pre-training

Qingping Yang, Yixuan Cao, Ping Luo
2022 Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining  
To represent cells with their intricate correlations in tables, we propose a BERT-based pre-trained language model, TableLM, to encode tables with diverse layouts.  ...  As a form of relational data, numerical tuples have direct and transparent relationships between their elements and are therefore easy for machines to use.  ...  Pre-training TableLM We only pre-train the stacked Transformer that does not involve numerical cells, so they are not considered during pre-training.  ... 
doi:10.1145/3534678.3539460 fatcat:uif62dogkje6tozcufj6onnh5m

TFV: A Framework for Table-Based Fact Verification

Mingke Chai, Zihui Gu, Xiaoman Zhao, Ju Fan, Xiaoyong Du
2021 IEEE Data Engineering Bulletin  
To bridge the gap, in this paper, we introduce a framework TFV, including pre-trained language models, fine-tuning, intermediate pre-training and table serialization techniques.  ...  Moreover, we also develop a python package that implements TFV and illustrate how it is used for table-based fact verification.  ...  ., [CLS] and [SEP], and then we use a pre-trained LM, e.g., BERT [7] to obtain embeddings for all the tokens.  ... 
dblp:journals/debu/ChaiGZF021 fatcat:rvpm3jqwhnf4vcgk6nzg4updqy

BreakingBERT@IITK at SemEval-2021 Task 9 : Statement Verification and Evidence Finding with Tables [article]

Aditya Jindal, Ankur Gupta, Jaya Srivastava, Preeti Menghwani, Vijit Malik, Vishesh Kaushik, Ashutosh Modi
2021 arXiv   pre-print
Given a table and a statement/fact, subtask A determines whether the statement is inferred from the tabular data, and subtask B determines which cells in the table provide evidence for the former subtask  ...  In this paper, as part of the SemEval-2021 Task 9, we tackle the problem of fact verification and evidence finding over tabular data. There are two subtasks.  ...  TableBERT uses the pre-trained BERT (Devlin et al., 2018) model and fine-tunes it using the TabFact dataset as a simple NLI task by linearizing the table along with the fact.  ... 
arXiv:2104.03071v2 fatcat:ocyz2dlctfcsjfsnmigytdfuim
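
Linearizing a table together with a statement, as the TableBERT description above mentions, might look roughly like the following sketch. The function `linearize`, the "column is value" template, and the sample data are assumptions for illustration only and are not claimed to match TableBERT's actual template; in practice a BERT tokenizer would normally add the `[CLS]` and `[SEP]` markers itself.

```python
def linearize(header, rows, statement):
    """Flatten a table and a statement into one NLI-style input string.

    Hypothetical template: each row becomes 'row i: col is val, ...'.
    """
    parts = []
    for i, row in enumerate(rows, start=1):
        cells = [f"{col} is {val}" for col, val in zip(header, row)]
        parts.append(f"row {i}: " + ", ".join(cells))
    table_text = " ; ".join(parts)
    # Table text is the premise; the statement is the hypothesis.
    return f"[CLS] {table_text} [SEP] {statement} [SEP]"


# Made-up example table and statement.
header = ["team", "wins"]
rows = [["Lions", 10], ["Tigers", 7]]
statement = "Lions have more wins than Tigers."
sequence = linearize(header, rows, statement)
print(sequence)
```

The resulting string could then be fed to a sequence-pair classifier that predicts whether the table supports the statement.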

Volta at SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables using TAPAS and Transfer Learning [article]

Devansh Gautam, Kshitij Gupta, Manish Shrivastava
2021 arXiv   pre-print
Tables are widely used in various kinds of documents to present information concisely.  ...  Our systems achieve an F1 score of 67.34 in subtask A three-way classification, 72.89 in subtask A two-way classification, and 62.95 in subtask B.  ...  Our systems use TAPAS (Herzig et al., 2020) trained with intermediate pre-training for both the subtasks.  ... 
arXiv:2106.00248v2 fatcat:ok6ovg5gpjgjrorbomz3htfh24

Structure-aware Pre-training for Table Understanding with Tree-based Transformers [article]

Zhiruo Wang, Haoyu Dong, Ran Jia, Jia Li, Zhiyi Fu, Shi Han, Dongmei Zhang
2020 arXiv   pre-print
TUTA pre-trains on a wide range of unlabeled tables and fine-tunes on a critical task in the field of table structure understanding, i.e. cell type classification.  ...  Upon this, we extend the pre-training architecture with two core mechanisms, namely the tree-based attention and tree-based position embedding.  ...  RNN + is a bidirectional LSTM-based method for cell classification using pre-trained cell and format embeddings.  ... 
arXiv:2010.12537v2 fatcat:kli75htw6rbprecahx5zavk544

Integrated multimodal artificial intelligence framework for healthcare applications [article]

Luis R. Soenksen, Yu Ma, Cynthia Zeng, Leonard D.J. Boussioux, Kimberly Villalobos Carballo, Liangyuan Na, Holly M. Wiberg, Michael L. Li, Ignacio Fuentes, Dimitris Bertsimas
2022 arXiv   pre-print
Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments.  ...  and 6,485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text and images), 11 unique data sources and 12 predictive tasks.  ...  Natural language inputs such as notes are processed using a pre-trained transformer neural network to generate text embeddings of fixed size (EText(n,t)).  ... 
arXiv:2202.12998v3 fatcat:wxnzohoi3fdurln62z56m6oxte

Integrating and querying similar tables from PDF documents using deep learning [article]

Rahul Anand, Hye-Young Paik, Cheng Wang
2019 arXiv   pre-print
We demonstrate that using word embedding trained on Google news for header match clearly outperforms the text-match based approach in traditional database.  ...  This is achieved through table type classification and nearest row search.  ...  Another method is using a pre-trained model from Google, the Google News word2vec model.  ... 
arXiv:1901.04672v1 fatcat:wdicg4ztljg4tfvvhot7pcvxmq

SemEval-2021 Task 9: Fact Verification and Evidence Finding for Tabular Data in Scientific Documents (SEM-TAB-FACTS) [article]

Nancy X. R. Wang, Diwakar Mahajan, Marina Danilevsky, Sara Rosenthal
2021 arXiv   pre-print
In this paper, we address this challenge by presenting a new dataset and tasks that address this goal in a shared task in SemEval-2021 Task 9: Fact Verification and Evidence Finding for Tabular Data  ...  Understanding tables is an important and relevant task that involves understanding table structure as well as being able to compare and contrast information within cells.  ...  Two of the most recent works are TAPAS (Herzig et al., 2020) and TaBERT (Yin et al., 2020), which jointly pre-train over textual and tabular data to facilitate table QA.  ... 
arXiv:2105.13995v1 fatcat:ci4evv4bsnadxjp2j2imxcpbvy