Metric-Type Identification for Multilevel Header Numerical Tables in Scientific Papers

Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Manabu Okumura, Hiroya Takamura
2021 Journal of Natural Language Processing  
Numerical tables are widely used to present experimental results in scientific papers. For table understanding, the metric-type is essential for discriminating among the numbers in a table. Herein, we introduce a new information extraction task, i.e., metric-type identification from multilevel header numerical tables, and provide a dataset extracted from scientific papers comprising header tables, captions, and metric-types. We propose joint-learning neural classification and generation schemes featuring pointer-generator-based and pretrained-based models. Our results show that the joint models can handle both in-header and out-of-header metric-type identification problems. Furthermore, transfer learning using fine-tuned pretrained-based models successfully improves the performance. SciBERT, a domain-specific BERT-based model, achieves the best performance. Results achieved by a fine-tuned T5-based model are comparable to those obtained using our BERT-based model under a multitask setting.
doi:10.5715/jnlp.28.1247 fatcat:nh3uebxhgndqrddyjrhuojb3z4