PubTables-1M: Towards comprehensive table extraction from unstructured documents [article]

Brandon Smock, Rohith Pesala, and Robin Abraham
2021 arXiv pre-print
To address this, we develop a new, more comprehensive dataset for table extraction, called PubTables-1M. ... Recently, significant progress has been made applying machine learning to the problem of table structure inference and extraction from unstructured documents. ... PubTables-1M Dataset: The source data for creating PubTables-1M are pairs of PDF and XML versions of the same document from the PMCOA dataset. ...
arXiv:2110.00061v3 fatcat:rmzg5kwk65gpvlpuj7bkplokwy