OCR++: A Robust Framework For Information Extraction from Scholarly Articles
[article]
2016
arXiv
pre-print
This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles, including metadata (title, author names, affiliation and e-mail), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). We analyze a diverse set of scientific articles written in English to understand generic writing patterns and formulate rules to develop this hybrid
arXiv:1609.06423v3
fatcat:5zdkmcwh5ndwhaakjphcwr2ohe