Automatic Extraction of Bibliographic Information from Biomedical Online Journal Articles Using a String Matching Algorithm

Jongwoo Kim, D.X. Le, G.R. Thoma
2006 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06)  
A system has been developed to extract bibliographic data (grant numbers and databank accession numbers) from online biomedical journal articles for the National Library of Medicine's MEDLINE database. Rule-based algorithms and a string matching algorithm are proposed to extract the bibliographic data from HTML-formatted articles. Experiments conducted with 411 medical articles from 73 journal issues show an accuracy exceeding 96%.
doi:10.1109/cbms.2006.55 dblp:conf/cbms/KimLT06 fatcat:m2niqt7z6rdqxhxtfdrrxnojgi
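
The abstract names two ingredients, rule-based algorithms and a string matching algorithm, applied to HTML-formatted articles, but gives no detail about either. The following is only a minimal sketch of how such a combination could look, not the authors' method. Everything in it is an assumption made for illustration: the regular expressions `GRANT_RE` and `ACCESSION_RE`, the `KNOWN_DATABANKS` list, the 0.8 similarity threshold, and the choice of Python's `difflib.SequenceMatcher` as the approximate string matcher.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the visible text of `html` with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical rule: an NIH-style grant number such as "R01 CA123456".
GRANT_RE = re.compile(r"\b([A-Z]\d{2})[ -]?([A-Z]{2})[ -]?(\d{5,6})\b")

# Hypothetical rule: "<databank> accession number <ID>", e.g. "AF123456".
ACCESSION_RE = re.compile(
    r"\b(\w+)\s+accession\s+(?:numbers?|nos?\.?)\s+([A-Z]{1,2}\d{5,6})\b"
)

# Hypothetical vocabulary of databank names (assumption, not from the paper).
KNOWN_DATABANKS = ["GenBank", "SwissProt", "PDB"]


def best_match(candidate, vocabulary, threshold=0.8):
    """Return the vocabulary entry most similar to `candidate`, or None
    if no entry reaches `threshold` (case-insensitive similarity ratio)."""
    best, best_score = None, 0.0
    for entry in vocabulary:
        score = SequenceMatcher(None, candidate.lower(), entry.lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None


def extract_grants(text):
    """Apply the grant rule and return normalized grant numbers."""
    return [f"{a} {b}{c}" for a, b, c in GRANT_RE.findall(text)]


def extract_accessions(text):
    """Apply the accession rule, keeping only hits whose databank name
    approximately matches a known databank."""
    results = []
    for name, accession in ACCESSION_RE.findall(text):
        databank = best_match(name, KNOWN_DATABANKS)
        if databank is not None:
            results.append((databank, accession))
    return results
```

In this sketch the rules propose candidates and the approximate matcher tolerates spelling variants of the databank name (for example "Genbank" for "GenBank"), which is one plausible reason to pair the two techniques. A real system would need far richer rules and vocabularies than the few shown here.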