IBI-UPF at BARR-2017: Learning to Identify Abbreviations in Biomedical Literature. System Description

Francesco Ronzano, Laura Inés Furlong
2017 Annual Conference of the Spanish Society for Natural Language Processing  
This paper presents the participation of the IBI-UPF team in the Biomedical Abbreviation Recognition and Resolution (BARR) track, organized in the context of the Evaluation of Human Language Technologies for Iberian Languages 2017 (IBEREVAL). The purpose of the track was to automatically identify abbreviation-definition pairs in the abstracts of biomedical articles in Spanish. By releasing a sample corpus and two collections of training documents, the organizers provided a total of 1,150 ... of biomedical articles, the majority of them in Spanish, manually annotated with abbreviations and their corresponding definitions. We tackled the task with an approach organized in two sequential phases. In the first phase, relying on a set of shallow linguistic features extracted from the text of biomedical abstracts, we trained two token classifiers to spot sequences of one or more tokens that represent abbreviations and definitions, respectively. A third classifier is then trained to distinguish abbreviations that are candidate short forms of a definition expressed in the same abstract sentence from other types of abbreviations. In the second phase, relations between the previously spotted abbreviations and definitions are identified by means of a set of heuristics based on structural and linguistic traits of the text of each abstract. We evaluate the first phase of our approach on the set of manually annotated Spanish biomedical abstracts provided by the organizers of the BARR track.
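The abstract does not list the actual features or heuristics, so the following is only a minimal sketch of the general idea behind the two phases, under stated assumptions. The function names (`token_features`, `pair_short_and_long_forms`), the feature set, and the initial-letter matching rule are all hypothetical illustrations, not the system described by the authors.

```python
import re


def token_features(tokens, i):
    """Shallow features for the token at position i.

    Hypothetical feature set: the paper's real features are not
    given in the abstract.
    """
    tok = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        "lower": tok.lower(),
        "length": len(tok),
        "is_all_caps": tok.isupper(),
        "upper_ratio": sum(c.isupper() for c in tok) / max(len(tok), 1),
        "has_digit": any(c.isdigit() for c in tok),
        "in_parentheses": prev_tok == "(" and next_tok == ")",
    }


def pair_short_and_long_forms(sentence):
    """Pair a parenthesized short form with the words preceding it.

    Assumed heuristic (not taken from the paper): a short form of n
    characters is linked to the n preceding words when each word starts
    with the corresponding character, ignoring case.
    """
    pairs = []
    for match in re.finditer(r"\(([^()\s]{2,10})\)", sentence):
        short_form = match.group(1)
        preceding_words = sentence[:match.start()].split()
        candidate = preceding_words[-len(short_form):]
        if len(candidate) != len(short_form):
            continue  # not enough words before the parenthesis
        if all(w[0].lower() == c.lower() for w, c in zip(candidate, short_form)):
            pairs.append((short_form, " ".join(candidate)))
    return pairs


if __name__ == "__main__":
    example = "La tomografía computarizada (TC) mostró una lesión."
    print(pair_short_and_long_forms(example))
    print(token_features(["(", "TC", ")"], 1))
```

In a trained system, dictionaries like the one returned by `token_features` would feed a token classifier, and a pairing rule of this kind would only be applied to spans the classifiers had already marked as abbreviations and definitions.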
dblp:conf/sepln/RonzanoF17 fatcat:snjip3jrg5gmpkprge6gjar6eu