Resolving anaphoras for the extraction of drug-drug interactions in pharmacological documents
Drug-drug interactions are frequently reported in the increasing amount of biomedical literature. Information Extraction (IE) techniques have been devised as a useful instrument to manage this knowledge. Nevertheless, IE at the sentence level has a limited effect because of the frequent references to previous entities in the discourse, a phenomenon known as 'anaphora'. DrugNerAR, a drug anaphora resolution system is presented to address the problem of co-referring expressions in pharmacological
... literature. This development is part of a larger and innovative study about automatic drug-drug interaction extraction. The system uses a set of linguistic rules drawn by Centering Theory over the analysis provided by a biomedical syntactic parser. Semantic information provided by the Unified Medical Language System (UMLS) is also integrated in order to improve the recognition and the resolution of nominal drug anaphors. Besides, a corpus has been developed in order to analyze the phenomena and evaluate the current approach. Each possible case of anaphoric expression was looked into to determine the most effective way of resolution. An F-score of 0.76 in anaphora resolution was achieved, outperforming significantly the baseline by almost 73%. This ad-hoc reference line was developed to check the results as there is no previous work on anaphora resolution in pharmacological documents. The obtained results resemble those found in related-semantic domains. The present approach shows very promising results in the challenge of accounting for anaphoric expressions in pharmacological texts. DrugNerAr obtains similar results to other approaches dealing with anaphora resolution in the biomedical domain, but, unlike these approaches, it focuses on documents reflecting drug interactions. The Centering Theory has proved being effective at the selection of antecedents in anaphora resolution. A key component in the success of this framework is the analysis provided by the MMTx program and the DrugNer system that allows to deal with the complexity of the pharmacological language. It is expected that the positive results of the resolver increases performance of our future drug-drug interaction extraction system.