Approaches to verb subcategorization for biomedicine

Thomas Lippincott, Laura Rimell, Karin Verspoor, Anna Korhonen
2013 Journal of Biomedical Informatics  
Information about verb subcategorization frames (SCFs) is important to many tasks in natural language processing (NLP) and, in turn, text mining. Biomedicine has a need for high-quality SCF lexicons to support the extraction of information from the biomedical literature, which helps biologists to take advantage of the latest biomedical knowledge despite the overwhelming growth of that literature. Unfortunately, techniques for creating such resources for biomedical text are relatively ... compared to general language. This paper serves as an introduction to subcategorization and existing approaches to acquisition, and provides motivation for developing techniques that address issues particularly important to biomedical NLP. First, we give the traditional linguistic definition of subcategorization, along with several related concepts. Second, we describe approaches to learning SCF lexicons from large data sets for general and biomedical domains. Third, we consider the crucial issue of linguistic variation between biomedical fields (subdomain variation). We demonstrate significant variation among subdomains, and find the variation does not simply follow patterns of general lexical variation. Finally, we note several requirements for future research in biomedical SCF lexicon acquisition: a high-quality gold standard, investigation of different definitions of subcategorization, and minimally-supervised methods that can learn subdomain-specific lexical usage without the need for extensive manual work.

... acquisition, with examples from general and biomedical language. The second goal is to determine the degree of variation in SCF behavior within biomedicine, which could have major implications for the success of the approach.

Background

In this section we present a basic introduction to verb subcategorization, which will be required as background for the rest of the paper. We then describe the typical interpretation of subcategorization in biomedical text, and how subcategorization information can improve NLP and text mining applications in biomedicine.
doi:10.1016/j.jbi.2012.12.001 pmid:23276747 fatcat:5x2cd375f5gf5hi6jjdk5hkiwa