CRYSTAL: Inducing a Conceptual Dictionary

Stephen Soderland, David Fisher, Jonathan Aseltine, Wendy Lehnert
1995 arXiv pre-print
One of the central knowledge sources of an information extraction system is a dictionary of linguistic patterns that can be used to identify the conceptual content of a text. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.
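The idea of generalizing each definition as far as possible without producing errors, until the positive training instances are covered, can be illustrated with a small covering-style sketch. This is not CRYSTAL's actual algorithm or data representation, which the abstract does not specify. It assumes, purely for illustration, that an instance is a flat feature dictionary, that a definition is a set of exact-match constraints, and that generalization means greedily dropping constraints. The feature names (`verb`, `subject_class`, `object_class`) and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: an instance and a definition are both
# flat feature -> value maps. A definition's entries act as constraints.
Instance = Dict[str, str]
Definition = Dict[str, str]


def matches(definition: Definition, instance: Instance) -> bool:
    """A definition applies when every one of its constraints holds."""
    return all(instance.get(k) == v for k, v in definition.items())


def makes_errors(definition: Definition, negatives: List[Instance]) -> bool:
    """An error here means the definition fires on a negative instance."""
    return any(matches(definition, n) for n in negatives)


def generalize(seed: Instance, negatives: List[Instance]) -> Definition:
    """Greedily drop constraints while the definition stays error-free.

    Assumption: constraint dropping stands in for whatever generalization
    operator the real system uses.
    """
    definition = dict(seed)
    for feature in sorted(seed):
        trial = {k: v for k, v in definition.items() if k != feature}
        if not makes_errors(trial, negatives):
            definition = trial
    return definition


def induce_dictionary(
    positives: List[Instance], negatives: List[Instance]
) -> List[Definition]:
    """Add a new definition only for positives not already covered."""
    dictionary: List[Definition] = []
    for inst in positives:
        if any(matches(d, inst) for d in dictionary):
            continue
        dictionary.append(generalize(inst, negatives))
    return dictionary


# Toy training data (invented for this sketch).
positives = [
    {"verb": "diagnosed", "subject_class": "patient", "object_class": "disease"},
    {"verb": "diagnosed", "subject_class": "doctor", "object_class": "disease"},
]
negatives = [
    {"verb": "discussed", "subject_class": "doctor", "object_class": "disease"},
]

dictionary = induce_dictionary(positives, negatives)
```

On this toy data a single generalized definition covers both positive instances without matching the negative one, which is the sense in which testing each proposed generalization against the training corpus keeps the dictionary both small and reliable.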
arXiv:cmp-lg/9505020v1 fatcat:3rzbbrvzcff2fdjw7ea62mv7mi