Mixed-initiative development of language processing systems

David Day, John Aberdeen, Lynette Hirschman, Robyn Kozierok, Patricia Robinson, Marc Vilain
1997 Proceedings of the fifth conference on Applied natural language processing -  
Historically, tailoring language processing systems to specific domains and languages for which they were not originally built has required a great deal of effort. Recent advances in corpus-based manual and automatic training methods have shown promise in reducing the time and cost of this porting process. These developments have focused even greater attention on the bottleneck of acquiring reliable, manually tagged training data. This paper describes a new set of integrated tools, collectively
more » ... called the Alembic Workbench, that uses a mixed-initiative approach to "bootstrapping" the manual tagging process, with the goal of reducing the overhead associated with corpus development. Initial empirical studies using the Alembic Workbench to annotate "named entities" demonstrates that this approach can approximately double the production rate. As an ~ benefit, the combined efforts of machine and user produce domainspecific annotation rules that can be used to annotate similar texts automatically through the Alembic NLP system. The ultimate goal of this project is to enable end users to generate a practical domain-specific information extraction system within a single session.
doi:10.3115/974557.974608 dblp:conf/anlp/DayAHKRV97 fatcat:zvhmz7d5ujd2zicejxr4ub64lq