Passage retrieval for incorporating global evidence in sequence labeling

Jeffrey Dalton, James Allan, David A. Smith
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
Many forms of linguistic analysis, such as part-of-speech tagging, named entity recognition, and other sequence labeling tasks, are performed on short spans of text and assume statistical dependence within a window of only a few tokens. We propose using passage retrieval to induce non-local dependencies in structured classification, an approach that generalizes earlier work in context aggregation for named entity recognition. We introduce a new method for feature expansion inspired by pseudo-relevance feedback (PRF). Our results on the CoNLL 2003 task show that features from cross-document feature expansion improve NER effectiveness over previous aggregation models. Utilizing all the tokens in a sentence for query context consistently performs best on both intrinsic and extrinsic evaluations. Tagging models incorporating feature expansion outperform the leading NER system when evaluated on out-of-domain data, a collection of publicly available scanned books on the topic of historic Deerfield, MA. Finally, the results show that retrieval-based feature expansion using an external collection of unlabeled text can result in further effectiveness improvements.
doi:10.1145/2063576.2063633 dblp:conf/cikm/DaltonAS11 fatcat:wjmxnrbhtndt5ao75ahcefch6e