Unknown word sense detection as outlier detection

Katrin Erk
2006 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics - unpublished
We address the problem of unknown word sense detection: the identification of corpus occurrences that are not covered by a given sense inventory. We model this as an instance of outlier detection, using a simple nearest-neighbor-based approach to measure how closely a new item resembles a training set. In combination with a method that alleviates data sparseness by sharing training data across lemmas, the approach achieves a precision of 0.77 and a recall of 0.82.
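The abstract does not spell out the scoring function, so the following is only a minimal sketch of one common nearest-neighbor outlier score, not necessarily the paper's exact formulation. It compares the distance from a new item to its nearest training item with the distance from that training item to its own nearest neighbor. The Euclidean distance, the toy feature vectors, the function names, and the threshold of 1.0 are all assumptions made for illustration, and the sharing of training data across lemmas is not modeled.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, candidates):
    """Return (index, distance) of the candidate closest to point."""
    return min(
        ((i, euclidean(point, c)) for i, c in enumerate(candidates)),
        key=lambda pair: pair[1],
    )


def nn_outlier_score(item, training):
    """Ratio of the item's nearest-neighbor distance to that neighbor's
    own nearest-neighbor distance within the training set.

    Values well above 1 suggest the item lies farther from the training
    data than the training items lie from each other.
    """
    if len(training) < 2:
        raise ValueError("need at least two training items")
    i, d_item = nearest(item, training)
    # Exclude the neighbor itself when looking for its nearest neighbor.
    rest = training[:i] + training[i + 1:]
    _, d_neighbor = nearest(training[i], rest)
    if d_neighbor == 0:
        # Duplicate training items: any positive distance counts as extreme.
        return math.inf if d_item > 0 else 1.0
    return d_item / d_neighbor


def is_unknown_sense(item, training, threshold=1.0):
    """Flag the item as an outlier (hypothetical threshold, an assumption)."""
    return nn_outlier_score(item, training) > threshold


# Toy training set standing in for occurrences with known senses.
training = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print(is_unknown_sense((0.1, 0.1), training))
print(is_unknown_sense((5.0, 5.0), training))
```

In this sketch an occurrence close to the known-sense examples gets a score below the threshold, while one far from all of them is flagged as a candidate unknown sense. In practice the threshold would need tuning on held-out data.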
doi:10.3115/1220835.1220852 fatcat:sqtqiven4fhrfamffa6qpflpki