Augmenting Approximate Similarity Searching with Lexical Information

James Gorman, James R. Curran
2005 Australasian Language Technology Association Workshop  
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to compare context vectors extracted from large corpora scales poorly. The Spatial Approximation Sample Hierarchy (SASH) is a data-structure for performing approximate nearest-neighbour queries, and has been previously used to improve the scalability of distributional similarity searches. We add lexical semantic information from WordNet to the SASH in an attempt to improve the accuracy and efficiency of similarity searches.
dblp:conf/acl-alta/GormanC05 fatcat:dleexzcy5vaivfrwekibl5vjia