Iterative Residual Rescaling: An Analysis and Generalization of LSI

Rie Kubota Ando, Lillian Lee
2001-06-17, arXiv pre-print
We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's ... 0) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
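The abstract does not spell out IRR's steps, so the following is only a minimal sketch under an assumed formulation. Documents are the columns of a term-document matrix. At each step, every residual document vector is scaled by its length raised to a power q (the rescaling factor), and the top left singular vector of the scaled residuals becomes the next basis vector. That vector's component is then subtracted from the residuals. With q = 0 nothing is rescaled, and the extracted basis spans the same subspace as LSI's top-k left singular vectors. Documents are compared by cosine similarity after projection onto the basis. The function names (lsi_basis, irr_basis, doc_similarities) and the toy data are hypothetical, and the paper's automatic choice of q is not reproduced here.

```python
import numpy as np


def lsi_basis(A: np.ndarray, k: int) -> np.ndarray:
    """LSI subspace: the top-k left singular vectors of the term-document matrix A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]


def irr_basis(A: np.ndarray, k: int, q: float) -> np.ndarray:
    """Sketch of Iterative Residual Rescaling under an ASSUMED formulation.

    A: terms x documents matrix, one (ideally length-normalized) document per column.
    k: number of basis vectors to extract.
    q: rescaling exponent; q = 0 disables rescaling (LSI-equivalent subspace).
    Returns a terms x k matrix with orthonormal columns.
    """
    R = np.array(A, dtype=float)  # residuals start as the documents themselves (copy)
    basis = []
    for _ in range(k):
        lengths = np.linalg.norm(R, axis=0)  # length of each residual document vector
        S = R * lengths ** q                 # column j scaled by |r_j| ** q
        U, _, _ = np.linalg.svd(S, full_matrices=False)
        b = U[:, 0]                          # top left singular vector of rescaled residuals
        basis.append(b)
        R = R - np.outer(b, b @ R)           # remove the b-component from every residual
    return np.column_stack(basis)


def doc_similarities(B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Cosine similarities between documents after projecting them onto span(B)."""
    X = B.T @ A                                       # k x documents coordinates
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-length projected documents
    return X.T @ X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((30, 8))                           # toy term counts (made-up data)
    A = A / np.linalg.norm(A, axis=0, keepdims=True)  # length-normalize documents
    for q in (0.0, 2.0):
        sims = doc_similarities(irr_basis(A, k=3, q=q), A)
        print(f"q={q}:\n{np.round(sims, 2)}")
```

Under this assumed formulation, a larger q gives more weight to documents whose residuals are still long, meaning they are poorly represented by the basis vectors found so far. Comparing projection matrices (B @ B.T) rather than the vectors themselves sidesteps the sign ambiguity of singular vectors when checking that q = 0 reproduces the LSI subspace.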
Identifiers: arXiv:cs/0106039v1, fatcat:ikldqvjgmrbalaw7jfgoowrnzi
