Iterative Residual Rescaling: An Analysis and Generalization of LSI [article]

Rie Kubota Ando, Lillian Lee
<span title="2001-06-17">2001</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's
more &raquo; ... 0) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="">arXiv:cs/0106039v1</a> <a target="_blank" rel="external noopener" href="">fatcat:ikldqvjgmrbalaw7jfgoowrnzi</a> </span>
<a target="_blank" rel="noopener" href="" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> File Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="" title=" access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> </button> </a>