Locality preserving indexing for document representation

Xiaofei He, Deng Cai, Haifeng Liu, Wei-Ying Ma
2004 Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR '04  
Document representation and indexing is a key problem for document analysis and processing, such as clustering, classification and retrieval. Conventionally, Latent Semantic Indexing (LSI) is considered effective in deriving such an indexing. LSI essentially detects the most representative features for document representation rather than the most discriminative features. Therefore, LSI might not be optimal in discriminating documents with different semantics. In this paper, a novel algorithm called Locality Preserving Indexing (LPI) is proposed for document indexing. Each document is represented by a vector with low dimensionality. In contrast to LSI which discovers the global structure of the document space, LPI discovers the local structure and obtains a compact document representation subspace that best detects the essential semantic structure. We compare the proposed LPI approach with LSI on two standard databases. Experimental results show that LPI provides better representation in the sense of semantic structure.
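The abstract contrasts a global projection (LSI) with a locality-preserving one (LPI) but does not spell out either computation. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes LSI is a truncated SVD of a term-by-document matrix, and that LPI follows the usual locality-preserving-projection recipe: build a nearest-neighbor graph over documents, then solve a generalized eigenproblem on the graph Laplacian. The function names, the cosine-similarity graph with 0/1 weights, the parameters `n_neighbors` and `reg`, and the toy matrix are all illustrative assumptions.

```python
import numpy as np


def lsi_embed(X, k):
    """LSI sketch: project documents onto the top-k left singular vectors.

    X is a (terms x documents) matrix; the result is (k x documents).
    """
    X = np.asarray(X, dtype=float)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k].T @ X


def lpi_embed(X, k, n_neighbors=2, reg=1e-6):
    """LPI-style sketch (assumed formulation, see the lead-in).

    Solves X L X^T a = lambda X D X^T a and keeps the k eigenvectors
    with the smallest eigenvalues. `reg` is a hypothetical ridge term
    that keeps X D X^T positive definite.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape

    # Cosine similarity between documents (columns of X).
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    Xn = X / norms
    sim = Xn.T @ Xn
    np.fill_diagonal(sim, -np.inf)  # a document is not its own neighbor

    # Symmetric k-nearest-neighbor graph with 0/1 weights.
    S = np.zeros((n, n))
    nbrs = np.argsort(-sim, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    S[rows, nbrs.ravel()] = 1.0
    S = np.maximum(S, S.T)

    D = np.diag(S.sum(axis=1))  # degree matrix
    L = D - S                   # graph Laplacian

    A = X @ L @ X.T
    B = X @ D @ X.T + reg * np.eye(m)

    # Reduce A a = lambda B a to a standard symmetric problem using B = C C^T.
    C = np.linalg.cholesky(B)
    M = np.linalg.solve(C, np.linalg.solve(C, A).T)  # C^{-1} A C^{-T}
    M = (M + M.T) / 2                                # remove round-off asymmetry
    _, V = np.linalg.eigh(M)                         # eigenvalues in ascending order
    a = np.linalg.solve(C.T, V[:, :k])               # map back: a = C^{-T} v
    return a.T @ X


# Toy term-by-document counts: 6 terms, 6 documents, two rough topics.
X_demo = np.array([
    [2, 3, 1, 0, 0, 0],
    [1, 2, 2, 0, 0, 1],
    [3, 1, 2, 0, 1, 0],
    [0, 0, 1, 2, 3, 1],
    [0, 1, 0, 3, 1, 2],
    [0, 0, 0, 1, 2, 3],
], dtype=float)

lsi_docs = lsi_embed(X_demo, 2)
lpi_docs = lpi_embed(X_demo, 2)
```

Both functions return one low-dimensional column per document, so the two embeddings can be fed to the same clustering or retrieval step for comparison. For large vocabularies, a common precaution is to reduce dimensionality before the eigenproblem rather than rely on the ridge term alone.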
doi:10.1145/1008992.1009012 dblp:conf/sigir/HeCLM04 fatcat:bauei3gfarcbnogir2bso7p2cm