Application of bitmap index to information retrieval

Kengo Fujioka, Yukio Uematsu, Makoto Onizuka
2008 Proceeding of the 17th international conference on World Wide Web - WWW '08  
To solve this problem, we devised the HS-bitmap index, which is hierarchically comprised of compressed data of summary bits.  ...  A summary bit in an upper matrix is obtained by logical OR of the n bits in its corresponding lower matrix. Let n denote the summary unit.  ...  Figure 1: Document-term matrix. Figure 2: Hierarchical structure of summary bits. Figure 4: Retrieval time vs. query length.  ... 
doi:10.1145/1367497.1367680 dblp:conf/www/FujiokaUO08 fatcat:prs6gkqan5ailppmdu4xq7qt7e
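
The summary-bit construction quoted in this snippet can be illustrated with a minimal sketch. It assumes a plain Python list of 0/1 values for one term's raw bit vector; the function names `summarize` and `build_hierarchy` are hypothetical, and the compression step of the HS-bitmap index is not modelled here.

```python
def summarize(bits, n):
    """Return one summary bit per group of n bits: the logical OR of the group."""
    return [int(any(bits[i:i + n])) for i in range(0, len(bits), n)]


def build_hierarchy(raw_bits, n):
    """Stack summary levels on top of the raw bit vector until one bit remains.

    n is the summary unit from the snippet; n >= 2 is assumed so that each
    level is strictly shorter than the one below it.
    """
    if n < 2:
        raise ValueError("summary unit n must be at least 2")
    levels = [list(raw_bits)]
    while len(levels[-1]) > 1:
        levels.append(summarize(levels[-1], n))
    return levels


# Hypothetical raw bit vector for one term over 16 documents.
raw = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
levels = build_hierarchy(raw, 4)
```

A zero summary bit means that all n bits below it are zero, so a lookup over such a structure could skip that whole group without reading the lower level.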

Parameterised compression for sparse bitmaps

Alistair Moffat, Justin Zobel
1992 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '92  
Full-text retrieval systems often use ei-  ...  Improved hierarchical bit-vector compression in document retrieval systems.  ...  Klein, IEEE Data Compression Conference, Snow- Compression of concordances in full-text retrieval systems.  ... 
doi:10.1145/133160.133210 dblp:conf/sigir/MoffatZ92 fatcat:qwd2cw2o7bfhrchg4dbp6klmoy

Construction of optimal graphs for bit-vector compression

A. Bookstein, S. T. Klein
1990 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '90  
Bitmaps are data structures occurring often in information retrieval. They are useful; they are also large and expensive to store.  ...  We propose a preprocessing stage, in which bitmaps are first clustered and the clusters used to transform their member bitmaps into sparser ones, that can be more effectively compressed.  ...  In this paper, we shall describe and examine the possibilities of compressing bitmaps, a data structure often proposed for improving the performance of retrieval systems ([5], [16]).  ... 
doi:10.1145/96749.98236 dblp:conf/sigir/BooksteinK90 fatcat:3hm5rxc5lbay7cfsa4o7gm3hf4

A File Index for Document Storage and Retrieval Utilizing Descriptor Fragments

F. W. Allen
1982 Computer journal  
Document indexing by character substrings or word fragments found in document texts, titles, author names and keywords has been reported.  ...  File searches for partial match queries are relegated to a priori calculation of file subset addresses in which the desired documents have high probabilities of being located.  ...  into a bit vector for each document.  ... 
doi:10.1093/comjnl/25.1.2 fatcat:ed54keekpzc6hnbq4jjofm3xdm

Improved index compression techniques for versioned document collections

Jinru He, Junyuan Zeng, Torsten Suel
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
Current Information Retrieval systems use inverted index structures for efficient query processing.  ...  We propose new index compression techniques for versioned document collections that achieve reductions in index size over previous methods.  ...  the data with bit or byte boundaries in order to allow retrieval of particular bit vectors, but this is harder to model.  ... 
doi:10.1145/1871437.1871594 dblp:conf/cikm/HeZS10 fatcat:y235jx3vhvhhdctoinczlxuu2i

Parallel computing in information retrieval – an updated review

A. Macfarlane, S.E. Robertson, J.A. Mccann
1997 Journal of Documentation  
The DAP is successfully used by the DapText system described by Reddaway [11] and is included in the case studies section (7.1) below. Reuters use this system for their Text Retrieval purposes.  ...  In particular we stress the importance of the motivation in using parallel computing for Text Retrieval.  ...  We are grateful to Ephraim Vishniac and Dennis Parkinson for information on various aspects of the CM-2 and DAP systems described in this paper.  ... 
doi:10.1108/eum0000000007201 fatcat:2zuwtehixbd6xk33hwb3j43nse

Text Clustering in Distributed Networks with Enhanced File Security

Teena Susan Chandy
2014 IOSR Journal of Computer Engineering  
PCP2P alone does not provide any security for the system. It only ensures efficient information retrieval of text data by assigning document to most relevant cluster.  ...  It provides improved scalability by using a probabilistic approach for assigning documents to clusters. Only most relevant clusters are considered for comparison with each document.  ...  Introduction Text clustering is an established technique that is widely employed in most networks for improving the quality of information retrieval.  ... 
doi:10.9790/0661-16582430 fatcat:wp5vigciebhwfi7egftsawxuq4

Fast top-k similarity queries via matrix compression

Yucheng Low, Alice X. Zheng
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
The task is related to the classical k-Nearest-Neighbor problem, and is widely applicable in a number of domains such as information retrieval, online advertising and collaborative filtering.  ...  We conduct extensive experiments on the publicly available Wikipedia dataset, and demonstrate that, with a memory overhead of 21%, our method can provide 1-3 orders of magnitude improvement in query run-time  ...  At query time, the query word/vector v is similarly hierarchically compressed using the CompressVector function in Alg. 3. Then the top-K algorithm in Alg. 4 is called.  ... 
doi:10.1145/2396761.2398574 dblp:conf/cikm/LowZ12 fatcat:3uxmzb5ubrandkcaysdliocroe

The Feasibility of Brute Force Scans for Real-Time Tweet Search

Yulu Wang, Jimmy Lin
2015 Proceedings of the 2015 International Conference on Theory of Information Retrieval - ICTIR '15  
The real-time search problem requires making ingested documents immediately searchable, which presents architectural challenges for systems built around inverted indexing.  ...  In this paper, we explore a radical proposition: What if we abandon document inversion and instead adopt an architecture based on brute force scans of document representations?  ...  SYSTEM DESIGN Document Representations We begin with a dictionary that provides a mapping from terms to 32-bit integer term ids.  ... 
doi:10.1145/2808194.2809489 dblp:conf/ictir/WangL15 fatcat:dtyqpyqczbe5toh53cmlbfpf4y

A New Algorithm for Semantic XML Compression Using Multilevel Fuzzy Clustering Algorithm

Saad M. Darwish, Magda M. Madbouly, Sundws M. Mohammed
2015 International Journal of Web Science and Engineering for Smart Devices  
This paper focuses on improvement of semantic compression of XML documents based on clustering and rearranging of XML elements within documents.  ...  The proposed semantic-based lossless XML compression system employs multilevel fuzzy clustering as an effective method of clustering feature vectors encoding XML nodes on the different structure levels  ...  The objective of reordered Ctree vector compression is to reduce the number of bits needed to represent the vector while making as few alterations to it as possible.  ... 
doi:10.21742/ijwsesd.2015.2.1.01 fatcat:rbbi5hml7ndcbce7rlon7edhzy

Semantic indexing in structured peer-to-peer networks

Ronaldo A Ferreira, Mehmet Koyutürk, Suresh Jagannathan, Ananth Grama
2008 Journal of Parallel and Distributed Computing  
retrieval systems [31], among others.  ...  Therefore, development of techniques to efficiently compute basis vectors for data distributed across peers is important for large-scale deployment of semantic indexing in P2P systems.  ...  In PSEARCH [31] , a P2P information retrieval system based on a semantic overlay network, documents and queries are represented as vectors in a multi-dimensional Cartesian space.  ... 
doi:10.1016/j.jpdc.2007.06.003 fatcat:wy7v4i763vg3vmyabezkudhxl4

Document Compression Improvements Based on Data Clustering [chapter]

Jiri Dvorsky, Jan Martinovic, Jan Platos, Vaclav Snasel
2010 Web Intelligence and Intelligent Agents  
Even saving e.g. one bit in data structures for searching and the improvement of text compression ratio in an IRS by one percent result in savings of tens of megabytes.  ...  In the following text we study hierarchical methods only.  ... 
doi:10.5772/8365 fatcat:p7tpioijljhnhjc42b2ook4mxu

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance [article]

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma
2021 arXiv   pre-print
However, although existing vector compression methods including PQ can help improve the efficiency of DR, they incur severely decayed retrieval performance due to the separation between encoding and compression  ...  To overcome these problems, vector compression methods have been adopted in many practical embedding-based retrieval applications. One of the most popular methods is Product Quantization (PQ).  ...  Although existing vector compression methods can help improve efficiency for DR models, they suffer from a few drawbacks in practical scenarios.  ... 
arXiv:2108.00644v2 fatcat:gotq6xtjfrbxbc6aeo7laoxfe4

Compression of concordances in full-text retrieval systems

Y. Choueka, A. S. Fraenkel, S. T. Klein
1988 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '88  
A similar idea was used in [5] for encoding run-lengths in the compression of sparse bit-vectors.  ...  Table 4: Compression results  ...  the total number of words in the text. Thus ⌊log₂ N⌋ + 1 bits would be necessary per coordinate.  ... 
doi:10.1145/62437.62500 dblp:conf/sigir/ChouekaFK88 fatcat:4zt5iy4vmrestelf4h4lnkpir4
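
Reading the bit count in this snippet as ⌊log2 N⌋ + 1 for a text of N words (an interpretation of the extracted text, not a confirmed formula from the paper), the width of an uncompressed coordinate can be checked with a short sketch. The function name `bits_per_coordinate` is hypothetical.

```python
def bits_per_coordinate(n_words):
    """Bits needed to store any value in 1..n_words as a fixed-width integer.

    For n_words >= 1, int.bit_length() equals floor(log2(n_words)) + 1,
    and it avoids floating-point rounding near powers of two.
    """
    if n_words < 1:
        raise ValueError("n_words must be a positive integer")
    return n_words.bit_length()


# Hypothetical collection size: one million words.
width = bits_per_coordinate(1_000_000)
```

This fixed width is the baseline that a concordance compression scheme would try to beat.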

FIRE: fractal indexing with robust extensions for image databases

R. Distasi, M. Nappi, M. Tucci
2003 IEEE Transactions on Image Processing  
Additionally, the experimental results show the effectiveness of FIRE in terms of both compression and retrieval accuracy. Index Terms: Content-based retrieval, invariance, iterated functions systems.  ...  As already documented in the literature, fractal image encoding is a family of techniques that achieves a good compromise between compression and perceived quality by exploiting the self-similarities present  ...  The feature vectors have been quantized to bits for each angle, yielding 64 classes. No significant improvement has been achieved by experimenting with finer quantization for the feature vectors.  ... 
doi:10.1109/tip.2003.811041 pmid:18237916 fatcat:64ekysqpbjeb3dafofzrxcb6ny