Mitos: Design and Evaluation of a DBMS-Based Web Search Engine
2008
2008 Panhellenic Conference on Informatics
This paper discusses the benefits and the drawbacks of this choice (compared to the classical inverted files), proposes three different database representations, and reports comparative experimental results ...
Two of these representations are one order of magnitude more space efficient and two orders of magnitude faster in query evaluation than the plain relational representation. ...
In addition, to reduce the I/O overhead during query evaluation for PR we clustered the occurrence table on word id (clustering time is not included in Table 3). ...
doi:10.1109/pci.2008.46
dblp:conf/pci/PapadakosTMAT08
fatcat:6uurygdapzg3ddb4kp6j7neplq
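The snippets above mention a plain relational (PR) representation whose occurrence table is clustered on word id. As a rough illustration only, the sketch below stores one row per word occurrence in SQLite and answers an AND query with a join. The table names `word` and `occurrence` and their columns are hypothetical; this is not the schema from the paper. Declaring the table `WITHOUT ROWID` with a primary key that starts with `word_id` makes SQLite keep rows physically ordered by word id.

```python
import sqlite3

def build_occurrence_table(docs):
    """Plain relational inverted index: one row per word occurrence.

    docs maps a (hypothetical) integer doc_id to its text. The occurrence
    table is WITHOUT ROWID with a primary key led by word_id, so SQLite
    stores its rows in word_id order (clustered on word id).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word (word_id INTEGER PRIMARY KEY, term TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE occurrence ("
        " word_id INTEGER, doc_id INTEGER, pos INTEGER,"
        " PRIMARY KEY (word_id, doc_id, pos)) WITHOUT ROWID"
    )
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            conn.execute("INSERT OR IGNORE INTO word (term) VALUES (?)", (term,))
            (word_id,) = conn.execute(
                "SELECT word_id FROM word WHERE term = ?", (term,)
            ).fetchone()
            conn.execute(
                "INSERT INTO occurrence VALUES (?, ?, ?)", (word_id, doc_id, pos)
            )
    conn.commit()
    return conn

def conjunctive_query(conn, terms):
    """Sorted doc_ids that contain every query term (AND semantics)."""
    wanted = sorted({t.lower() for t in terms})
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        "SELECT o.doc_id FROM occurrence AS o JOIN word AS w ON o.word_id = w.word_id"
        f" WHERE w.term IN ({placeholders})"
        " GROUP BY o.doc_id HAVING COUNT(DISTINCT w.term) = ?",
        wanted + [len(wanted)],
    ).fetchall()
    return sorted(doc_id for (doc_id,) in rows)
```

Storing every occurrence as its own row is what makes this representation large; the more compact representations the abstract reports are not modeled here.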
Distributed media indexing based on MPI and MapReduce
2012
2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)
In contrast, the message passing interface (MPI) is suitable for high performance algorithms. ...
MapReduce is a programming model proposed by Google for scalable data processing. MapReduce is mainly applicable for data intensive algorithms. ...
Each process then starts to build its own inverted file data structure based on the global reference points and the partial data it has access to. ...
doi:10.1109/cbmi.2012.6269841
dblp:conf/cbmi/MohamedM12
fatcat:u2zslj5nazg4rnm23cqhrkdc6y
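The last snippet says each process builds its own inverted file from the global reference points and the partial data it holds. The sketch below imitates that step sequentially, assuming (our assumption, not the paper's stated scheme) that each item is posted under its single nearest reference point. Plain lists stand in for MPI processes; a real system would run `build_local_inverted_file` on each rank and gather the results.

```python
import math
from collections import defaultdict

def nearest_reference(vector, reference_points):
    """Index of the closest global reference point (Euclidean distance)."""
    return min(range(len(reference_points)),
               key=lambda i: math.dist(vector, reference_points[i]))

def build_local_inverted_file(partition, reference_points):
    """Work of one process: post each local item under its nearest reference point."""
    local = defaultdict(list)
    for item_id, vector in partition:
        local[nearest_reference(vector, reference_points)].append(item_id)
    return dict(local)

def merge_inverted_files(local_files):
    """Combine per-process inverted files, e.g. at a root process after a gather."""
    merged = defaultdict(list)
    for local in local_files:
        for ref_id, items in local.items():
            merged[ref_id].extend(items)
    return {ref_id: sorted(items) for ref_id, items in sorted(merged.items())}
```

Because the reference points are global, every process produces postings keyed by the same ids, which is what makes the final merge a simple concatenation.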
Efficient Update of Indexes for Dynamically Changing Web Documents
2007
World Wide Web (Bussum)
Our method uses the idea of landmarks together with the diff algorithm to significantly reduce the number of postings in the inverted index that need to be updated. ...
Our experiments verify that our landmark-diff method results in significant savings in the number of update operations on the inverted index. ...
In the case where a block-based variant of diff such as that described in Section 4.2 is used, an extra access to the old file or new file is required to obtain the words that are deleted or inserted. ...
doi:10.1007/s11280-006-0009-2
fatcat:slq3lhs6vjd5fg3fkhwb3m3nbi
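The abstract combines landmarks with diff to cut the number of postings that need updating. The sketch below does not implement landmarks; it only illustrates the problem they address, under the assumption that postings store absolute word positions. Inserting one word at the front of a document changes every positional posting, even though Python's `difflib` reports a single word-level edit.

```python
import difflib

def positional_postings(words):
    """Postings for one document as (term, absolute position) pairs."""
    return {(term, pos) for pos, term in enumerate(words)}

def postings_to_update(old_words, new_words):
    """Postings to delete plus postings to insert when positions are absolute."""
    old, new = positional_postings(old_words), positional_postings(new_words)
    return len(old - new) + len(new - old)

def word_level_edits(old_words, new_words):
    """Words inserted, deleted or replaced, as reported by difflib."""
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits += (i2 - i1) + (j2 - j1)
    return edits
```

For "the quick brown fox jumps over the lazy dog" with "a" prepended, the absolute-position scheme touches 19 postings while the diff finds one edit; encoding positions relative to landmarks is meant to close that gap.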
Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce
2009
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of "more like this" queries in the life sciences domain. ...
In the distributed file system, data blocks are stored on the local disks of machines in the cluster; the MapReduce runtime attempts to schedule mappers on machines where the necessary data resides, thus ...
In processing a set of queries, each postings list is accessed only once; each mapper computes partial score contributions for all queries that contain the term. ...
doi:10.1145/1571941.1571970
dblp:conf/sigir/Lin09
fatcat:wuq4db7ouvhjblrqcwovu7gkny
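The last snippet describes reading each postings list once while emitting partial score contributions for every query that contains the term. A sequential sketch of that map/reduce split follows; the function names, the dot-product scoring and the `(doc_id, weight)` postings format are illustrative assumptions, and no Hadoop API is used.

```python
from collections import defaultdict

def invert_queries(queries):
    """term -> [(query_id, query_weight)], so a mapper can find every query using a term."""
    by_term = defaultdict(list)
    for query_id, weights in queries.items():
        for term, weight in weights.items():
            by_term[term].append((query_id, weight))
    return by_term

def mapper(term, postings, queries_by_term):
    """Read one postings list once; emit partial scores for all queries containing the term."""
    for query_id, q_weight in queries_by_term.get(term, []):
        for doc_id, d_weight in postings:
            yield (query_id, doc_id), q_weight * d_weight

def reducer(emitted):
    """Sum the partial contributions for each (query_id, doc_id) key."""
    scores = defaultdict(float)
    for key, value in emitted:
        scores[key] += value
    return dict(scores)

def run_job(index, queries):
    """Sequential stand-in for the MapReduce job: one mapper call per postings list."""
    queries_by_term = invert_queries(queries)
    emitted = []
    for term, postings in index.items():
        emitted.extend(mapper(term, postings, queries_by_term))
    return reducer(emitted)
```

In a real job the shuffle phase would group emitted pairs by key before the reducer; here a dictionary plays that role.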
The Grid File: An Adaptable, Symmetric Multikey File Structure
1984
ACM Transactions on Database Systems
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. ...
They manifest various deficiencies in particular for multikey access to highly dynamic files. ...
Willinger for writing an early version of the simulation program, and to the following people for communicating to us their experiences about ongoing implementations of the grid file: K. ...
doi:10.1145/348.318586
fatcat:hzlg2b7ebjavjbyk2cxugnzdxi
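To make the contrast with inverted files concrete, here is a minimal two-dimensional grid file lookup: one sorted list of interval boundaries per dimension (the linear scales) and a directory that maps each grid cell to a bucket, where several cells may share a bucket. This is a simplified sketch; it leaves out the bucket splitting and merging that make the structure adaptable to dynamic files, and the class and bucket names are hypothetical.

```python
import bisect
from collections import defaultdict

class GridFile2D:
    """Minimal 2-D grid file lookup (no splitting or merging).

    x_scale / y_scale are sorted interval boundaries (linear scales);
    directory[i][j] names the bucket for cell (i, j).
    """

    def __init__(self, x_scale, y_scale, directory):
        self.x_scale = x_scale
        self.y_scale = y_scale
        self.directory = directory
        self.buckets = defaultdict(list)

    def _cell(self, x, y):
        # Locate the interval index of each key on its own scale.
        return bisect.bisect_right(self.x_scale, x), bisect.bisect_right(self.y_scale, y)

    def insert(self, x, y, record):
        i, j = self._cell(x, y)
        self.buckets[self.directory[i][j]].append((x, y, record))

    def exact_match(self, x, y):
        i, j = self._cell(x, y)
        bucket = self.buckets[self.directory[i][j]]
        return [r for (rx, ry, r) in bucket if (rx, ry) == (x, y)]

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        i_lo, j_lo = self._cell(x_lo, y_lo)
        i_hi, j_hi = self._cell(x_hi, y_hi)
        # Visit each bucket once even if several cells in the range share it.
        bucket_ids = {self.directory[i][j]
                      for i in range(i_lo, i_hi + 1) for j in range(j_lo, j_hi + 1)}
        return sorted(r for b in bucket_ids for (x, y, r) in self.buckets[b]
                      if x_lo <= x <= x_hi and y_lo <= y <= y_hi)
```

Both keys are treated symmetrically: neither dimension is a primary key with the other bolted on, which is the property the abstract contrasts with inverted files.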
Inverted files for text search engines
2006
ACM Computing Surveys
In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. ...
The technology underlying text search engines has advanced dramatically in the past decade. ...
Jamie Callan, Bruce Croft, Donna Harman, Mike Lesk, and Ellen Voorhees helped us identify some of the early work in the area. ...
doi:10.1145/1132956.1132959
fatcat:u56re4tqtfg6zcpyfnzl5ne57m
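As a small illustration of the core structure such a tutorial starts from, the sketch below builds a document-level inverted file (term mapped to sorted `(doc_id, frequency)` postings) and answers an AND query by merge-intersecting postings lists, shortest first. Tokenization by whitespace and lowercasing is a simplifying assumption.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted list of (doc_id, term frequency)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(freqs.items()) for term, freqs in index.items()}

def intersect(a, b):
    """Merge-intersect two sorted doc-id lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def and_query(index, terms):
    """Documents containing all terms; intersect starting from the shortest list."""
    lists = sorted(([d for d, _ in index.get(t.lower(), [])] for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result
```

Starting from the shortest list keeps intermediate results small, since an intersection can never be longer than its shorter input.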
The grid file: An adaptable, symmetric multi-key file structure
[chapter]
1981
Lecture Notes in Computer Science
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access. ...
They manifest various deficiencies in particular for multikey access to highly dynamic files. ...
Willinger for writing an early version of the simulation program, and to the following people for communicating to us their experiences about ongoing implementations of the grid file: K. ...
doi:10.1007/3-540-10885-8_45
fatcat:kj4n3qk6ofegheoaakiyto54bu
Full-text indexing for optimizing selection operations in large-scale data analytics
2011
Proceedings of the second international workshop on MapReduce and its applications - MapReduce '11
The idea is simple and intuitive: the full-text index informs the Hadoop execution engine which compressed data blocks contain query terms of interest, and only those data blocks are decompressed and scanned ...
Given the explosion of unstructured data begotten by social media and other web-based applications, we take the position that any modern analytics platform must support operations on free-text fields as ...
In HDFS, file blocks (typically 64 or 128 MB in size) are stored on the local disks of machines in the cluster (with a default replication factor of three). ...
doi:10.1145/1996092.1996105
fatcat:q2fd2ijnrvezdn6b2iklrk4jui
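The first snippet describes using a full-text index to tell the execution engine which compressed blocks contain a query term, so only those blocks are decompressed and scanned. The sketch below models that idea in memory: `zlib` stands in for the compression codec, fixed-size groups of records stand in for HDFS blocks, and records are assumed not to contain newlines. None of the names correspond to Hadoop APIs.

```python
import zlib
from collections import defaultdict

def compress_blocks(records, records_per_block):
    """Group text records into blocks and compress each block independently."""
    blocks = []
    for start in range(0, len(records), records_per_block):
        chunk = "\n".join(records[start:start + records_per_block])
        blocks.append(zlib.compress(chunk.encode("utf-8")))
    return blocks

def build_block_index(records, records_per_block):
    """Map each term to the set of block ids whose records contain it."""
    index = defaultdict(set)
    for n, record in enumerate(records):
        for term in record.lower().split():
            index[term].add(n // records_per_block)
    return index

def select(blocks, index, term):
    """Decompress only candidate blocks, then scan them for matching records."""
    term = term.lower()
    matches, decompressed = [], 0
    for block_id in sorted(index.get(term, ())):
        decompressed += 1
        for record in zlib.decompress(blocks[block_id]).decode("utf-8").split("\n"):
            if term in record.lower().split():
                matches.append(record)
    return matches, decompressed
```

The second return value counts decompressed blocks, which is the cost the index is meant to reduce for selective terms.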
Compressed inverted files with reduced decoding overheads
1998
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98
Compressed inverted files are the most compact way of indexing large text databases, typically occupying around 10% of the space of the collection they index. ...
For ranked queries, the new mechanism reduces both CPU and elapsed time to one third and memory usage to less than one tenth of the standard algorithm, with no degradation in retrieval effectiveness. ...
Acknowledgements This work was supported by the Australian Research Council and The Australian Agency for International Development. ...
doi:10.1145/290941.291011
dblp:conf/sigir/VoM98
fatcat:atvvr734kvbkro77w2f5uvp7f4
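Compact inverted files usually store each postings list as gaps between consecutive document numbers (d-gaps) and encode those small integers with a variable-length code. The sketch below shows one common baseline, variable-byte coding (7 data bits per byte, high bit set on the last byte of each number). It is not the reduced-overhead decoding mechanism proposed in this paper, which is not reproduced here.

```python
def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit marks a number's last byte."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers can be encoded")
        groups = [n & 0x7F]
        n >>= 7
        while n:
            groups.append(n & 0x7F)
            n >>= 7
        groups.reverse()          # most significant group first
        groups[-1] |= 0x80        # flag the terminating byte
        out.extend(groups)
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers

def compress_postings(doc_ids):
    """Store sorted doc ids as d-gaps, then variable-byte encode the gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return vbyte_encode(gaps)

def decompress_postings(data):
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

For the list `[3, 7, 130, 100000]` the gaps are `[3, 4, 123, 99870]`, which fit in 6 bytes instead of 16 for four 32-bit integers.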
Incremental cluster-based retrieval using compressed cluster-skipping inverted files
2008
ACM Transactions on Information Systems
In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query ...
The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size. ...
Our cluster-skipping inverted file proposed in this article is inspired by this former work, but extends it in various ways. ...
doi:10.1145/1361684.1361688
fatcat:3iuznbyiobdzpp3m4ih7qvquuu
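One way to picture a cluster-skipping inverted file: each postings list is laid out cluster by cluster, and each cluster's postings are preceded by a skip element holding the cluster id and the position of the next skip element. The sketch below assumes that layout and, for simplicity, takes the set of wanted clusters as given, whereas the article computes the best clusters and best documents together in a single pass. The tuple format is hypothetical.

```python
def build_cluster_skipping_list(postings, cluster_of):
    """Lay out (doc_id, weight) postings grouped by cluster, each group led by a skip element.

    Returns a flat list: ("skip", cluster_id, next_skip_index) followed by
    ("doc", doc_id, weight) entries for that cluster.
    """
    by_cluster = {}
    for doc_id, weight in postings:
        by_cluster.setdefault(cluster_of[doc_id], []).append((doc_id, weight))
    flat = []
    for cluster_id in sorted(by_cluster):
        docs = sorted(by_cluster[cluster_id])
        next_skip = len(flat) + 1 + len(docs)
        flat.append(("skip", cluster_id, next_skip))
        flat.extend(("doc", d, w) for d, w in docs)
    return flat

def scan_selected_clusters(flat, wanted_clusters):
    """Accumulate scores only inside wanted clusters, jumping over all other clusters."""
    scores, i, visited = {}, 0, 0
    while i < len(flat):
        _, cluster_id, next_skip = flat[i]
        if cluster_id in wanted_clusters:
            for _, doc_id, weight in flat[i + 1:next_skip]:
                scores[doc_id] = scores.get(doc_id, 0) + weight
                visited += 1
        i = next_skip
    return scores, visited
```

The skip pointers are what let a scan ignore postings of unselected clusters without decoding them, at the cost of one extra element per cluster.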
A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms
2011
2011 IEEE International Parallel & Distributed Processing Symposium
Keywords: indexer; inverted files; multicore; GPU; pipelined and parallel parsing and indexing ...
The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters. ...
Sangchul Song who developed the version of Wikipedia04-09 dataset which was used in our experimental evaluation. ...
doi:10.1109/ipdps.2011.107
dblp:conf/ipps/WeiJ11
fatcat:kcdvnw56jnaetj33ysyob5m23i
A fast algorithm for constructing inverted files on heterogeneous platforms
2012
Journal of Parallel and Distributed Computing
Keywords: indexer; inverted files; multicore; GPU; pipelined and parallel parsing and indexing ...
The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters. ...
Sangchul Song who developed the version of Wikipedia04-09 dataset which was used in our experimental evaluation. ...
doi:10.1016/j.jpdc.2012.02.005
fatcat:bxe6t4yxavgzrj5vuyfn4ocgxm
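The keywords above mention parallel parsing and indexing. The sketch below shows only the generic partition-then-merge shape of parallel inverted file construction: documents are split into chunks, each chunk is indexed independently, and partial indexes are concatenated in chunk order so postings stay sorted. It does not model the paper's multicore/GPU pipeline. Note that in CPython a thread pool gives little CPU parallelism; `ProcessPoolExecutor` would be the usual substitute for real speedups.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def index_chunk(chunk):
    """Parse one chunk of (doc_id, text) pairs into a partial inverted index."""
    partial = defaultdict(list)
    for doc_id, text in chunk:
        for term in sorted(set(text.lower().split())):
            partial[term].append(doc_id)
    return partial

def build_parallel(docs, chunk_size, workers=4):
    """Index chunks concurrently, then merge partial indexes in chunk order.

    docs must be sorted by doc_id so that concatenated postings stay sorted.
    """
    chunks = [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(index_chunk, chunks))  # map preserves input order
    merged = defaultdict(list)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term].extend(doc_ids)
    return dict(merged)
```

Because `pool.map` returns results in input order, the merge needs no sorting step; the chunk size only changes how work is divided, not the final index.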
Efficient processing of joins on set-valued attributes
2003
Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03
We propose join algorithms that utilize inverted files and compare them with signature-based methods for several set-comparison predicates. ...
We show that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates. ...
Acknowledgements This work was supported by grant HKU 7380/02E from Hong Kong RGC. ...
doi:10.1145/872757.872778
dblp:conf/sigmod/Mamoulis03
fatcat:byykjdeghvenvkjteeipu7y7ei
Efficient processing of joins on set-valued attributes
2003
Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03
We propose join algorithms that utilize inverted files and compare them with signature-based methods for several set-comparison predicates. ...
We show that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates. ...
Acknowledgements This work was supported by grant HKU 7380/02E from Hong Kong RGC. ...
doi:10.1145/872773.872778
fatcat:acnzk2xmnjd4zduavozqyhwu5m
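To show how an inverted file can drive a join on set-valued attributes, the sketch below evaluates the set-containment join R ⋈ S on the predicate r ⊆ s: it builds an inverted file over S (element mapped to the ids of S-tuples containing it) and, for each r, intersects the lists of r's elements. Relations are plain dictionaries from tuple id to a Python set; this is an illustrative simplification, not the paper's disk-based algorithms.

```python
from collections import defaultdict

def build_inverted_file(relation):
    """Map each set element to the sorted ids of tuples whose set contains it."""
    inv = defaultdict(list)
    for tid in sorted(relation):
        for element in relation[tid]:
            inv[element].append(tid)
    return inv

def subset_join(r_relation, s_relation):
    """All (r_id, s_id) pairs with r's set contained in s's set."""
    inv = build_inverted_file(s_relation)
    result = []
    for r_id in sorted(r_relation):
        elements = r_relation[r_id]
        if not elements:
            # The empty set is contained in every set.
            result.extend((r_id, s_id) for s_id in sorted(s_relation))
            continue
        lists = sorted((inv.get(e, []) for e in elements), key=len)
        candidates = set(lists[0])
        for other in lists[1:]:
            candidates.intersection_update(other)
            if not candidates:
                break  # no S-tuple can contain r any more
        result.extend((r_id, s_id) for s_id in sorted(candidates))
    return result
```

An element of r that never occurs in S yields an empty list, so that r is discarded after the first intersection without touching any S-tuple.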
Reduction of Bus Transition for Compressed Code Systems
2013
International Journal of VLSI Design & Communication Systems
The main focus here is to present a method for reducing the power consumption of compressed-code systems by inverting the bits that are transmitted on the bus. ...
Low power VLSI circuit design is one of the most important issues in present day technology. One of the ways of reducing power is to reduce the number of transitions on the bus. ...
In the same paper, they extended this approach to multi way partial bus-invert (MPBI), where highly correlated bus lines were clustered into multiple sub-buses and each of them was encoded independently ...
doi:10.5121/vlsic.2013.4110
fatcat:taegkjve3zcc5habmvqggsmpa4
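The abstract reduces bus power by inverting the transmitted bits. The sketch below shows the classic bus-invert code this builds on: if sending the next word would toggle more than half of the data lines, the complement is sent instead and an extra invert line is raised. It compares data lines only, and it does not reproduce the paper's method for compressed code or the partial (MPBI) variants mentioned above.

```python
def bus_invert_encode(words, width):
    """Return (bus_values, invert_flags) under the bus-invert rule."""
    mask = (1 << width) - 1
    prev_bus = 0
    bus_values, flags = [], []
    for word in words:
        toggles = bin((word ^ prev_bus) & mask).count("1")
        if toggles > width // 2:
            bus, flag = ~word & mask, 1   # send complement, raise invert line
        else:
            bus, flag = word & mask, 0
        bus_values.append(bus)
        flags.append(flag)
        prev_bus = bus
    return bus_values, flags

def bus_invert_decode(bus_values, flags, width):
    mask = (1 << width) - 1
    return [(~b & mask) if f else b for b, f in zip(bus_values, flags)]

def count_transitions(bus_values, flags):
    """Total line toggles, counting the invert line too (bus starts at all zeros)."""
    total, prev_bus, prev_flag = 0, 0, 0
    for bus, flag in zip(bus_values, flags):
        total += bin(bus ^ prev_bus).count("1") + (flag ^ prev_flag)
        prev_bus, prev_flag = bus, flag
    return total
```

For the 8-bit sequence `0xFF, 0x00, 0xFF, 0x0F`, sending words unencoded causes 28 transitions; the bus-invert version causes 8, including the invert line's own toggles.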