Mitos: Design and Evaluation of a DBMS-Based Web Search Engine

Panagiotis Papadakos, Yannis Theoharis, Yannis Marketakis, Nikos Armenatzoglou, Yannis Tzitzikas
2008 2008 Panhellenic Conference on Informatics  
This paper discusses the benefits and the drawbacks of this choice (compared to the classical inverted files), proposes three different database representations, and reports comparative experimental results  ...  Two of these representations are one order of magnitude more space efficient and two orders of magnitude faster in query evaluation, than the plain relational representation.  ...  In addition, to reduce the I/O overhead during query evaluation for P R we clustered the occurrence table on word id (clustering time is not included in Table 3).  ... 
doi:10.1109/pci.2008.46 dblp:conf/pci/PapadakosTMAT08 fatcat:6uurygdapzg3ddb4kp6j7neplq

Distributed media indexing based on MPI and MapReduce

Hisham Mohamed, Stephane Marchand-Maillet
2012 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)  
In contrast, the message passing interface (MPI) is suitable for high performance algorithms.  ...  MapReduce is a programming model proposed by Google for scalable data processing. MapReduce is mainly applicable for data intensive algorithms.  ...  Each process then starts to build its own inverted file data structure based on the global reference points and the partial data it has access to.  ... 
doi:10.1109/cbmi.2012.6269841 dblp:conf/cbmi/MohamedM12 fatcat:u2zslj5nazg4rnm23cqhrkdc6y

Efficient Update of Indexes for Dynamically Changing Web Documents

Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, Ramesh Agarwal
2007 World wide web (Bussum)  
Our method uses the idea of landmarks together with the diff algorithm to significantly reduce the number of postings in the inverted index that need to be updated.  ...  Our experiments verify that our landmark-diff method results in significant savings in the number of update operations on the inverted index.  ...  In the case where a block-based variant of diff such as that described in Section 4.2 is used, an extra access to the old file or new file is required to obtain the words that are deleted or inserted.  ... 
doi:10.1007/s11280-006-0009-2 fatcat:slq3lhs6vjd5fg3fkhwb3m3nbi

Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce

Jimmy Lin
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of "more like this" queries in the life sciences domain.  ...  In the distributed file system, data blocks are stored on the local disks of machines in the cluster; the MapReduce runtime attempts to schedule mappers on machines where the necessary data resides, thus  ...  In processing a set of queries, each postings list is accessed only once: each mapper computes partial score contributions for all queries that contain the term.  ... 
doi:10.1145/1571941.1571970 dblp:conf/sigir/Lin09 fatcat:wuq4db7ouvhjblrqcwovu7gkny

The Grid File: An Adaptable, Symmetric Multikey File Structure

J. Nievergelt, Hans Hinterberger, Kenneth C. Sevcik
1984 ACM Transactions on Database Systems  
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access.  ...  They manifest various deficiencies in particular for multikey access to highly dynamic files.  ...  Willinger for writing an early version of the simulation program, and to the following people for communicating to us their experiences about ongoing implementations of the grid file: K.  ... 
doi:10.1145/348.318586 fatcat:hzlg2b7ebjavjbyk2cxugnzdxi

Inverted files for text search engines

Justin Zobel, Alistair Moffat
2006 ACM Computing Surveys  
In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions.  ...  The technology underlying text search engines has advanced dramatically in the past decade.  ...  Jamie Callan, Bruce Croft, Donna Harman, Mike Lesk, and Ellen Voorhees helped us identify some of the early work in the area.  ... 
doi:10.1145/1132956.1132959 fatcat:u56re4tqtfg6zcpyfnzl5ne57m

The grid file: An adaptable, symmetric multi-key file structure [chapter]

J. Nievergelt, H. Hinterberger, K. C. Sevcik
1981 Lecture Notes in Computer Science  
Traditional file structures that provide multikey access to records, for example, inverted files, are extensions of file structures originally designed for single-key access.  ...  They manifest various deficiencies in particular for multikey access to highly dynamic files.  ...  Willinger for writing an early version of the simulation program, and to the following people for communicating to us their experiences about ongoing implementations of the grid file: K.  ... 
doi:10.1007/3-540-10885-8_45 fatcat:kj4n3qk6ofegheoaakiyto54bu

Full-text indexing for optimizing selection operations in large-scale data analytics

Jimmy Lin, Dmitriy Ryaboy, Kevin Weil
2011 Proceedings of the second international workshop on MapReduce and its applications - MapReduce '11  
The idea is simple and intuitive: the full-text index informs the Hadoop execution engine which compressed data blocks contain query terms of interest, and only those data blocks are decompressed and scanned  ...  Given the explosion of unstructured data begotten by social media and other web-based applications, we take the position that any modern analytics platform must support operations on free-text fields as  ...  In HDFS, file blocks (typically 64 or 128 MB in size) are stored on the local disks of machines in the cluster (with a default replication factor of three).  ... 
doi:10.1145/1996092.1996105 fatcat:q2fd2ijnrvezdn6b2iklrk4jui

Compressed inverted files with reduced decoding overheads

Anh Ngoc Vo, Alistair Moffat
1998 Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98  
Compressed inverted files are the most compact way of indexing large text databases, typically occupying around 10% of the space of the collection they index.  ...  For ranked queries, the new mechanism reduces both CPU and elapsed time to one third and memory usage to less than one tenth of the standard algorithm, with no degradation in retrieval effectiveness.  ...  Acknowledgements This work was supported by the Australian Research Council and The Australian Agency for International Development.  ... 
doi:10.1145/290941.291011 dblp:conf/sigir/VoM98 fatcat:atvvr734kvbkro77w2f5uvp7f4

Incremental cluster-based retrieval using compressed cluster-skipping inverted files

Ismail Sengor Altingovde, Engin Demir, Fazli Can, Özgür Ulusoy
2008 ACM Transactions on Information Systems  
In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query  ...  The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.  ...  Our cluster-skipping inverted file proposed in this article is inspired by this former work, but extends it in various ways.  ... 
doi:10.1145/1361684.1361688 fatcat:3iuznbyiobdzpp3m4ih7qvquuu

A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms

Zheng Wei, Joseph JaJa
2011 2011 IEEE International Parallel & Distributed Processing Symposium  
Keywords: indexer; inverted files; multicore; GPU; pipelined and parallel parsing and indexing  ...  The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters.  ...  Sangchul Song who developed the version of Wikipedia04-09 dataset which was used in our experimental evaluation.  ... 
doi:10.1109/ipdps.2011.107 dblp:conf/ipps/WeiJ11 fatcat:kcdvnw56jnaetj33ysyob5m23i

A fast algorithm for constructing inverted files on heterogeneous platforms

Zheng Wei, Joseph JaJa
2012 Journal of Parallel and Distributed Computing  
Keywords: indexer; inverted files; multicore; GPU; pipelined and parallel parsing and indexing  ...  The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters.  ...  Sangchul Song who developed the version of Wikipedia04-09 dataset which was used in our experimental evaluation.  ... 
doi:10.1016/j.jpdc.2012.02.005 fatcat:bxe6t4yxavgzrj5vuyfn4ocgxm

Efficient processing of joins on set-valued attributes

Nikos Mamoulis
2003 Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03  
We propose join algorithms that utilize inverted files and compare them with signature-based methods for several set-comparison predicates.  ...  We show that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates.  ...  Acknowledgements This work was supported by grant HKU 7380/02E from Hong Kong RGC.  ... 
doi:10.1145/872757.872778 dblp:conf/sigmod/Mamoulis03 fatcat:byykjdeghvenvkjteeipu7y7ei

Efficient processing of joins on set-valued attributes

Nikos Mamoulis
2003 Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03  
We propose join algorithms that utilize inverted files and compare them with signature-based methods for several set-comparison predicates.  ...  We show that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates.  ...  Acknowledgements This work was supported by grant HKU 7380/02E from Hong Kong RGC.  ... 
doi:10.1145/872773.872778 fatcat:acnzk2xmnjd4zduavozqyhwu5m

Reduction of Bus Transition for Compressed Code Systems

Malathi S.R, Ramya Asmi R
2013 International Journal of VLSI Design & Communication Systems  
The main focus here is to present a method for reducing the power consumption of compressed-code systems by inverting the bits that are transmitted on the bus.  ...  Low power VLSI circuit design is one of the most important issues in present day technology. One of the ways of reducing power is to reduce the number of transitions on the bus.  ...  In the same paper, they extended this approach to multi way partial bus-invert (MPBI), where highly correlated bus lines were clustered into multiple sub-buses and each of them was encoded independently  ... 
doi:10.5121/vlsic.2013.4110 fatcat:taegkjve3zcc5habmvqggsmpa4