Duplicate Elimination in Space-partitioning Tree Indexes

M. Y. Eltabakh, Mourad Ouzzani, Walid G. Aref
2007 International Conference on Scientific and Statistical Database Management  
In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning  ...  In the case of indexing non-zero extent objects, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., PMR quadtree, expanded MX-CIF quadtree  ...  Acknowledgments The work of Mourad Ouzzani was supported in part by a Lilly Endowment grant and a US DHS (PURVAC) grant.  ... 
doi:10.1109/ssdbm.2007.10 dblp:conf/ssdbm/EltabakhOA07 fatcat:ovkp2ffx7zhbzcg6ojgbhtxk5m

A Two-level Spatial In-Memory Index [article]

Dimitrios Tsitsigkos, Konstantinos Lampropoulos, Panagiotis Bouros, Nikos Mamoulis, Manolis Terrovitis
2021 arXiv   pre-print
of spatial indexes based on disjoint space partitioning.  ...  This second-level partitioning not only reduces the number of comparisons required to compute the results, but also avoids the generation and elimination of duplicate results, which is an inherent problem  ...  Our index is based on a simple grid-based space partitioning. Grid-based indexing has several advantages over hierarchical indexes, such as the R-tree [14].  ... 
arXiv:2005.08600v2 fatcat:crv6xmq3wbhxfp7vcygz7hq5wi

Efficient index lookup for De-duplication backup system

Youjip Won, Jongmyeong Ban, Jaehong Min, Jungpil Hur, Sangkyu Oh, Jangsun Lee
2008 2008 IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems  
With a filter-based in-memory index data structure and index partitioning, PRUNE eliminates 99.4% of disk accesses involved in fingerprint management.  ...  We minimize fingerprint management overhead (index lookup and index insert) by introducing a main-memory index lookup structure and workload-aware partitioning of the index file in storage.  ...  Venti and SAN file system use fixed-size blocks in partitioning a file. SIS detects duplicate data at the file level. LBFS [7] can reduce both network traffic and wasted storage space.  ... 
doi:10.1109/mascot.2008.4770594 fatcat:ijsxsne2h5gl3binzf6e4zwj3e

A Survey and Classification of Storage Deduplication Systems

João Paulo, José Pereira
2014 ACM Computing Surveys  
The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs.  ...  The first contribution of this article is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique  ...  Granularity Granularity refers to the method used for partitioning data into chunks, the basic unit for eliminating duplicates.  ... 
doi:10.1145/2611778 fatcat:kh76pmfu3nhlji4v5uyrfhgycu

Improving duplicate elimination in storage systems

Deepak R. Bobbarjung, Suresh Jagannathan, Cezary Dubnicki
2006 ACM Transactions on Storage  
In this paper, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects.  ...  For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems.  ...  -The offset of the subchunk in its superchunk. -The size of the subchunk. The tree itself is indexed using the hash of the subchunk.  ... 
doi:10.1145/1210596.1210599 fatcat:qftpedzronh25n7eo7bflqfwli

Coalescing in Temporal Databases

Michael H. Böhlen, Richard T. Snodgrass, Michael D. Soo
1996 Very Large Data Bases Conference  
Coalescing is a unary operator applicable to temporal databases; it is similar to duplicate elimination in conventional databases.  ...  In this paper we show how semantically superfluous coalescing can be eliminated. We then turn to efficiently performing coalescing.  ...  Acknowledgments The second and third authors were supported in part by NSF grant ISI-and a grant from the AT&T Foundation.  ... 
dblp:conf/vldb/BohlenSS96 fatcat:2cq6pqwbmzddtla3rgn75asivu

A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability

Feng Zhang, Jingwei Zhou, Renyi Liu, Zhenhong Du, Xinyue Ye
2016 Sustainability  
In this paper, we present Spatial Join with Spark (SJS), a proposed high-performance algorithm, that uses a simple, but efficient, uniform spatial grid to partition datasets and joins the partitions with  ...  SJS utilizes the distributed in-memory iterative computation of Spark, then introduces a calculation-evaluating model and in-memory spatial repartition technology, which optimize the initial partition  ...  Renyi Liu was involved in data acquisition and revision of the manuscript.  ... 
doi:10.3390/su8090926 fatcat:nqmv42sfeja7xkem2wmz4lvjhu

An optimal and progressive algorithm for skyline queries

Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger
2003 Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03  
elimination if d>2, multiple accesses of the same node, large space overhead).  ...  Furthermore, it does not retrieve duplicates and its space overhead is significantly smaller than that of NN.  ...  In general, for d>2, the overlapping of the partitions necessitates duplicate elimination. Kossmann et al.  ... 
doi:10.1145/872811.872814 fatcat:4osaj4mxmzek5frg65mkvbitty

Efficient Physical Organization of R-Trees Using Node Clustering

F.Sagayaraj Francis, P. Thadurambi
2007 Journal of Computer Science  
R-Tree is a multidimensional indexing structure that forms the basis for all the multidimensional indexing structures based on data partitioning.  ...  Moreover, to preserve the structural and functional properties of R-Tree at any point in the process of clustering, this paper introduces a concept called 'controlled duplication'.  ...  An improvement in this front would enhance the performance of centralized and homogeneous databases.  ... 
doi:10.3844/jcssp.2007.506.514 fatcat:qty25twd4bgodbp6ufpygwod3m

An optimal and progressive algorithm for skyline queries

Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger
2003 Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03  
elimination if d>2, multiple accesses of the same node, large space overhead).  ...  Furthermore, it does not retrieve duplicates and its space overhead is significantly smaller than that of NN.  ...  In general, for d>2, the overlapping of the partitions necessitates duplicate elimination. Kossmann et al.  ... 
doi:10.1145/872757.872814 dblp:conf/sigmod/PapadiasTFS03 fatcat:pthvoe5tsbds3bwnewntcoq5ti

Size separation spatial join

Nick Koudas, Kenneth C. Sevcik
1997 Proceedings of the 1997 ACM SIGMOD international conference on Management of data - SIGMOD '97  
Size Separation Spatial Join (S3J) imposes a hierarchical decomposition of the data space and, in contrast with previous approaches, requires no replication of entities from the input data sets.  ...  We introduce a new algorithm to compute the spatial join of two or more spatial data sets, when indexes are not available on them.  ...  the duplicate elimination in the case of PBSM, is exhausted, especially in environments with limited disk space.  ... 
doi:10.1145/253260.253340 dblp:conf/sigmod/KoudasS97 fatcat:digjphhffvhutilszov5heirui

Size separation spatial join

Nick Koudas, Kenneth C. Sevcik
1997 SIGMOD record  
Size Separation Spatial Join (S3J) imposes a hierarchical decomposition of the data space and, in contrast with previous approaches, requires no replication of entities from the input data sets.  ...  We introduce a new algorithm to compute the spatial join of two or more spatial data sets, when indexes are not available on them.  ...  the duplicate elimination in the case of PBSM, is exhausted, especially in environments with limited disk space.  ... 
doi:10.1145/253262.253340 fatcat:phtyf5aah5gurijow5sl6zypk4

BE-tree

Mohammad Sadoghi, Hans-Arno Jacobsen
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
BE-Tree is a novel dynamic tree data structure designed to efficiently index Boolean expressions over a high-dimensional discrete space.  ...  We conduct a comprehensive evaluation to demonstrate the superiority of BE-Tree in comparison with state-of-the-art index structures designed for matching Boolean expressions.  ...  We, first, employ a de-duplication technique to eliminate duplicate entries and to convert the data into a set of q-grams.  ... 
doi:10.1145/1989323.1989390 dblp:conf/sigmod/SadoghiJ11 fatcat:djjgr2grbncrxnhmpajdqrtzde

Performance evaluation of algorithms for transitive closure

Robert Kabler, Yannis E Ioannidis, Michael J Carey
1992 Information Systems  
The algorithms were tested on several graphs, ranging from regular trees to random acyclic graphs to random general graphs.  ...  Finally, for the common case where a transitive closure query involves a selection, Seminaive can take advantage of the constants in the selection, whereas Blocked Warren and Smart cannot.  ...  Duplicate elimination is the key to the performance of all algorithms in non-tree graphs.  ... 
doi:10.1016/0306-4379(92)90035-l fatcat:sjzljzwpa5erlotbjlt4ym3vm4

Indexing Large Trajectory Data Sets With SETI

V. Prasad Chakka, Adam Everspaugh, Jignesh M. Patel
2003 Conference on Innovative Data Systems Research  
With the rapid increase in the use of inexpensive, location-aware sensors in a variety of new applications, large amounts of time-sequenced location data will soon be accumulated.  ...  Based on an actual implementation, we demonstrate that SETI clearly outperforms two previously proposed trajectory indexing mechanisms, namely the 3D R-tree and the TB-tree.  ...  In addition, we would like to thank the anonymous reviewers for their comments; in particular, one of the anonymous reviewers provided very valuable detailed comments and suggestions that have helped improve  ... 
dblp:conf/cidr/ChakkaEP03 fatcat:udsj2pbpijgftdsj6fdujtlnhy