Duplicate Elimination in Space-partitioning Tree Indexes
2007
International Conference on Scientific and Statistical Database Management
In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning ...
In the case of indexing non-zero extent objects, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., PMR quadtree, expanded MX-CIF quadtree ...
Acknowledgments The work of Mourad Ouzzani was supported in part by a Lilly Endowment grant and a US DHS (PURVAC) grant. ...
doi:10.1109/ssdbm.2007.10
dblp:conf/ssdbm/EltabakhOA07
fatcat:ovkp2ffx7zhbzcg6ojgbhtxk5m
A Two-level Spatial In-Memory Index
[article]
2021
arXiv
pre-print
of spatial indexes based on disjoint space partitioning. ...
This second-level partitioning not only reduces the number of comparisons required to compute the results, but also avoids the generation and elimination of duplicate results, which is an inherent problem ...
Our index is based on a simple grid-based space partitioning. Grid-based indexing has several advantages over hierarchical indexes, such as the R-tree [14]. ...
arXiv:2005.08600v2
fatcat:crv6xmq3wbhxfp7vcygz7hq5wi
Efficient index lookup for De-duplication backup system
2008
2008 IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems
With filter based in-memory index data structure and index partitioning, PRUNE eliminates 99.4% of disk accesses involved in fingerprint management. ...
We minimize fingerprint management overhead (index lookup and index insert) by introducing a main-memory index lookup structure and workload-aware index partitioning of the index file in the storage. ...
Venti and SAN file system use fixed-size blocks in partitioning a file. SIS detects duplicate data on the file level. LBFS [7] can reduce both network traffic and wasted storage space. ...
doi:10.1109/mascot.2008.4770594
fatcat:ijsxsne2h5gl3binzf6e4zwj3e
A Survey and Classification of Storage Deduplication Systems
2014
ACM Computing Surveys
The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. ...
The first contribution of this article is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique ...
Granularity Granularity refers to the method used for partitioning data into chunks, the basic unit for eliminating duplicates. ...
doi:10.1145/2611778
fatcat:kh76pmfu3nhlji4v5uyrfhgycu
Improving duplicate elimination in storage systems
2006
ACM Transactions on Storage
In this paper, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. ...
For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems. ...
- The offset of the subchunk in its superchunk. - The size of the subchunk. The tree itself is indexed using the hash of the subchunk. ...
doi:10.1145/1210596.1210599
fatcat:qftpedzronh25n7eo7bflqfwli
Coalescing in Temporal Databases
1996
Very Large Data Bases Conference
Coalescing is a unary operator applicable to temporal databases; it is similar to duplicate elimination in conventional databases. ...
In this paper we show how semantically superfluous coalescing can be eliminated. We then turn to efficiently performing coalescing. ...
Acknowledgments The second and third authors were supported in part by NSF grant ISI-and a grant from the AT&T Foundation. ...
dblp:conf/vldb/BohlenSS96
fatcat:2cq6pqwbmzddtla3rgn75asivu
A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability
2016
Sustainability
In this paper, we present Spatial Join with Spark (SJS), a proposed high-performance algorithm, that uses a simple, but efficient, uniform spatial grid to partition datasets and joins the partitions with ...
SJS utilizes the distributed in-memory iterative computation of Spark, then introduces a calculation-evaluating model and in-memory spatial repartition technology, which optimize the initial partition ...
Renyi Liu was involved in data acquisition and revision of the manuscript. ...
doi:10.3390/su8090926
fatcat:nqmv42sfeja7xkem2wmz4lvjhu
An optimal and progressive algorithm for skyline queries
2003
Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03
elimination if d>2, multiple accesses of the same node, large space overhead). ...
Furthermore, it does not retrieve duplicates and its space overhead is significantly smaller than that of NN. ...
In general, for d>2, the overlapping of the partitions necessitates duplicate elimination. Kossmann et al. ...
doi:10.1145/872811.872814
fatcat:4osaj4mxmzek5frg65mkvbitty
Efficient Physical Organization of R-Trees Using Node Clustering
2007
Journal of Computer Science
R-Tree is a multidimensional indexing structure that forms basis for all the multidimensional indexing structures based on data partitioning. ...
Moreover, to preserve the structural and functional properties of R-Tree at any point in the process of clustering, this paper introduces a concept called 'controlled duplication'. ...
An improvement in this front would enhance the performance of centralized and homogeneous databases. ...
doi:10.3844/jcssp.2007.506.514
fatcat:qty25twd4bgodbp6ufpygwod3m
An optimal and progressive algorithm for skyline queries
2003
Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03
elimination if d>2, multiple accesses of the same node, large space overhead). ...
Furthermore, it does not retrieve duplicates and its space overhead is significantly smaller than that of NN. ...
In general, for d>2, the overlapping of the partitions necessitates duplicate elimination. Kossmann et al. ...
doi:10.1145/872757.872814
dblp:conf/sigmod/PapadiasTFS03
fatcat:pthvoe5tsbds3bwnewntcoq5ti
Size separation spatial join
1997
Proceedings of the 1997 ACM SIGMOD international conference on Management of data - SIGMOD '97
Size Separation Spatial Join (S³J) imposes a hierarchical decomposition of the data space and, in contrast with previous approaches, requires no replication of entities from the input data sets. ...
We introduce a new algorithm to compute the spatial join of two or more spatial data sets, when indexes are not available on them. ...
the duplicate elimination in the case of PBSM, is exhausted, especially in environments with limited disk space. ...
doi:10.1145/253260.253340
dblp:conf/sigmod/KoudasS97
fatcat:digjphhffvhutilszov5heirui
Size separation spatial join
1997
SIGMOD record
Size Separation Spatial Join (S³J) imposes a hierarchical decomposition of the data space and, in contrast with previous approaches, requires no replication of entities from the input data sets. ...
We introduce a new algorithm to compute the spatial join of two or more spatial data sets, when indexes are not available on them. ...
the duplicate elimination in the case of PBSM, is exhausted, especially in environments with limited disk space. ...
doi:10.1145/253262.253340
fatcat:phtyf5aah5gurijow5sl6zypk4
BE-Tree is a novel dynamic tree data structure designed to efficiently index Boolean expressions over a high-dimensional discrete space. ...
We conduct a comprehensive evaluation to demonstrate the superiority of BE-Tree in comparison with state-of-the-art index structures designed for matching Boolean expressions. ...
We, first, employ a de-duplication technique to eliminate duplicate entries and to convert the data into a set of q-grams. ...
doi:10.1145/1989323.1989390
dblp:conf/sigmod/SadoghiJ11
fatcat:djjgr2grbncrxnhmpajdqrtzde
Performance evaluation of algorithms for transitive closure
1992
Information Systems
The algorithms were tested on several graphs, ranging from regular trees to random acyclic graphs to random general graphs. ...
Finally, for the common case where a transitive closure query involves a selection, Seminaive can take advantage of the constants in the selection, whereas Blocked Warren and Smart cannot. ...
Duplicate elimination is the key to the performance of all algorithms in non-tree graphs. ...
doi:10.1016/0306-4379(92)90035-l
fatcat:sjzljzwpa5erlotbjlt4ym3vm4
Indexing Large Trajectory Data Sets With SETI
2003
Conference on Innovative Data Systems Research
With the rapid increase in the use of inexpensive, location-aware sensors in a variety of new applications, large amounts of time-sequenced location data will soon be accumulated. ...
Based on an actual implementation, we demonstrate that SETI clearly outperforms two previously proposed trajectory indexing mechanisms, namely the 3D R-tree and the TB-tree. ...
In addition, we would like to thank the anonymous reviewers for their comments; in particular, one of the anonymous reviewers provided very valuable detailed comments and suggestions that have helped improve ...
dblp:conf/cidr/ChakkaEP03
fatcat:udsj2pbpijgftdsj6fdujtlnhy