
Lazy exact deduplication

Jingwei Ma, Rebecca J. Stones, Yuxiang Ma, Jingui Wang, Junjie Ren, Gang Wang, Xiaoguang Liu
2016 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST)  
For lazy deduplication, we design a buffering strategy that preserves locality in order to similarly facilitate prefetching.  ...  In this paper, we propose a "lazy" data deduplication method which buffers incoming fingerprints and performs on-disk lookups in batches, aiming to reduce the disk bottleneck.  ...  However, the lazy and eager methods perform exact deduplication, and consequently make on-disk lookups, so they are both slower than approximate methods.  ... 
doi:10.1109/msst.2016.7897081 dblp:conf/mss/MaSMWRWL16 fatcat:fsr3mduqbbbqxbg7zcpkyeenpe

Lazy Exact Deduplication

Jingwei Ma, Rebecca J. Stones, Yuxiang Ma, Jingui Wang, Junjie Ren, Gang Wang, Xiaoguang Liu
2017 ACM Transactions on Storage  
For lazy deduplication, we design a buffering strategy that preserves locality in order to similarly facilitate prefetching.  ...  In this paper, we propose a "lazy" data deduplication method which buffers incoming fingerprints and performs on-disk lookups in batches, aiming to reduce the disk bottleneck.  ...  However, the lazy and eager methods perform exact deduplication, and consequently make on-disk lookups, so they are both slower than approximate methods.  ... 
doi:10.1145/3078837 fatcat:us3cm32tpveo7asiklihkqtr3a

The Design and Implementation of a Rekeying-Aware Encrypted Deduplication Storage System

Chuan Qin, Jingwei Li, Patrick P. C. Lee
2017 ACM Transactions on Storage  
However, it is non-trivial to realize efficient rekeying in encrypted deduplication storage systems, which use deterministic content-derived encryption keys to allow deduplication on ciphertexts.  ...  We design and implement REED, a rekeying-aware encrypted deduplication storage system.  ...  Conclusion: We present REED, an encrypted deduplication storage system that aims for secure and lightweight rekeying.  ... 
doi:10.1145/3032966 fatcat:4yuprudcmrcm7cnzv2p7jc2ose

Offline Selective Data Deduplication for Primary Storage Systems

Sejin PARK, Chanik PARK
2016 IEICE Transactions on Information and Systems  
Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important.  ...  However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio.  ...  The deduplication policy is then selected by the rank of each file. For offline write operation support, we use a lazy update scheme that naturally solves the write latency.  ... 
doi:10.1587/transinf.2015edp7034 fatcat:n7wnasee3rf2zkadne7lauxria

QuERy

Hotham Altwaijry, Sharad Mehrotra, Dmitri V. Kalashnikov
2015 Proceedings of the VLDB Endowment  
Recall that, in this lazy solution, we place the cleaning operator above all polymorphic selections and joins and thus, an item will not reach the deduplicate operator unless it passes all predicates in  ...  However, QuERy deals with "exact" answers to SPJ queries based on cleaning only the necessary parts of data needed to answer the query.  ... 
doi:10.14778/2850583.2850587 fatcat:f6wx7gl37fboxa4ayiuvxvyd5q

A novel approach to data deduplication over the engineering-oriented cloud systems

Zhe Sun, Jun Shen, Jianming Yong
2013 Integrated Computer-Aided Engineering  
With a deduplication application, a scalable and parallel deduplicated cloud storage system can be effectively built up. We further use VMware to generate a simulated cloud environment.  ...  Our deduplication storage system, which manages data and duplication over the cloud system, consists of two major components, a front-end deduplication application and a mass storage system as backend.  ...  , an exact deduplication result is achieved.  ... 
doi:10.3233/ica-120418 fatcat:oirmipybzvgsvnpknhkbbqgcya

A Content Fingerprint-based Cluster-wide Inline Deduplication for Shared-Nothing Storage Systems

Awais Khan, Prince Hamandawana, Youngjae Kim
2020 IEEE Access  
Exact Deduplication [18], DeDe [15] and Boafft [23] share high similarity to our proposed design.  ... 
doi:10.1109/access.2020.3039056 fatcat:okz34jybhfdihkjsj236yzejra

Efficient Deduplication in a Distributed Primary Storage Infrastructure

João Paulo, José Pereira
2016 ACM Transactions on Storage  
Deduplication allows reclaiming these duplicates while improving the cost-effectiveness of large-scale multitenant infrastructures.  ...  Also, some of these systems reduce storage overhead by confining deduplication to off-peak periods that may be scarce in a cloud environment.  ...  DEDIS performs exact deduplication across all cluster nodes, i.e., all stored chunks are compared against each other, thus having optimal deduplication gain.  ... 
doi:10.1145/2876509 fatcat:rekpetayojdqxh7jwfdetdec7e

CernVM-FS powered container hub

Enrico Bocchi, Jakob Blomer, Simone Mosciatti, Andrea Valenzuela, C. Biscarat, S. Campana, B. Hegner, S. Roiser, C.I. Rovelli, G.A. Stewart
2021 EPJ Web of Conferences  
CVMFS ingestion is based on per-file deduplication, instead of the per-layer deduplication adopted by traditional container registries.  ...  Layers stored in CVMFS must be an exact copy of the layers provided by the container registry.  ...  In addition, computing nodes no longer need to download the full images locally but rather take advantage of the lazy-loading feature provided by CVMFS.  ... 
doi:10.1051/epjconf/202125102033 fatcat:qkeeafaktfgr7gn53zjfmsne4q

Blocking and Filtering Techniques for Entity Resolution

George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
2020 ACM Computing Surveys  
The experimental survey in [64] included a comparison between BayesLSH-lite and exact algorithms.  ...  lazy static; Attribute Clustering Blocking [120]: hash based, redundancy positive, lazy, static; RDFKeyLearner [154]: hash based, redundancy positive, lazy, static; Prefix-Infix(-Suffix) Blocking [119]: hash  ... 
doi:10.1145/3377455 fatcat:uuzuuxwwzrfg7cwfwzswdqvklm

QueryER: A Framework for Fast Analysis-Aware Deduplication over Dirty Data [article]

Giorgos Alexiou, George Papastefanatos, Vassilis Stamatopoulos, Georgia Koutrika, Nectarios Koziris
2022 arXiv   pre-print
QueryER executes analysis-aware deduplication by weaving ER operators into the query plan.  ...  It is offered in two variants: i) a lazy-solution that does not consider costs and, ii) an adaptive-solution that uses a cost-based planner.  ...  With QueryER, the user, instead of fully cleaning the data before issuing the query, will issue the exact same query directly on top of the dirty data, without the need of a pre-processing step (e.g.  ... 
arXiv:2202.01546v1 fatcat:7vh66c75gbcfji23s3yiykt3oq

An Investigation of Alternatives to Transform Protein Sequence Databases to a Columnar Index Schema

Roman Zoun, Kay Schallert, David Broneske, Ivayla Trifonova, Xiao Chen, Robert Heyer, Dirk Benndorf, Gunter Saake
2021 Algorithms  
[22] extends the trie with the property that each parent node with only one child is merged with this exact child. This optimization is also known from prefix trees [23].  ...  Our Radix Tree Adaptation: The difference from the original radix tree by De La Briandais [21] is our usage of the pessimistic path compression and lazy expansion of tree nodes that we adapted from Leis  ... 
doi:10.3390/a14020059 fatcat:xopdgbyrizerrm3ymza7zq43xu

LOFS: A Lightweight Online File Storage Strategy for Effective Data Deduplication at Network Edge

Geyao Cheng, Deke Guo, Lailong Luo, Junxu Xia, Siyuan Gu
2021 IEEE Transactions on Parallel and Distributed Systems  
Various data deduplication technologies are currently employed at edge to eliminate redundant data chunks for space saving.  ...  Trace-driven experiments show that LOFS closely tracks the global deduplication ratio and generates a relatively low load std compared with the comparison methods.  ...  Literature [46] proposes a lazy data deduplication method.  ... 
doi:10.1109/tpds.2021.3133098 fatcat:lvgqyi2fdnfw3n7ymab2urhq2i

A Survey of Blocking and Filtering Techniques for Entity Resolution [article]

George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
2020 arXiv   pre-print
This definition refers to Deduplication, but can be easily extended to Record Linkage.  ...  The experimental survey in [68] included a comparison between BayesLSH-lite and exact algorithms.  ... 
arXiv:1905.06167v4 fatcat:zoodv75tazg23cfnq4dwfgt6ge

Scalable Blocking for Very Large Databases [article]

Andrew Borthwick, Stephen Ash, Bin Pang, Shehzad Qureshi, Timothy Jones
2020 arXiv   pre-print
In the field of database deduplication, the goal is to find approximately matching records within a database.  ...  We do this exact count and dedup in parallel in one mapreduce style operation.  ...  For clarity we show the pseudocode in an imperative style, but in the implementation everything is implemented as a sequence of lazy map and reduce operations.  ... 
arXiv:2008.08285v1 fatcat:d3n66vvy35bqjknuipus676fru