
Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

D. Bhagwat, K. Eshghi, D.D.E. Long, M. Lillibridge
2009 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems  
We present Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time  ...  Extreme Binning exploits file similarity instead of locality, and makes only one disk access for chunk lookup per file, which gives reasonable throughput.  ...  We now discuss how Extreme Binning can be used to parallelize file backup to build a scalable distributed system. IV.  ... 
doi:10.1109/mascot.2009.5366623 dblp:conf/mascots/BhagwatELL09 fatcat:xqumrxpbszgpzf4isykro3s4gm
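
The abstract above says that Extreme Binning uses file similarity and makes only one disk access for chunk lookup per file. A minimal sketch of that idea follows; it is an illustration under stated assumptions, not the authors' implementation. It assumes fixed-size chunks, SHA-1 chunk IDs, the minimum chunk ID as the file's representative, and in-memory dictionaries standing in for on-disk bins. The names `chunk_ids`, `BinStore`, and `bin_loads` are hypothetical.

```python
import hashlib
from typing import Dict, List, Set


def chunk_ids(data: bytes, size: int = 4) -> List[str]:
    """Split data into fixed-size chunks (an assumption) and hash each one."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


class BinStore:
    """Toy similarity-based index: one bin lookup per file."""

    def __init__(self) -> None:
        # Representative chunk ID -> set of chunk IDs in that bin.
        # In a real system the bins would live on disk; here a dict stands in.
        self.bins: Dict[str, Set[str]] = {}
        self.bin_loads = 0  # counts simulated disk accesses

    def backup(self, data: bytes) -> int:
        """Deduplicate one file; return the number of newly stored chunks."""
        ids = chunk_ids(data)
        if not ids:
            return 0
        rep = min(ids)  # assumed representative: smallest chunk ID
        if rep in self.bins:
            self.bin_loads += 1  # the single simulated disk access for this file
        bin_ = self.bins.setdefault(rep, set())
        new = {c for c in ids if c not in bin_}
        bin_.update(new)
        return len(new)


store = BinStore()
first = store.backup(b"aaaabbbbcccc")   # all three chunks are new
second = store.backup(b"aaaabbbbcccc")  # same file again: nothing new
print(first, second, store.bin_loads)
```

In this sketch, chunks are only compared against the one bin selected by the representative ID, so a duplicate chunk that lives in a different bin would go undetected; that is the trade-off a similarity-based scheme accepts in exchange for a single lookup per file.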

A Scalable Inline Cluster Deduplication Framework for Big Data Protection [chapter]

Yinjin Fu, Hong Jiang, Nong Xiao
2012 Lecture Notes in Computer Science  
Governed by a similarity-based stateful data routing scheme, Σ-Dedupe assigns similar data to the same backup server at the super-chunk granularity using a handprinting technique to maintain high cluster-deduplication  ...  However, it remains a great challenge for cluster deduplication to strike a sensible tradeoff between the conflicting goals of scalable deduplication throughput and high duplicate elimination ratio in  ...  Extreme Binning [8] is a file-similarity based cluster deduplication scheme.  ... 
doi:10.1007/978-3-642-35170-9_18 fatcat:cvbunwuawjgf7l52nwfm3jvv4u

Similarity and Locality Based Indexing for High Performance Data Deduplication

Wen Xia, Hong Jiang, Dan Feng, Yu Hua
2015 IEEE Transactions on Computers  
SiLo also employs a locality based stateless routing algorithm to parallelize and distribute data blocks to multiple backup nodes.  ...  One of the main challenges for centralized data deduplication is the scalability of fingerprint-index search.  ...  A well-known similarity-based approach is Extreme Binning [10] that improves deduplication scalability by exploiting the file similarity to achieve a single on-disk index access for chunk lookup per  ... 
doi:10.1109/tc.2014.2308181 fatcat:szqge3jt5zhsnnnn7yhntj64j4

Using multi-threads to hide deduplication I/O latency with low synchronization overhead

Rui Zhu, Lei-hua Qin, Jing-li Zhou, Huan Zheng
2013 Journal of Central South University  
Data deduplication, as a compression method, has been widely used in most backup systems to improve bandwidth and space efficiency.  ...  The main idea of Multi-Dedup was using parallel deduplication threads to hide the I/O latency.  ...  All the files with the same representative fingerprint are deduplicated by an independent on-disk index called a "bin". Thus, one disk "bin" access can deduplicate all the chunks of the same file.  ... 
doi:10.1007/s11771-013-1650-4 fatcat:keao5c5mxnecbgw26tts7qny6q


Hema S
2018 International Journal of Advanced Research in Computer Science  
Bhagwat et al. [13] introduced a new technique called Extreme Binning for scalable and parallel deduplication, which is suitable for non-traditional backup workloads.  ...  The Extreme Binning system avoids a chunk lookup access for each chunk in a file and performs only one disk access for the entire file.  ... 
doi:10.26483/ijarcs.v9i2.5318 fatcat:utj524jui5fnpnk2nwlwcmtesi


S. Supriya
2017 International Journal of Advanced Research in Computer Science  
Cloud computing provides scalable, low-cost and location-independent services over the internet. The services provided range from simple backup services to cloud storage infrastructures.  ...  The fast growth of data volumes has greatly increased the demand for techniques for saving disk space and network bandwidth.  ...  Extreme Binning is a scalable and parallel deduplication system for chunk-based file backup. It uses file similarity rather than locality, thus increasing the throughput of the system.  ... 
doi:10.26483/ijarcs.v8i8.4689 fatcat:mjdgs7dsfvf5dio3dhrdzotrou

Data Deduplication Techniques for Big Data Storage Systems

The following paper reviews the deduplication process, types of deduplication and techniques available for data deduplication.  ...  data, especially the data in unstructured format has brought a tremendous challenge on data analysis as well as the data storage systems which are essentially increasing the cost and performance of the backup  ...  RELATED WORK Guohua Wang, [6] adopted a clustering architecture based on Bloom Filter with multiple nodes where chunk level deduplication is done in parallel for all the nodes.  ... 
doi:10.35940/ijitee.j9129.0881019 fatcat:54gj5sbx6ncohbc6whody2l64m

A Comprehensive Study of the Past, Present, and Future of Data Deduplication

Wen Xia, Hong Jiang, Dan Feng, Fred Douglis, Philip Shilane, Yu Hua, Min Fu, Yucheng Zhang, Yukun Zhou
2016 Proceedings of the IEEE  
Finally, we outline the open problems and future research directions facing deduplication-based storage systems.  ...  The summary and taxonomy of the state of the art on deduplication help identify and understand the most important design considerations for data deduplication systems.  ...  Zadok for valuable discussions about deduplicated storage literature.  ... 
doi:10.1109/jproc.2016.2571298 fatcat:krfdbgm5pjemnmaswml7k4uv4e

Fault-Tolerant Dynamic Deduplication for Utility Computing

Waraporn Leesakul, Paul Townend, Peter Garraghan, Jie Xu
2014 2014 IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing  
We propose a real-time adaptive deduplication system for Cloud and Utility computing that monitors in real-time for changing system, user, and environmental behaviour in order to fulfill a balance between  ...  Data deduplication is a promising technique for drastically reducing the amount of data stored in such systems; however, current approaches are static in nature, using an amount of redundancy fixed  ...  There are other works on deduplication storages whose architectures are designed for the scalability issue, for example, Extreme Binning [23] and Droplet [24].  ... 
doi:10.1109/isorc.2014.55 dblp:conf/isorc/LeesakulTGX14 fatcat:zp4z67ejvnerpciptsibpci7g4

A novel approach to data deduplication over the engineering-oriented cloud systems

Zhe Sun, Jun Shen, Jianming Yong
2013 Integrated Computer-Aided Engineering  
With a deduplication application, a scalable and parallel deduplicated cloud storage system can be effectively built up. We further use VMware to generate a simulated cloud environment.  ...  Extreme Binning [3] is a scalable, paralleled deduplication approach aiming at a non-traditional backup workload which is composed of low-locality individual files.  ... 
doi:10.3233/ica-120418 fatcat:oirmipybzvgsvnpknhkbbqgcya

Design of an exact data deduplication cluster

Jurgen Kaiser, Dirk Meister, Andre Brinkmann, Sascha Effert
2012 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)  
Therefore, we are able to show that, despite different claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity  ...  We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single node solution.  ...  ACKNOWLEDGMENTS The authors would like to thank the German Federal Ministry of Economics and Technology (BMWi) for partly funding this work in the project SIMBA.  ... 
doi:10.1109/msst.2012.6232380 dblp:conf/mss/KaiserMBE12 fatcat:tn3bklssfzg4dfvmnwr63o3egm

A Survey and Classification of Storage Deduplication Systems

João Paulo, José Pereira
2014 ACM Computing Surveys  
This classification identifies and describes the different approaches used for each of them.  ...  Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid-state drives, and even to random access memory.  ...  ACKNOWLEDGMENTS We would like to thank the anonymous reviewers for their extensive comments and suggestions that helped us improve this article.  ... 
doi:10.1145/2611778 fatcat:kh76pmfu3nhlji4v5uyrfhgycu

Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function

Ahmed Sardar M. Saeed, Loay E. George
2021 Symmetry  
Various deduplication solutions show a bottleneck in the form of matching lookups and chunk fingerprint calculations, for which we pay in the form of storage and processors needed for storing hashes.  ...  Due to the enormous number of chunk hash values, looking up and comparing hash values takes longer for large datasets; this work offers a hierarchical fingerprint lookup strategy to minimize the hash judgement  ...  [13] proposed Extreme Binning as a scalable and parallel deduplication method that takes advantage of file similarity rather than locality and uses only one disk access for each file for chunk lookup  ... 
doi:10.3390/sym13111978 fatcat:m4ohqfy7ijgctdagyrapjem67q

Decentralized and Privacy Sensitive Data De-Duplication Framework for Convenient Big Data Management in Cloud Backup Systems

J. Gnana Jeslin, P. Mohan Kumar
2022 Symmetry  
Thus, the deduplication of files and the auditing of credibility are extremely necessary and how they are achieved safely and effectively must be addressed in academic and commercial contexts urgently.  ...  node with the storage node specificity by means of a handprinting-based network model to attain adequate global deduplication performance.  ...  Extreme Binning, by using file similarity, is an approximate distributed deduplication strategy.  ... 
doi:10.3390/sym14071392 fatcat:qpivhfig7zao5htw4xvyt2224q

Data Deduplication and Load Balancing Techniques on Cloud Systems

Prof Pokale M.S., Surabhi Dhok, Vaishnavi Kasbe, Gauri Joshi, Noopur Shinde
2017 IJARCCE  
In this paper, we propose the architecture of a deduplication system for cloud storage environments and give the process of avoiding duplication at the file level and chunk level on the client side.  ...  In the storage nodes (Snodes), DelayDedupe, a delayed target deduplication scheme based on chunk-level deduplication and the access frequency of chunks, is proposed to reduce the response time.  ...  ACKNOWLEDGMENT We would like to thank the reviewers for their detailed comments, suggestions and constant support throughout the reviewing process that helped us to significantly improve the quality of  ... 
doi:10.17148/ijarcce.2017.63205 fatcat:l6y5nah4lbgr5aihndmqt6owfy