
Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality

Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, Peter Camble
2009 USENIX Conference on File and Storage Technologies  
We present sparse indexing, a technique that uses sampling and exploits the inherent locality within backup streams to solve for large-scale backup (e.g., hundreds of terabytes) the chunk-lookup disk bottleneck  ...  To identify similar segments, we use sampling and a sparse index.  ...  Acknowledgments We would like to thank Graham Perry, John Czerkowicz, David Falkinder, and Kevin Collins for their help and support.  ... 
dblp:conf/fast/LillibridgeEBDTC09 fatcat:7yzj2baicjh3lbhg2yv3wfnf3a
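The sampling idea described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: fingerprints whose low-order bits are zero serve as sampled "hooks", and an in-memory sparse index maps each hook to the segments containing it, so an incoming segment is deduplicated only against its most similar stored segments (its "champions"). The class name, sampling mask, and champion limit are all assumptions for illustration.

```python
import hashlib

SAMPLE_MASK = 0b1111  # assumed sample rate of 1/16: keep fingerprints whose low 4 bits are zero

def fingerprint(chunk: bytes) -> int:
    """Cryptographic fingerprint of a chunk (SHA-1 here, as in many dedup systems)."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")

def hooks(segment_chunks):
    """Sampled 'hook' fingerprints that represent a segment in the sparse index."""
    return {fp for fp in map(fingerprint, segment_chunks) if fp & SAMPLE_MASK == 0}

class SparseIndex:
    def __init__(self):
        self.index = {}  # hook fingerprint -> list of segment ids containing it

    def champions(self, incoming_chunks, max_champions=2):
        """Rank stored segments by how many hooks they share with the incoming segment."""
        votes = {}
        for fp in hooks(incoming_chunks):
            for seg in self.index.get(fp, ()):
                votes[seg] = votes.get(seg, 0) + 1
        return sorted(votes, key=votes.get, reverse=True)[:max_champions]

    def add(self, seg_id, segment_chunks):
        """Record only the sampled hooks, keeping the in-RAM index sparse."""
        for fp in hooks(segment_chunks):
            self.index.setdefault(fp, []).append(seg_id)
```

Because only a small sample of fingerprints is kept in RAM, the index stays small regardless of store size; locality within backup streams makes the champions good dedup candidates for all the incoming segment's chunks, not just the sampled ones.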

Similarity and Locality Based Indexing for High Performance Data Deduplication

Wen Xia, Hong Jiang, Dan Feng, Yu Hua
2015 IEEE transactions on computers  
The main idea behind SiLo is to expose and exploit more similarity by grouping strongly correlated small files into a segment and segmenting large files, and to leverage the locality in the data stream  ...  One of the main challenges for centralized data deduplication is the scalability of fingerprint-index search.  ...  Sparse Indexing [9] improves on this method by using a sampled index instead of Bloom filters for data sets with little or no locality, reducing the RAM usage for indexing by more than half  ... 
doi:10.1109/tc.2014.2308181 fatcat:szqge3jt5zhsnnnn7yhntj64j4

A Survey and Classification of Storage Deduplication Systems

João Paulo, José Pereira
2014 ACM Computing Surveys  
The first contribution of this article is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique  ...  This classification identifies and describes the different approaches used for each of them.  ...  ACKNOWLEDGMENTS We would like to thank the anonymous reviewers for their extensive comments and suggestions that helped us improve this article.  ... 
doi:10.1145/2611778 fatcat:kh76pmfu3nhlji4v5uyrfhgycu


Hema S
2018 International Journal of Advanced Research in Computer Science  
Lillibridge et al. presented a technique known as sparse indexing that applies sampling and locality concepts to large-scale backup storage.  ...  Segment creation, the key step of stream deduplication, is used in this technique. Sampling and a sparse index are also used to identify similar segments.  ... 
doi:10.26483/ijarcs.v9i2.5318 fatcat:utj524jui5fnpnk2nwlwcmtesi

Extreme Binning: Scalable, parallel deduplication for chunk-based file backup

D. Bhagwat, K. Eshghi, D.D.E. Long, M. Lillibridge
2009 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems  
Traditional backup workloads consist of large data streams with high locality, which existing deduplication techniques require to provide reasonable throughput.  ...  Data deduplication is an essential and critical component of backup systems.  ...  Unless some form of locality or similarity is exploited, inline, chunk-based deduplication, when done at a large scale, faces what has been termed the disk bottleneck problem: to facilitate fast chunk ID  ... 
doi:10.1109/mascot.2009.5366623 dblp:conf/mascots/BhagwatELL09 fatcat:xqumrxpbszgpzf4isykro3s4gm

Improving restore speed for backup systems that use inline chunk-based deduplication

Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat
2013 USENIX Conference on File and Storage Technologies  
Container capping is an ingest-time operation that reduces chunk fragmentation at the cost of forfeiting some deduplication, while using a forward assembly area is a new restore-time caching and prefetching  ...  Slow restoration due to chunk fragmentation is a serious problem facing inline chunk-based data deduplication systems: restore speeds for the most recent backup can drop orders of magnitude over the lifetime  ...  Acknowledgments We would like to thank our shepherd, Fred Douglis, and the anonymous referees for their many useful suggestions.  ... 
dblp:conf/fast/LillibridgeEB13 fatcat:bmpvcqyo7je3bp2yn6svzhbmlu
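The forward assembly area mentioned in the abstract above can be illustrated with a toy restore loop. This is a sketch under assumed interfaces (the function and parameter names are hypothetical, not the paper's code): a fixed-size slice of the file recipe is assembled in memory, and each container is read at most once, supplying every chunk it can to the slice, which avoids re-reading fragmented containers.

```python
def restore_with_faa(recipe, container_of, read_container, faa_size):
    """Restore the first faa_size chunks of a file recipe.

    recipe         -- list of chunk fingerprints in file order
    container_of   -- maps fingerprint -> container id
    read_container -- reads a whole container, returning {fingerprint: chunk bytes}
    """
    faa = [None] * min(faa_size, len(recipe))  # forward assembly area (in-memory slots)
    for i, fp in enumerate(recipe[:len(faa)]):
        if faa[i] is not None:
            continue  # already filled by an earlier container read
        container = read_container(container_of[fp])
        # Copy every chunk this container supplies anywhere ahead in the slice,
        # so the container never has to be read again for this assembly area.
        for j in range(i, len(faa)):
            chunk = container.get(recipe[j])
            if chunk is not None and faa[j] is None:
                faa[j] = chunk
    return b"".join(faa)
```

In a full restore the assembly area would be flushed and slid forward across the recipe; the point of the sketch is that each container is fetched once per slice, trading a bounded memory buffer for far fewer container reads.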

Framework of Data Deduplication: A Survey

A. Venish, K. Siva Sankar
2015 Indian Journal of Science and Technology  
Common algorithms used for this process are MD5 and SHA-1. There is also content-aware logic, which considers the content type of the data and finalizes the block size and boundaries.  ...  As the deduplication system processes data, it compares the data to the already identified blocks and stores it in its database.  ...  Second, to reduce the number of fingerprints used in the comparison, Lillibridge et al. [11] created a sparse index that contains sampled chunks and references to the chunks.  ... 
doi:10.17485/ijst/2015/v8i26/80754 fatcat:nia6j4p6xzcwnfmx57yoova7yi

A study on data deduplication in HPC storage systems

Dirk Meister, Jurgen Kaiser, Andre Brinkmann, Toni Cortes, Michael Kuhn, Julian Kunkel
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
With deduplication, this data replication is localized and redundancy is removed: by storing data just once, all files that use identical regions refer to the same unique data.  ...  The most common approach splits file data into chunks and calculates a cryptographic fingerprint for each chunk.  ...  In the future, we may take inline deduplication (and compression) very seriously to reduce IO. • Large chunks: The overhead of deduplication can be reduced by using large chunks, but there is a trade-off  ... 
doi:10.1109/sc.2012.14 dblp:conf/sc/MeisterKBCKK12 fatcat:4sy3uj22ffh4rbvymihpizqmfm
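The common approach this snippet describes (split file data into chunks, fingerprint each chunk cryptographically, store each unique chunk once) can be sketched in a few lines. Fixed-size chunking and the `ChunkStore` name are illustrative simplifications; real systems typically use content-defined chunking so that insertions do not shift every later boundary.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking, assumed for simplicity

def chunks(data: bytes, size: int = CHUNK_SIZE):
    for off in range(0, len(data), size):
        yield data[off:off + size]

class ChunkStore:
    """Store each unique chunk once, keyed by its cryptographic fingerprint."""
    def __init__(self):
        self.store = {}   # fingerprint -> chunk bytes
        self.logical = 0  # bytes written by clients
        self.physical = 0 # bytes actually stored after deduplication

    def write(self, data: bytes):
        recipe = []
        for c in chunks(data):
            fp = hashlib.sha256(c).hexdigest()
            self.logical += len(c)
            if fp not in self.store:      # new chunk: store it once
                self.store[fp] = c
                self.physical += len(c)
            recipe.append(fp)
        return recipe  # file recipe: ordered list of fingerprints

    def read(self, recipe):
        return b"".join(self.store[fp] for fp in recipe)
```

Writing the same data twice leaves `physical` unchanged while `logical` doubles; the ratio of the two is the deduplication ratio the HPC study measures.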

Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Wen Xia, Yucheng Zhang, Yujuan Tan
2015 13th USENIX Conference on File and Storage Technologies (FAST '15)  
Acknowledgments We are grateful to our shepherd Fred Douglis and the anonymous reviewers for their insightful feedback.  ...  The fingerprint index is a well-recognized performance bottleneck in large-scale deduplication systems [39]. The simplest fingerprint index is only a key-value store [33].  ... 
dblp:conf/fast/FuFHHCXZT15 fatcat:jvrlqllxfnej7of3qxcpliurua

A Comprehensive Study of the Past, Present, and Future of Data Deduplication

Wen Xia, Hong Jiang, Dan Feng, Fred Douglis, Philip Shilane, Yu Hua, Min Fu, Yucheng Zhang, Yukun Zhou
2016 Proceedings of the IEEE  
Data deduplication, an efficient approach to data reduction, has gained increasing attention and popularity in large-scale storage systems due to the explosive growth of digital data.  ...  more computationally efficient than the traditional compression approaches in large-scale storage systems.  ...  Zadok for valuable discussions about deduplicated storage literature.  ... 
doi:10.1109/jproc.2016.2571298 fatcat:krfdbgm5pjemnmaswml7k4uv4e

DBLK: Deduplication for primary block storage

Yoshihiro Tsuchiya, Takashi Watanabe
2011 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST)  
The deduplication block-device (DBLK) is a deduplication and compression system with a block device interface. It is used as a primary storage and block-wise deduplication is done inline.  ...  Since deduplication for primary storage requires low latency and detecting block-wise deduplication creates a large amount of metadata, it is necessary to efficiently use the memory of the system.  ...  We also thank our managers, Katsuhiko Nishikawa, Yasuo Noguchi, Riichiro Take and Koichi Kumon for their suggestions and support.  ... 
doi:10.1109/msst.2011.5937237 dblp:conf/mss/TsuchiyaW11 fatcat:avn7dlenaffrpb7xswhed5mlta

Reducing replication bandwidth for distributed document databases

Lianghong Xu, Andrew Pavlo, Sudipta Sengupta, Jin Li, Gregory R. Ganger
2015 Proceedings of the Sixth ACM Symposium on Cloud Computing - SoCC '15  
This paper presents a deduplication system called sDedup that reduces the amount of data transferred over the network for replicated document DBMSs. sDedup uses similarity-based deduplication to remove  ...  It exploits key characteristics of document-oriented workloads, including small item sizes, temporal locality, and the incremental nature of document edits.  ... 
doi:10.1145/2806777.2806840 dblp:conf/cloud/XuPS0G15 fatcat:ut3s5g6kobefvkjohwdxskqjv4

Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function

Ahmed Sardar M. Saeed, Loay E. George
2021 Symmetry  
compared to MD5 and SHA-1 and reduces the size of the hash index table by 50%.  ...  hashing and lookup phases compared to the other deduplication systems.  ...  system adds complexity, which affects deduplication throughput. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality [14]: sampling and sparse indexing for the hashing index table  ... 
doi:10.3390/sym13111978 fatcat:m4ohqfy7ijgctdagyrapjem67q


Gaurab Basu, Shripad Nadgowda, Akshat Verma
2014 Proceedings of the 15th International Middleware Conference on - Middleware '14  
LVD is motivated by clouds, where VMs are created from golden masters and use standardized middleware and management tools leading to high content similarity.  ...  We observed that LVD reduced disk space and disk I/O by 70%, making applications run faster by 25% on an average.  ...  Duplicate elimination techniques at storage layer do not use host resources and can maintain large hash indexes, and perform out-of-band reading and merging of data blocks.  ... 
doi:10.1145/2663165.2663322 dblp:conf/middleware/BasuNV14 fatcat:73tsbrhlwvfjffgps2sihit6g4

dedupv1: Improving deduplication throughput using solid state drives (SSD)

Dirk Meister, Andre Brinkmann
2010 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)  
It is an inline deduplication system as it performs chunking and fingerprinting online and only stores new data, but it is able to delay much of the processing as well as IO operations.  ...  This is achieved by using a hybrid deduplication design.  ...  The throughput would degenerate in low-locality settings. Lillibridge et al. presented a deduplication approach using sampling and sparse indexing.  ... 
doi:10.1109/msst.2010.5496992 dblp:conf/mss/MeisterB10 fatcat:waeojspvgnbspc3tvci6uyk3oq
Showing results 1 — 15 out of 50 results