A novel approach to data deduplication over the engineering-oriented cloud systems

Zhe Sun, Jun Shen, Jianming Yong
2013 Integrated Computer-Aided Engineering  
This paper presents a duplication-less storage system over the engineering-oriented cloud computing platforms. Our deduplication storage system, which manages data and duplication over the cloud system, consists of two major components, a front-end deduplication application and a mass storage system as backend. Hadoop distributed file system (HDFS) is a common distribution file system on the cloud, which is used with Hadoop database (HBase). We use HDFS to build up a mass storage system and
more » ... oy HBase to build up a fast indexing system. With a deduplication application, a scalable and parallel deduplicated cloud storage system can be effectively built up. We further use VMware to generate a simulated cloud environment. The simulation results demonstrate that our deduplication storage system is sufficiently accurate and efficient for distributed and cooperative data intensive engineering applications Abstract. This paper presents a duplication-less storage system over the engineering-oriented cloud computing platforms. Our deduplication storage system, which manages data and duplication over the cloud system, consists of two major components, a front-end deduplication application and a mass storage system as back-end. Hadoop distributed file system (HDFS) is a common distribution file system on the cloud, which is used with Hadoop database (HBase). We use HDFS to build up a mass storage system and employ HBase to build up a fast indexing system. With a deduplication application, a scalable and parallel deduplicated cloud storage system can be effectively built up. We further use VMware to generate a simulated cloud environment. The simulation results demonstrate that our deduplication storage system is sufficiently accurate and efficient for distributed and cooperative data intensive engineering applications.
doi:10.3233/ica-120418 fatcat:oirmipybzvgsvnpknhkbbqgcya