ENHANCE THE PERFORMANCE OF HADOOP DISTRIBUTED FILE SYSTEM FOR RANDOM FILE ACCESS USING INCREASED BLOCK SIZE

P Gudadhe, A Gawande, L Gautham
unpublished
Apache Hadoop is a top-level Apache project that includes open source implementations of a distributed file system and MapReduce, inspired by Google's GFS and MapReduce. Its distributed file system is known as HDFS. The Hadoop Distributed File System (HDFS) is designed to store very large data sets and to stream those data sets at high bandwidth to any application. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks.
The Hadoop Distributed File System (HDFS) has been widely used to manage large-scale data due to its high scalability. Data distribution, storage and access are essential to CPU-intensive and data-intensive high performance Grid computing [1]. Random queries over large-scale data are becoming increasingly important. Unfortunately, HDFS is not optimized for random reads, and random access to HDFS therefore suffers from several drawbacks. To overcome this problem, we propose solutions that improve the performance of HDFS for random file access. Improving file access performance is a great challenge in real-time systems. One approach is the design and implementation of a novel distributed layered cache system built on top of the Hadoop Distributed File System, named the HDFS-based Distributed Cache System (HDCache) [2]. Another approach is to store files in HDFS using a larger block size, which improves the performance of HDFS for random access.
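As a minimal sketch of the second approach, the snippet below writes a file into HDFS with a larger-than-default block size through Hadoop's public FileSystem API. The 256 MB block size, the output path, and the class name are illustrative assumptions, not values taken from the paper; a cluster-wide default could likewise be raised through the dfs.blocksize property in hdfs-site.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LargeBlockWriter {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            long blockSize = 256L * 1024 * 1024;  // 256 MB instead of the usual 64/128 MB default (illustrative)
            short replication = 3;
            int bufferSize = 4096;

            Path out = new Path("/data/large-block-file.dat");  // hypothetical path
            // The per-file block size is passed explicitly at creation time.
            try (FSDataOutputStream stream =
                     fs.create(out, true, bufferSize, replication, blockSize)) {
                stream.writeBytes("example payload");
            }
        }
    }

With fewer, larger blocks per file, a random read is more likely to stay within a single block and a single datanode connection, which is the effect the proposed approach relies on.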