Data Placement Algorithm for Improving I/O Load Balance without Using Popularity Information

Xiangyu Luo, Gang Xin, Xiaolin Gui
2019 Mathematical Problems in Engineering  
Data placement considerably affects the I/O performance of distributed storage systems such as HDFS. An ideal placement algorithm should keep the I/O load evenly distributed among the storage nodes. Most existing placement algorithms that guarantee I/O load balance depend on data popularity information to make placement decisions. However, popularity information is typically not available in the data placement phase. Furthermore, it usually varies during the data ... cycle. In this paper, we propose a new placement algorithm called Balanced Distribution for Each Age Group (BEAG), which makes data placement decisions in the absence of popularity information. This algorithm maintains multiple counters for each storage node, with each counter representing the amount of data belonging to a certain age group. It ensures that the data in each age group are equally scattered among the different storage nodes. As the popularity variance of the data belonging to the same age group is considerably smaller than that of the entire data, BEAG significantly improves the I/O load balance. Experimental results show that, compared to other popularity-independent algorithms, BEAG decreases the I/O load standard deviation by 11.6% to 30.4%.
doi:10.1155/2019/2617630 fatcat:7dheycobdvbjpk4tmcctr6kgqe
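
The counter scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how age groups are defined or how ties are broken, so the class name `AgeGroupPlacer`, the fixed time window `window_seconds`, and the tie-breaking by total stored data are all assumptions made for illustration. The sketch keeps one counter per node per age group and places each new block on the nodes whose counter for that block's group is smallest.

```python
from collections import defaultdict


class AgeGroupPlacer:
    """Hypothetical sketch of per-node, per-age-group counters (not the paper's code)."""

    def __init__(self, nodes, window_seconds=3600):
        self.nodes = list(nodes)
        # Assumption: an age group is a fixed-length window of creation time.
        self.window = window_seconds
        # counters[node][group] = amount of data on `node` created in `group`.
        self.counters = {n: defaultdict(int) for n in self.nodes}

    def group_of(self, created_at):
        """Map a creation timestamp (seconds) to an age-group index."""
        return int(created_at // self.window)

    def place(self, size, created_at, replicas=1):
        """Pick `replicas` distinct nodes holding the least data of this age group."""
        if replicas > len(self.nodes):
            raise ValueError("more replicas requested than nodes available")
        group = self.group_of(created_at)

        def load(node):
            # Primary key: data of the same age group on this node.
            # Secondary key (assumption): total data on the node, to break ties.
            return (self.counters[node][group], sum(self.counters[node].values()))

        # sorted() is stable, so remaining ties keep the original node order.
        chosen = sorted(self.nodes, key=load)[:replicas]
        for node in chosen:
            self.counters[node][group] += size
        return chosen
```

For example, placing six equal-sized blocks created in the same window on three nodes leaves each node with two blocks of that age group, without consulting any popularity data. Blocks created in the same window age together, which is why balancing each window separately keeps every age group balanced under this assumption.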