SelectiveEC: Selective Reconstruction in Erasure-coded Storage Systems

Liangliang Xu, Min Lyu, Qiliang Li, Lingjiang Xie, Yinlong Xu
2020 USENIX Workshop on Hot Topics in Storage and File Systems  
Erasure coding is a commonly used approach to providing high reliability at low storage cost. However, skewed load within a recovery batch severely slows down failure recovery in storage systems. To address this, we propose SelectiveEC, a balanced scheduling module that schedules reconstruction tasks out of order: it dynamically selects the stripes to be reconstructed in each batch and selects the source nodes and replacement nodes for each reconstruction task. It thereby balances network recovery traffic, computing resources, and disk I/O under a single node failure in erasure-coded storage systems. Compared with conventional random reconstruction, SelectiveEC increases the parallelism of the recovery process by up to 106%, and by more than 97% on average, in our simulation. SelectiveEC therefore not only speeds up recovery but also reduces the interference of failure recovery with front-end applications.
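The idea of selecting stripes out of order to form a balanced batch can be illustrated with a minimal sketch. This is a simple greedy heuristic written for illustration only, not the algorithm from the paper. The names `select_batch`, `Task`, `source_cap`, and `replacement_cap` are hypothetical, and the sketch assumes each reconstruction reads `k` surviving chunks and writes one rebuilt chunk to a node that holds no chunk of the same stripe.

```python
from typing import Dict, List, NamedTuple, Set


class Task(NamedTuple):
    stripe_id: int
    sources: List[int]   # nodes that each send one surviving chunk
    replacement: int     # node that stores the rebuilt chunk


def select_batch(
    stripes: Dict[int, Set[int]],  # stripe id -> healthy nodes holding its chunks
    nodes: Set[int],               # all healthy nodes
    k: int,                        # chunks needed to rebuild one lost chunk
    source_cap: int = 1,           # assumed per-node read limit per batch
    replacement_cap: int = 1,      # assumed per-node write limit per batch
) -> List[Task]:
    """Greedily pick stripes whose tasks keep per-node load within the caps."""
    src_load = {n: 0 for n in nodes}
    rep_load = {n: 0 for n in nodes}
    batch: List[Task] = []
    for sid in sorted(stripes):
        survivors = stripes[sid]
        # Source candidates: surviving nodes with spare read capacity,
        # least loaded first.
        candidates = sorted(
            (n for n in survivors if src_load[n] < source_cap),
            key=lambda n: (src_load[n], n),
        )
        if len(candidates) < k:
            continue  # defer this stripe to a later batch
        # Replacement candidates: nodes outside the stripe with spare
        # write capacity, preferring nodes that are otherwise idle.
        targets = sorted(
            (n for n in nodes
             if n not in survivors and rep_load[n] < replacement_cap),
            key=lambda n: (rep_load[n] + src_load[n], n),
        )
        if not targets:
            continue
        sources = candidates[:k]
        target = targets[0]
        for n in sources:
            src_load[n] += 1
        rep_load[target] += 1
        batch.append(Task(sid, sources, target))
    return batch
```

With six healthy nodes, `k = 2`, and stripes `{0: {0, 1, 2}, 1: {0, 1, 3}, 2: {2, 3, 4}, 3: {0, 4, 5}}`, this sketch places stripes 0, 2, and 3 in the first batch and defers stripe 1, because two of its surviving nodes are already serving as sources for stripe 0.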
dblp:conf/hotstorage/XuLLXX20 fatcat:gfhto3zg7nextak3pd634yb7va