Control-based load-balancing techniques: Analysis and performance evaluation via a randomized optimization approach
Control Engineering Practice
✩ This work was partially supported by the Swedish Research Council (VR) for the projects "Cloud Control" and "Power and temperature control for large-scale computing infrastructures", and through the LCCC Linnaeus and ELLIIT Excellence Centers.
Cloud applications are often subject to unexpected events such as flash crowds and hardware failures. Users who expect predictable behavior may abandon an unresponsive application when these events occur. Researchers and engineers addressed this problem on two separate fronts: first, they introduced replicas (copies of the application with the same functionality) for redundancy and scalability; second, they added a self-adaptive feature called brownout inside cloud applications to bound response times by modulating user experience. The presence of multiple replicas requires a dedicated component to direct incoming traffic: a load-balancer. Existing load-balancing strategies based on response times interfere with the response time controller developed for brownout-compliant applications. In fact, the brownout approach bounds response times using a control action. Hence, the response time, which was used to aid load-balancing decisions, is no longer a good indicator of how well a replica is performing. To fix this issue, this paper reviews several proposals for brownout-aware load-balancing and provides a comprehensive experimental evaluation that compares them. To provide formal guarantees on the load-balancing performance, we use a randomized optimization approach and apply scenario theory. We perform an extensive set of experiments on a real machine, extending the popular lighttpd web server and load-balancer, and obtain a production-ready implementation. Experimental results show an improvement in user experience over Shortest Queue First (SQF), which is believed to be near-optimal in the non-adaptive case. This improvement is obtained while preserving response time predictability.
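For readers unfamiliar with the SQF baseline named above, the following is a minimal Python sketch of the idea: each new request goes to the replica with the fewest outstanding requests. It is an illustration only, not the lighttpd-based implementation evaluated in the paper. The names `SqfBalancer`, `dispatch`, and `complete` are hypothetical, and breaking ties at random is an assumption made here for simplicity.

```python
import random
from typing import Dict, Iterable


class SqfBalancer:
    """Illustrative Shortest Queue First balancer (hypothetical helper class)."""

    def __init__(self, replicas: Iterable[str]) -> None:
        # Number of requests sent to each replica that have not yet completed.
        self.outstanding: Dict[str, int] = {name: 0 for name in replicas}
        if not self.outstanding:
            raise ValueError("at least one replica is required")

    def dispatch(self) -> str:
        """Pick the replica with the shortest queue and record the new request."""
        shortest = min(self.outstanding.values())
        candidates = [n for n, q in self.outstanding.items() if q == shortest]
        # Assumption: ties are broken uniformly at random.
        chosen = random.choice(candidates)
        self.outstanding[chosen] += 1
        return chosen

    def complete(self, replica: str) -> None:
        """Record that one request on the given replica has finished."""
        if self.outstanding[replica] == 0:
            raise ValueError(f"no outstanding requests on {replica}")
        self.outstanding[replica] -= 1
```

Note that this sketch uses only queue lengths. It carries no information about how much each replica's brownout controller has degraded the user experience, which is the gap that brownout-aware strategies aim to close.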