Using the Right Amount of Monitoring in Adaptive Load Sharing

David Breitgand, Rami Cohen, Amir Nahir, Danny Raz
2007 Fourth International Conference on Autonomic Computing (ICAC'07)  
Consider a service that is being provided by a set of servers over the network. The goal of the service provider is to provide the best service (say, to minimize the service time) given the amount of available resources (e.g., the number of servers). To achieve this goal the provider may introduce a load sharing capability into the system. In order to cope with dynamic environments, where the actual availability of each server changes over time, the load sharing mechanism needs to adapt to the current global state of the system.
This implies that updated load information needs to be collected from the servers. Handling such load information requests requires small but nonzero resources (e.g., CPU) from each server. This may reduce the actual service rate of the server, and thus it is not easy to predict the actual improvement expected from a specific configuration of a dynamic load sharing scheme. For this reason it is important to identify just the right amount of resources that should be allocated to monitoring the servers' load in order to maximize the overall system performance. Moreover, since the optimal amount of monitoring depends on external parameters (such as the arrival rate of the service request stream), the system should self-adjust the amount of monitoring according to the current conditions.

A very efficient load sharing model, termed the supermarket model, was studied by Mitzenmacher [3]. It uses a very simple randomization strategy: when a new task arrives, d < n servers are selected uniformly at random, and the task is assigned to the server with the shortest queue among these d chosen servers. For d = 1 this process simply assigns jobs to servers uniformly at random, regardless of their load. For d = 2, the job is assigned to the least loaded server (the one with the shortest queue) among the two randomly chosen servers. It is shown in [3] that this simple process results in an exponential improvement of the expected overall time in the system compared to the (load independent) random assignment scheme. Further increasing d improves the expected service time linearly. The results of [3] suggest that even a very small amount of management information, coupled with random job assignment, may lead to a very efficient adaptive load balancing strategy, and that using more information keeps improving the scheme. However, this study assumes that the information about the local load of the servers is obtained and processed at no cost. As explained above, in many scenarios this assumption is not realistic.

In this work we extend the aforementioned supermarket model by incorporating the management costs into it. In particular, we assume that when a server is polled about its load, it has to allocate resources in order to answer this query. We consider a system that consists of n identical servers. Each server processes its incoming service requests according to the FIFO discipline. Service requests arrive at the system as a Poisson stream of rate λ · n, 0 < λ < 1, and service times are exponentially distributed with mean 1. In the centralized ESM (Extended Supermarket Model), all clients' requests arrive at a centralized load balancing device. This device then selects d < n servers uniformly at random (with replacement) and sends each of the selected servers an inquiry about the length of its queue. These monitoring requests take precedence over the actual service requests, i.e., upon receiving a monitoring request, the server preempts the currently running job (if one exists) and answers the load request immediately. We assume that processing a monitoring request takes a fraction 0 < C < 1 of the mean service time of an actual service request. This factor, the load monitoring efficiency ratio, reflects the fraction of the resources (i.e., CPU) needed in order to answer a load request.
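The paper's exact analysis is not reproduced in this extract; the following back-of-the-envelope illustration (an assumption of this summary, not a formula from [1]) only indicates why the monitoring cost matters. Each arriving job triggers d polls spread uniformly over the n servers, so every server answers polls at rate λd, each poll consuming a fraction C of a mean service time:

```latex
\[
  \underbrace{\lambda d C}_{\text{per-server capacity spent on monitoring}}
  \quad\Longrightarrow\quad
  \lambda_{\mathrm{eff}}(d) \;=\; \frac{\lambda}{\,1-\lambda d C\,},
  \qquad \text{feasible only while } \lambda\,(1 + dC) < 1 .
\]
```

Increasing d sharpens the queue-length comparison but inflates the effective load toward 1, so the two effects pull in opposite directions.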
When the load balancing device (dispatcher) obtains all d answers (we assume that there are no message losses), it selects the server with the minimal queue length (ties are broken arbitrarily) and forwards the job to this server. The theoretical analysis of this model (see [1]) yields an expression for the expected total time in the system as a function of the load λ, the number of monitored servers d, and the load monitoring efficiency ratio C. The main outcome of this analysis is that for each system load λ and monitoring efficiency ratio C, there exists an optimal number d* of servers that should be monitored in order to minimize the expected time in the system.
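To get intuition for this trade-off, the following is a minimal simulation sketch of the model described above (not the paper's evaluation code; all parameter values are illustrative). It approximates the preemptive monitoring request by inflating the head-of-line job of each polled server by C, and it ignores the cost of polling an idle server:

```python
import random
from collections import deque

def simulate_supermarket(n=50, lam=0.9, d=2, C=0.02, num_jobs=100_000, seed=1):
    """Sketch of the supermarket model with a per-poll monitoring cost.

    n FIFO servers, Poisson arrivals at rate lam*n, Exp(1) service times.
    Each arriving job polls d servers chosen uniformly at random (with
    replacement); answering a poll is approximated by adding C units of
    work to the polled server's head-of-line job. The job then joins the
    polled server with the shortest queue. Returns the mean time in system.
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(n)]   # entries: [remaining_work, arrival_time]
    now = 0.0
    total_time = 0.0
    completed = 0

    def drain(server, until):
        """Let one server work through its FIFO queue until time `until`."""
        nonlocal total_time, completed
        t, q = now, queues[server]
        while q and t + q[0][0] <= until:
            work, arrived = q.popleft()
            t += work
            total_time += t - arrived
            completed += 1
        if q:
            q[0][0] -= until - t           # head-of-line job is partially served

    for _ in range(num_jobs):
        t_next = now + rng.expovariate(lam * n)   # next Poisson arrival
        for s in range(n):
            drain(s, t_next)
        now = t_next

        polled = [rng.randrange(n) for _ in range(d)]
        for s in polled:
            if queues[s]:
                queues[s][0][0] += C              # cost of answering the poll
        target = min(polled, key=lambda s: len(queues[s]))
        queues[target].append([rng.expovariate(1.0), now])

    return total_time / completed

if __name__ == "__main__":
    for d in (1, 2, 3, 5, 10):
        print(f"d={d:2d}  mean time in system ~ {simulate_supermarket(d=d):.2f}")
```

Sweeping d for a fixed λ and C in such a sketch typically shows the mean time in the system first dropping and then rising again, which is the interior optimum d* discussed above.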
doi:10.1109/icac.2007.41