Cooperative load balancing in distributed systems

D. Grosu, A. T. Chronopoulos, M. Y. Leung
2008 Concurrency and Computation  
A serious difficulty in concurrent programming of a distributed system, which may consist of heterogeneous computers, is how to handle its scheduling and load balancing. In this paper, we formulate the static load-balancing problem in single class job distributed systems as a cooperative game among computers. The computers comprising the distributed system are modeled as M/M/1 queueing systems. It is shown that the Nash bargaining solution (NBS) provides an optimal solution
... ion point) for the distributed system, and that it is also a fair solution. We propose a cooperative load-balancing game, present the structure of the NBS, and derive an algorithm for computing it. We show that the fairness index is always equal to 1 under the NBS, which means that the solution is fair to all jobs. Finally, we compare the performance of our cooperative load-balancing scheme with that of other existing schemes.

... based on their resource usage characteristics and ownership. For example, the jobs that belong to a single user can form a class. Alternatively, different classes of jobs can be distinguished by their execution times. Depending on how many job classes are considered, we can have single class or multi-class job distributed systems. In this paper, we consider the load-balancing problem in single class job distributed systems, in which all jobs have the same computational requirements and take unit time to execute.

There are three typical approaches to the load-balancing problem in single class job distributed systems: global, non-cooperative and cooperative. In the global approach, a single decision maker optimizes the response time of the entire system over all jobs. The goal is to obtain a system-wide optimal allocation of jobs to computers. This approach has been studied extensively using techniques such as nonlinear optimization [1,2] and polymatroid optimization [3]. The load-balancing schemes that implement the global approach are centralized and determine a load allocation that yields a system-wide optimal response time.

In the non-cooperative approach, each of (infinitely or finitely) many jobs optimizes its own response time independently of the others, and together they eventually reach an equilibrium. This situation can be modeled as a non-cooperative game among jobs. At the equilibrium, no job can obtain any further benefit by changing its own decision.
For an infinite number of jobs, this problem has been studied in [4]; the resulting equilibrium is called the Wardrop equilibrium. For a finite number of jobs, the equilibrium is the Nash equilibrium [5]. The non-cooperative load-balancing schemes are distributed. Their main drawback is that, under certain conditions [6], the equilibrium load allocation yields a suboptimal system-wide response time.

In the cooperative approach, several decision makers (e.g. jobs, computers) cooperate in making decisions so that each of them operates at its optimum. The decision makers have complete freedom of preplay communication to make joint agreements about their operating points. This situation can be modeled as a cooperative game, for which game theory offers a suitable modeling framework [5]. The cooperative schemes provide a Pareto-optimal response time and also guarantee the fairness of the resource allocation to all jobs. The fairness considered here requires that all jobs obtain the same response time regardless of the computer to which they are allocated.

Related work: The problem of static load balancing in single class job distributed systems has been studied in the past using the global and the non-cooperative approaches. This paper investigates the cooperative approach; preliminary results were reported in our previous paper [7]. The focus of the global approach is on minimizing the expected response time of the entire system over all jobs. Tantawi and Towsley [2] formulated the load-balancing problem as a nonlinear optimization problem and gave an algorithm for computing the allocation. Kim and Kameda [8] derived a more efficient algorithm to compute the allocation. Li and Kameda [9,10] proposed algorithms for static load balancing in star and tree networks. Tang and Chanson [1] proposed and studied several static load-balancing schemes that take into account the job dispatching strategy.
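The contrast between the global approach and the cooperative (NBS) approach can be made concrete with a small numerical sketch. It rests on assumptions that this excerpt does not spell out: computer i is an M/M/1 queue with service rate `mu[i]` receiving load `lam[i]`, so its expected response time is 1/(mu[i] - lam[i]); the total arrival rate `phi` satisfies 0 < phi < sum(mu); communication delays are ignored; and the NBS is taken to maximize the product of the spare capacities mu[i] - lam[i]. Under those assumptions the NBS gives every loaded computer the same spare capacity (hence the same response time), while minimizing the mean response time gives the well-known square-root rule, mu[i] - lam[i] proportional to sqrt(mu[i]). All function names are hypothetical, the fairness measure is Jain's index computed over loaded computers (our choice), and neither function reproduces the algorithms of this paper or of [2] and [8].

```python
import math
from typing import Callable, List


def _allocate(mu: List[float], phi: float, w: Callable[[float], float]) -> List[float]:
    """Give each loaded computer i spare capacity mu[i] - lam[i] = r * w(mu[i]).

    r is chosen so that the loads sum to phi; a computer that would not
    receive positive load is dropped, slowest first, and r is recomputed.
    """
    if not 0 < phi < sum(mu):
        raise ValueError("need 0 < phi < sum(mu)")
    active = sorted(range(len(mu)), key=lambda i: mu[i], reverse=True)
    while True:
        r = (sum(mu[i] for i in active) - phi) / sum(w(mu[i]) for i in active)
        slowest = active[-1]
        if mu[slowest] > r * w(mu[slowest]):
            break
        active.pop()  # dropping a computer can only increase r
    lam = [0.0] * len(mu)
    for i in active:
        lam[i] = mu[i] - r * w(mu[i])
    return lam


def nbs_allocation(mu: List[float], phi: float) -> List[float]:
    """Assumed NBS form: maximize prod(mu[i] - lam[i]) subject to sum(lam) == phi.

    The optimum has equal spare capacity (w = 1), hence equal response
    times on all loaded computers.
    """
    return _allocate(mu, phi, lambda m: 1.0)


def global_allocation(mu: List[float], phi: float) -> List[float]:
    """Square-root rule (w = sqrt): minimizes the mean M/M/1 response time."""
    return _allocate(mu, phi, math.sqrt)


def response_times(mu: List[float], lam: List[float]) -> List[float]:
    """Expected M/M/1 response time 1/(mu - lam) of each loaded computer."""
    return [1.0 / (m - l) for m, l in zip(mu, lam) if l > 0]


def mean_response_time(mu: List[float], lam: List[float]) -> float:
    """Job-averaged response time: (1/phi) * sum of lam/(mu - lam)."""
    return sum(l / (m - l) for m, l in zip(mu, lam) if l > 0) / sum(lam)


def fairness_index(times: List[float]) -> float:
    """Jain's fairness index: equals 1 when all times are equal."""
    return sum(times) ** 2 / (len(times) * sum(t * t for t in times))


if __name__ == "__main__":
    mu = [10.0, 5.0, 1.0]  # illustrative service rates (jobs per second)
    phi = 8.0              # illustrative total arrival rate
    for name, alloc in (("NBS", nbs_allocation), ("global", global_allocation)):
        lam = alloc(mu, phi)
        times = response_times(mu, lam)
        print(name, [round(l, 3) for l in lam],
              round(mean_response_time(mu, lam), 4),
              round(fairness_index(times), 4))
```

With these illustrative rates, both allocations leave the slowest computer idle. The NBS gives the two loaded computers the same response time, so its fairness index is 1, whereas the global allocation attains a slightly lower mean response time at the cost of unequal per-computer response times. Sharing one `_allocate` helper makes the only difference between the two schemes explicit: the weight `w` that decides how spare capacity is spread across computers.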
There are also several studies on static load balancing in multi-class job systems [11,12,13,14]. A problem closely related to the static load-balancing problem studied in this paper is the scheduling of divisible loads, which is the subject of divisible load theory (DLT) [15]. DLT considers the scheduling of arbitrarily divisible loads on a distributed computing system. DLT problems involve large data sets in which every element requires an identical type of processing. The set can be partitioned into any number of fractions where each fraction requires
doi:10.1002/cpe.1331