Approximation Algorithms for Multiprocessor Scheduling under Uncertainty
Theory of Computing Systems
Motivated by applications in grid computing and project management, we study multiprocessor scheduling in scenarios where there is uncertainty in the successful execution of jobs when assigned to processors. We consider the problem of multiprocessor scheduling under uncertainty, in which we are given n unit-time jobs and m machines, a directed acyclic graph C giving the dependencies among the jobs, and for every job j and machine i, the probability $p_{ij}$ of the successful completion of job j
... n scheduled on machine i in any given step. The goal of the problem is to find a schedule that minimizes the expected makespan, that is, the expected completion time of all the jobs. The problem of multiprocessor scheduling under uncertainty was introduced by Malewicz and was shown to be NP-hard even when all the jobs are independent. In this paper, we present polynomial-time approximation algorithms for the problem, for special cases of the dag C. We obtain an $O(\log n)$-approximation for the case of independent jobs, an $O(\log m \log n \log(n+m) / \log\log(n+m))$-approximation when C is a collection of disjoint chains, an $O(\log m \log^2 n)$-approximation when C is a collection of directed out- or in-trees, and an $O(\log m \log^2 n \log(n+m) / \log\log(n+m))$-approximation when C is a directed forest.
• Using the algorithm for disjoint chains and chain decomposition techniques from prior work, we obtain $O(\log m \log^2 n)$ and $O(\log m \log^2 n \log(n+m) / \log\log(n+m))$ approximations for a collection of in- or out-trees and directed forests, respectively (§4.2).
The schedules computed by the algorithms for disjoint chains, trees, and directed forests are all oblivious in the sense that they specify in advance the assignment of machines to jobs in each time step, independent of the set of unfinished jobs at that step. Oblivious schedules are formally defined in §2, where we also present useful definitions and important properties of schedules that are used in our main results. To the best of our knowledge, our results are the first approximation algorithms for multiprocessor scheduling under uncertainty. Due to space constraints, we have omitted many of the proofs; they may be found in Appendices A through C.
Related work
The problem studied in our work was first defined in the recent work by Malewicz, largely motivated by the application of scheduling complex dags in grid computing.
Malewicz characterizes the complexity of the problem in terms of the number of machines and the width of the dependency graph, which is defined as the maximum number of independent jobs. He shows that when the number of machines and the width are both constants, the optimal regimen can be computed in polynomial time using dynamic programming. However, if either parameter is unbounded, the problem is NP-hard. Moreover, the problem cannot be approximated within a factor of 5/4 unless P=NP. Our work extends that of Malewicz by studying the approximability of the problem when neither the width of the dag nor the number of machines is bounded.
The uncertainty in the scheduling problem we study comes from the possible failure of a machine assigned to a job, as modeled by the $p_{ij}$'s. Other models of uncertainty appear in the scheduling literature. Most notable is the model where each task has a duration of random length and may require different amounts of resources. For related work, see [7, 6, 14, 29, 16, 11].
Scheduling in general has a rich history and a vast literature. There are many variants of scheduling problems, depending on various factors. For example: Are the machines related? Is the execution preemptive? Are there precedence constraints on the execution of the jobs? Are there release dates associated with the jobs? What is the objective function: makespan, weighted completion time, weighted flow time, etc.? See  for a survey and [12, 20, 28, 19, 4, 17] for representative work.
Two particular variants of scheduling closely related to our work are job shop scheduling and the scheduling of unrelated machines under precedence constraints. In the job shop scheduling problem, we are given m machines and n jobs, each job consisting of a sequence of operations. Each operation must be processed on a specified machine. A job is executed by processing its operations according to the associated sequence.
At most one job can be scheduled on any machine at any time. The goal of the job shop scheduling problem is to find a schedule of the jobs on the machines that minimizes the maximum completion time. This problem is strongly NP-hard and widely studied [10, 18, 1].
Also extensively studied is the problem of preemptively scheduling jobs with precedence constraints on unrelated parallel machines [19, 27, 17], in which the processing time of a job depends on the machine to which it is assigned. One common characteristic of this problem and multiprocessor scheduling under uncertainty (SUU) is that in each problem, the capability of a machine i to complete a job j may vary with both i and j. However, while the unrelated parallel machines problem models this nonuniformity using deterministic processing times that vary with i and j, in SUU the jobs are all unit-size but may fail to complete with probabilities that vary with i and j. Owing to the uncertainty in the completion of jobs, SUU schedules appear to be more difficult to specify and analyze. One other technical difference is that in SUU we allow multiple machines to be assigned to the same job at the same time, for the purpose of raising the probability of successfully completing the job. The unrelated parallel machines problem is typically solved by a reduction to instances of the job shop scheduling problem. Some of our SUU algorithms also include similar reductions.
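To make the effect of assigning several machines to one job concrete, the following is a minimal sketch rather than part of the algorithms in this paper. It assumes that machines fail independently of one another within a step and across steps, an assumption not stated in the text above. The function names `success_probability` and `expected_steps`, and the example matrix `p`, are hypothetical.

```python
from math import prod


def success_probability(p, job, machines):
    """Per-step probability that `job` completes when every machine in
    `machines` works on it, ASSUMING independent machine failures.

    p[i][j] is the probability that machine i completes job j in one step.
    """
    # The job fails only if every assigned machine fails.
    return 1.0 - prod(1.0 - p[i][job] for i in machines)


def expected_steps(q):
    """Expected number of steps until the first success when the same
    assignment, with per-step success probability q, is repeated.
    This is the mean of a geometric distribution (assumes independent steps).
    """
    if q <= 0.0:
        return float("inf")
    return 1.0 / q


# Hypothetical instance: 2 machines (rows), 2 jobs (columns).
p = [
    [0.5, 0.2],
    [0.5, 0.9],
]

q_single = success_probability(p, job=0, machines=[0])    # one machine
q_both = success_probability(p, job=0, machines=[0, 1])   # two machines

print(q_single, expected_steps(q_single))
print(q_both, expected_steps(q_both))
```

Under these assumptions, adding the second machine to job 0 raises the per-step success probability from 1/2 to 3/4, which is the kind of trade-off a schedule must weigh against using that machine on another job.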