Parallel execution of logic programs by load sharing

Zheng Lin
1997 The Journal of Logic Programming  
A traditional task scheduler relies heavily on shared resources, i.e., shared memory or an interconnection network, to perform its functions, and this reliance becomes a liability as a large number of processors are used. As the scale of a multiprocessor system grows, and the speed of implementing resolution in local processors improves, task scheduling becomes increasingly frequent. However, the speed of the scheduler cannot be expected to increase proportionally if the scheduler continues to operate on resources shared
by all processors, that is, the interconnection network, or the shared memory if available. It is therefore of growing importance to search for methods that are less reliant on resources subject to contention among processors in a parallel computer.

One way to avoid the bottleneck is to abandon interprocess communication entirely. This is possible for the execution of Horn-clause logic programs: alternative solution paths to a given goal can be pursued simultaneously because there is no dependency (excluding I/O) among the solution paths. This type of parallelism is normally referred to as Or-parallelism in the literature.

In this paper, we discuss a scheduling scheme called self-organizing scheduling, which directs processors to share the search space (the search tree defined implicitly by a program) according to a task distribution rule followed by all processors. We investigate the problems that arise within this framework, namely the load balancing problem and the redundant computation problem, and study solutions to them. We discuss methods, including compile-time program restructuring and choice-predicate manipulation at run time, that help alter the shape of the search tree so as to facilitate a probabilistic task distribution rule which achieves the best possible distribution when the sizes of tasks are not known a priori. Using a probabilistic model, we show that the task distribution rule minimizes the average parallel run time in many circumstances. Experimental data are presented showing the effectiveness of the methods. Empirically, many programs frequently used as Or-parallelism benchmarks in the literature can be restructured to take effective advantage of the proposed scheduling method.
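The zero-communication idea can be illustrated with a toy sketch. This is not the paper's actual distribution rule; the splitting scheme, function names, and tree representation below are illustrative assumptions. Every processor runs the same deterministic partitioning of the Or-tree, so the union of their independent local searches covers all solution paths without any message passing:

```python
# Illustrative sketch of zero-communication Or-parallel search (an assumed
# rule, not the paper's): each processor applies the same deterministic
# partitioning of the search tree, so no scheduler communication is needed.
# Inner lists are choice points (Or-nodes); other values are solutions.

def explore(node, my_id, lo, hi, out):
    """Explore `node`; processors in [lo, hi) are responsible for it."""
    if not isinstance(node, list):
        # Leaf: every processor assigned here records the solution.
        # With more than one processor left, this is redundant computation,
        # one of the problems the paper studies.
        out.append(node)
        return
    n = hi - lo            # processors available for this subtree
    b = len(node)          # number of alternatives at this choice point
    if n <= 1:
        # Only one processor left: search the whole subtree sequentially.
        for child in node:
            explore(child, my_id, lo, hi, out)
        return
    # Split the processor range [lo, hi) over the b alternatives; each
    # processor descends only into branches whose subrange contains it.
    for i, child in enumerate(node):
        sub_lo = lo + (i * n) // b
        sub_hi = lo + ((i + 1) * n) // b
        if sub_lo == sub_hi:
            # More branches than processors: assign the orphan branch
            # round-robin so every alternative is still covered.
            if lo + (i % n) == my_id:
                explore(child, my_id, my_id, my_id + 1, out)
        elif sub_lo <= my_id < sub_hi:
            explore(child, my_id, sub_lo, sub_hi, out)

def run(tree, num_procs):
    """Simulate all processors; each computes its share independently."""
    results = []
    for p in range(num_procs):
        mine = []
        explore(tree, p, 0, num_procs, mine)
        results.append(mine)
    return results
```

For example, with `tree = [[1, 2], [3, [4, 5]], 6]` and four processors, each processor's result list is computed with no knowledge of the others, yet the union of the lists contains all six solutions. Note also how a skewed tree leaves some processors nearly idle, which is the load balancing problem that the paper's tree-restructuring techniques target.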
In addition, owing to the low overhead of the proposed method, dynamic task redistribution (in the traditional sense) can always be brought in to cope with a highly imbalanced search tree, without paying a significant extra price for first applying the self-organizing scheduling technique. For problems with fine-grained parallelism (e.g., an optimized 8-queens, zebra, or turtles program running on 30 or more processors), whose speed-up factors peak at fewer than 30 processors on a typical Or-parallel Prolog system in previous simulation studies [16], we found that the peak speed-up factors can be doubled or tripled using the self-organizing scheduling method, even without resorting to communication. An experimental parallel logic programming system has been implemented on a NUMA multiprocessor, Hector [23]. Experimental data from the system appear to be consistent with the simulation results for up to 16 processors.

Several schemes have been proposed in the literature [11, 1] along the lines of adopting a zero- or near-zero-communication scheduling scheme. However, those proposals did little to address the load balancing and redundant computation problems, which offset the gains from eliminating (or reducing) communication overhead. The potential of noncommunicating protocols remains quite unclear in the absence of quantitative results. This paper addresses these issues and, in particular, investigates solutions to problems that arise within a noncommunicating protocol.

The paper is organized as follows. Section 2 provides background on parallel execution of logic programs; Section 3 discusses the proposed methods; Section 4
doi:10.1016/s0743-1066(96)00014-3