Fault tolerance of allocation schemes in massively parallel computers

M. Livingston, Q.F. Stout
Proceedings, 2nd Symposium on the Frontiers of Massively Parallel Computation
This paper examines the problem of locating and allocating large fault-free subsystems in multiuser massively parallel computer systems. Since the allocation schemes used in such large systems cannot allocate all possible subsystems, a reduction in fault tolerance is experienced. We analyze the effect of different allocation methods, including the buddy and Gray-coded buddy schemes, for the allocation of subsystems in the hypercube and in the 2-dimensional mesh and torus. Both worst-case and expected-case performance are studied. Generalizing the buddy and Gray-coded systems, we introduce a new family of allocation schemes which exhibits a significant improvement in fault tolerance over the existing schemes and which uses relatively few additional resources. For purposes of comparison, we study the behavior of the various schemes on the allocation of subsystems of 2^1[…] processors in the hypercube, mesh, and torus consisting of 2^20 processors. Our methods involve a combination of analytic techniques and simulation.
doi:10.1109/fmpc.1988.47483 fatcat:zdegbq2ufjauxpccfmgem6ra4i
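
The loss of fault tolerance described in the abstract can be illustrated with a minimal sketch on a small hypercube. This is not the paper's code or its new family of schemes. It assumes the textbook forms of the two existing schemes: the buddy scheme recognizes only aligned blocks of 2^k consecutive addresses, and the Gray-coded buddy scheme recognizes blocks of 2^k consecutive positions in the binary-reflected Gray code, starting at multiples of 2^(k-1) and wrapping around. The function names, the cube size n = 4, and the fault set are all hypothetical choices made for illustration.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code of position i."""
    return i ^ (i >> 1)


def buddy_subcubes(n: int, k: int) -> list[frozenset[int]]:
    """Subcubes of dimension k recognized by the (assumed) buddy scheme:
    aligned blocks of 2**k consecutive node addresses."""
    size = 1 << k
    return [frozenset(range(s, s + size)) for s in range(0, 1 << n, size)]


def gray_subcubes(n: int, k: int) -> list[frozenset[int]]:
    """Subcubes recognized by the (assumed) Gray-coded buddy scheme:
    2**k consecutive Gray-code positions, starting at multiples of
    2**(k-1), with wraparound. Restricted to 1 <= k < n."""
    if not 1 <= k < n:
        raise ValueError("this sketch requires 1 <= k < n")
    total, size = 1 << n, 1 << k
    step = size >> 1
    return [
        frozenset(gray((s + j) % total) for j in range(size))
        for s in range(0, total, step)
    ]


def is_subcube(nodes: frozenset[int], k: int) -> bool:
    """True if nodes is exactly a k-dimensional subcube: 2**k distinct
    addresses that differ in exactly k bit positions overall."""
    base = min(nodes)
    varying = 0
    for x in nodes:
        varying |= x ^ base
    return len(nodes) == 1 << k and bin(varying).count("1") == k


def fault_free(subcubes, faults):
    """Recognized subcubes that contain no faulty node."""
    return [s for s in subcubes if s.isdisjoint(faults)]


# Hypothetical example: a 16-node hypercube, requests for 4-node subcubes,
# and four faults placed so that each buddy block contains one of them.
n, k = 4, 2
faults = {0b0000, 0b0100, 0b1000, 0b1100}
print("buddy survivors:", len(fault_free(buddy_subcubes(n, k), faults)))
print("Gray-coded survivors:", len(fault_free(gray_subcubes(n, k), faults)))
```

In this sketch the buddy scheme recognizes 2^(n-k) disjoint subcubes, so one well-placed fault per block defeats it, while the Gray-coded variant recognizes twice as many overlapping subcubes and can still allocate around the same faults. The free hypercube contains far more subcubes of that size than either scheme recognizes, which is the gap the abstract refers to.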