Distribution sort with randomized cycling

Jeffrey Scott Vitter, David Alexander Hutchinson
2006 Journal of the ACM  
Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often difficult. In this paper we develop randomized variants of distribution sort for use with parallel independent disks. We propose a simple variant called randomized cycling distribution sort (RCD) and prove that it has optimal expected I/O complexity. The analysis uses a novel reduction to a model with significantly fewer probabilistic interdependencies. Experimental evidence is provided to support its practicality. Other simple variants are also examined experimentally and appear to offer similar advantages to RCD. Based upon ideas in RCD, we propose general techniques that transparently simulate algorithms developed for the unrealistic multihead disk model so that they can be run on the realistic parallel disk model. The simulation is optimal for two important classes of algorithms: the class of multipass algorithms, which make a complete pass through their data before accessing any element a second time, and the algorithms based upon the well-known distribution paradigm of EM computation.
doi:10.1145/1162349.1162352 fatcat:ksiyk7uuwbeehjannt7s3mapim
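
Illustrative sketch (not taken from the abstract): the abstract names randomized cycling but does not spell out the block layout. The Python below assumes the usual reading of the term: each bucket independently draws a random permutation of the disks and writes its successive blocks to the disks in that cyclic order. The function names (make_cycles, disk_for_block, place_blocks) and the parameters are hypothetical and chosen only for this example; the sketch models placement decisions only, not actual disk I/O or the full sort.

```python
import random
from collections import Counter


def make_cycles(num_buckets, num_disks, seed=None):
    """Draw one independent random disk ordering per bucket (assumed RCD layout)."""
    rng = random.Random(seed)
    cycles = []
    for _ in range(num_buckets):
        order = list(range(num_disks))
        rng.shuffle(order)  # uniform random permutation of the disks
        cycles.append(order)
    return cycles


def disk_for_block(cycles, bucket, block_index):
    """Disk that receives the block_index-th block of the given bucket."""
    order = cycles[bucket]
    return order[block_index % len(order)]  # cycle through the permutation


def place_blocks(bucket_sizes, num_disks, seed=None):
    """Place every block of every bucket; return the cycles and per-disk load."""
    cycles = make_cycles(len(bucket_sizes), num_disks, seed)
    load = Counter()
    for bucket, size in enumerate(bucket_sizes):
        for j in range(size):
            load[disk_for_block(cycles, bucket, j)] += 1
    return cycles, load


if __name__ == "__main__":
    cycles, load = place_blocks([8, 4, 12], num_disks=4, seed=1)
    print("per-bucket disk order:", cycles)
    print("blocks per disk:", dict(load))
```

Under this assumed layout, any D consecutive blocks of one bucket fall on D distinct disks, so a bucket can later be read back with full parallelism, while the independent permutations keep different buckets from systematically colliding on the same disk during writes.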