A Top-Down Parallel Semisort

Yan Gu, Julian Shun, Yihan Sun, Guy E. Blelloch
2015 Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures - SPAA '15
Semisorting is the problem of reordering an input array of keys such that equal keys are contiguous but different keys are not necessarily in sorted order. Semisorting is important for collecting equal values and is widely used in practice. For example, it is the core of the MapReduce paradigm, is a key component of the database join operation, and has many other applications. We describe a (randomized) parallel algorithm for the problem that is theoretically efficient (linear work and logarithmic depth), but is designed to be more practically efficient than previous algorithms. We use ideas from the parallel integer sorting algorithm of Rajasekaran and Reif, but instead of processing bits of integers in a reduced range in a bottom-up fashion, we process the hashed values of keys directly top-down. We implement the algorithm and experimentally show on a variety of input distributions that it outperforms a similarly-optimized radix sort on a modern 40-core machine with hyper-threading by about a factor of 1.7-1.9, and achieves a parallel speedup of up to 38x. We discuss the various optimizations used in our implementation and present an extensive experimental analysis of its performance.
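To make the problem statement concrete, the following is a minimal sequential sketch of a semisort in Python. It is not the parallel algorithm of the paper: it only illustrates the output condition (equal keys contiguous, no order between distinct keys) by first scattering keys into buckets by hash value and then grouping equal keys inside each bucket. The names `semisort`, `is_semisorted`, and the parameter `num_buckets` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def semisort(keys, num_buckets=16):
    """Sequential illustration: make equal keys contiguous without sorting."""
    # Step 1: scatter each key into a bucket chosen by its hash value.
    # Equal keys have equal hashes, so they always share a bucket.
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[hash(k) % num_buckets].append(k)

    # Step 2: inside each bucket, group equal keys together.
    out = []
    for bucket in buckets:
        groups = defaultdict(list)
        for k in bucket:
            groups[k].append(k)
        for group in groups.values():
            out.extend(group)
    return out


def is_semisorted(seq):
    """Return True if every run of equal keys appears exactly once."""
    seen = set()
    prev = object()  # sentinel that compares unequal to every key
    for x in seq:
        if x != prev:
            if x in seen:
                return False  # key reappeared after a different key
            seen.add(x)
            prev = x
    return True
```

The order of distinct keys in the result depends on the hash function and the bucket count, which is acceptable here because a semisort makes no promise about that order. A typical check would be `is_semisorted(semisort(data))` together with a comparison of the multisets of input and output keys.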
doi:10.1145/2755573.2755597 dblp:conf/spaa/GuSSB15 fatcat:nexgxl375vckzgfhs73rwryfyu