Performance of Windows multicore systems on threading and MPI

Judy Qiu, Seung-Hee Bae
2011 Concurrency and Computation  
We present performance results on a Windows cluster with up to 768 cores using MPI and two variants of threading: CCR and TPL. CCR (Concurrency and Coordination Runtime) presents a message-based interface, while TPL (Task Parallel Library) allows for-loops to be automatically parallelized. MPI is used between the cluster nodes (up to 32), and either threading or MPI provides parallelism on the 24 cores of each node. We look at the performance of two significant bioinformatics applications: gene clustering and dimension reduction. We find that the two threading runtimes offer similar performance, with MPI outperforming both at low levels of parallelism but threading performing much better when the grain size (problem size per process/thread) is small. We develop simple models for the performance of the clustering code.
doi:10.1002/cpe.1762 fatcat:czc6qeoctncu7mm4oafqra3ppq