Performance of Windows Multicore Systems on Threading and MPI

Judy Qiu, Scott Beason, Seung-Hee Bae, Saliya Ekanayake, Geoffrey Fox
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
We present performance results on a Windows cluster with up to 768 cores using MPI and two variants of threading: CCR and TPL. CCR (Concurrency and Coordination Runtime) presents a message-based interface, while TPL (Task Parallel Library) allows for loops to be automatically parallelized. MPI is used between the cluster nodes (up to 32), and either threading or MPI provides parallelism on the 24 cores of each node. We look at the performance of two significant bioinformatics applications: gene clustering and dimension reduction. We find that the two threading runtimes offer similar performance, with MPI outperforming both at low levels of parallelism but threading performing much better when the grain size (problem size per process/thread) is small. We develop simple models for the performance of the clustering code.
doi:10.1109/ccgrid.2010.105 dblp:conf/ccgrid/QiuBBEF10 fatcat:uw5utdknnjdydfeffaxmx5qfaq