Hu Chen, Wenguang Chen, Jian Huang, Bob Robert, H. Kuhn
2006 Proceedings of the 20th annual international conference on Supercomputing - ICS '06  
SMP clusters and multiclusters are widely used to execute message-passing parallel applications. Because communication cost in such systems is non-uniform, the way parallel processes are mapped to processors (or cores) can significantly affect application performance. A tool that performs this mapping automatically is therefore desirable. Although there have been various efforts to address this issue, the existing solutions either require intensive user intervention or cannot handle multiclusters well. In this paper, we propose a profile-guided approach that automatically finds an optimized mapping to minimize the cost of point-to-point communications for arbitrary message-passing applications. The implemented toolset is called MPIPP (MPI Process Placement toolset), and it includes several components: 1) a tool to get the communication profile of MPI applications; 2) a tool to get the network topology of target clusters; 3) an algorithm to find an optimized mapping, which is more effective than existing graph partition algorithms, especially for multiclusters. We evaluated the performance of our tool with the NPB benchmarks and three other applications on several clusters. Experimental results show that the optimized process placement generated by our tools can achieve significant speedup.
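The mapping objective described in the abstract can be illustrated with a small sketch. It assumes the communication profile is a symmetric matrix of traffic between process pairs and the network topology is a matrix of per-unit costs between cores. The function names, the example data, and the greedy pairwise-swap search are all hypothetical stand-ins chosen for illustration; this is not the MPIPP algorithm itself.

```python
from itertools import combinations

def mapping_cost(traffic, distance, placement):
    """Sum, over all process pairs, of traffic volume times the
    cost between the cores those two processes are placed on."""
    n = len(placement)
    return sum(
        traffic[i][j] * distance[placement[i]][placement[j]]
        for i in range(n) for j in range(i + 1, n)
    )

def improve_by_swaps(traffic, distance, placement):
    """Greedy local search: keep swapping the cores of two processes
    while a swap strictly lowers the total cost (illustrative only)."""
    placement = list(placement)
    best = mapping_cost(traffic, distance, placement)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(placement)), 2):
            placement[i], placement[j] = placement[j], placement[i]
            cost = mapping_cost(traffic, distance, placement)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                placement[i], placement[j] = placement[j], placement[i]
    return placement, best

# Assumed example: 4 processes on 2 nodes with 2 cores each.
# Cores 0,1 share a node, cores 2,3 share a node; crossing nodes costs 10x.
traffic = [[0, 1, 100, 1], [1, 0, 1, 100], [100, 1, 0, 1], [1, 100, 1, 0]]
distance = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 1], [10, 10, 1, 0]]

placement, cost = improve_by_swaps(traffic, distance, [0, 1, 2, 3])
print(placement, cost)
```

In this example the heavily communicating pairs (0, 2) and (1, 3) start on different nodes, and the search moves each pair onto a shared node. A swap-based search like this can stall in a local minimum on larger inputs, which is one reason more sophisticated partitioning heuristics are used in practice.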
doi:10.1145/1183401.1183451 dblp:conf/ics/ChenCHRK06 fatcat:y2etu5dounefpiygz6l4ucabtq