An Experimental Study on How to Build Efficient Multi-core Clusters for High Performance Computing

Luiz Carlos Pinto, Luiz H. B. Tomazella, M. A. R. Dantas
2008 11th IEEE International Conference on Computational Science and Engineering
Multi-core technology creates a new scenario for communicating processes in an MPI cluster environment, and consequently the trade-offs involved need to be uncovered. This motivation guided our research and led to a new approach for setting up more efficient clusters built from commodity components. Thus, as an alternative to non-commodity interconnects such as Myrinet and InfiniBand, we present a proposal based on leaving cores idle with respect to application processing in order to build more affordable commodity clusters with higher performance. Execution of the fine-grained IS algorithm from the NAS Parallel Benchmarks revealed a speedup of up to 25%. Interestingly, a cluster organized according to the proposed setup was able to outperform a single multi-core SMP host in which all processes communicate inside the host. Therefore, empirical results indicate that our proposal has been successful for medium and fine-grained algorithms.

From Figure 3, we can state that (1) bandwidth for either one-way or two-way communication on systems B and D is greater than on systems A and C for any message length. Moreover, (2) the bandwidth behavior of two-way communication on system D and of one-way communication on system B is quite similar. (3) Two-way communication bandwidth on system B is similar to its one-way bandwidth for small and medium-sized messages, but (4) for messages larger than 32KB its pattern becomes flat and of lower performance. (5) The one-way and two-way communication bandwidth patterns of systems A and C are very similar. However, (6) bandwidth is greater for two-way than for one-way communication for messages of up to 8KB on system C and up to 64KB on system A. That is in part because (7) b_eff calculates bandwidth for one-way communication based on maximum latency, while bandwidth for two-way communication is based on average latency. In any case, (8) for larger messages, two-way bandwidth reaches up to 80% of one-way bandwidth. Based on assertions (4) and (6), the idle cores of systems A and C do not seem to have a large positive effect on the performance of either one-way or two-way inter-process communication.
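
Assertion (7) can be illustrated with a minimal sketch. It is not b_eff's actual code: the function names, the message size, and the timing samples below are hypothetical, and the same samples are fed to both calculations only to isolate the effect of dividing by the maximum latency versus the average latency.

```python
from statistics import mean


def bandwidth_mb_per_s(message_bytes: int, seconds: float) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for a single transfer time."""
    return message_bytes / seconds / 1e6


def one_way_bandwidth(message_bytes: int, times_s: list[float]) -> float:
    # Per the text: the one-way figure is based on the maximum latency.
    return bandwidth_mb_per_s(message_bytes, max(times_s))


def two_way_bandwidth(message_bytes: int, times_s: list[float]) -> float:
    # Per the text: the two-way figure is based on the average latency.
    return bandwidth_mb_per_s(message_bytes, mean(times_s))


# Hypothetical per-repetition transfer times (seconds) for an 8 KB message.
# These values are invented for illustration, not measured data.
message_bytes = 8 * 1024
timings = [40e-6, 50e-6, 90e-6]

one_way = one_way_bandwidth(message_bytes, timings)
two_way = two_way_bandwidth(message_bytes, timings)
print(f"one-way (max latency):  {one_way:.1f} MB/s")
print(f"two-way (mean latency): {two_way:.1f} MB/s")
```

Because the mean of a set of timings never exceeds its maximum, the average-based figure is never lower than the maximum-based one on identical samples, which is consistent with two-way bandwidth appearing higher than one-way bandwidth for short messages.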
doi:10.1109/cse.2008.63 dblp:conf/cse/PintoTD08 fatcat:wz7gf5intvhspljwbsiprcul2a