Hierarchical Matrix-Matrix Multiplication Based on Multiprocessor Tasks [chapter]

Sascha Hunold, Thomas Rauber, Gudula Rünger
2004 Lecture Notes in Computer Science  
We consider the realization of matrix-matrix multiplication and propose a hierarchical algorithm implemented in a task-parallel way using multiprocessor tasks on distributed memory. The algorithm has been designed to minimize the communication overhead while exhibiting high locality of memory references. The task-parallel realization makes the algorithm especially suited for clusters of SMPs, since tasks can then be mapped to the different cluster nodes in order to efficiently exploit the cluster architecture. Experiments on current cluster machines show that the resulting execution times are competitive with state-of-the-art methods like PDGEMM.
doi:10.1007/978-3-540-24687-9_1 fatcat:ogm27c5efjddrfzk7c3ec2bpzi
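
The abstract does not spell out the algorithm, so the following is only a minimal sequential sketch of the general idea behind a hierarchical (recursive block) matrix-matrix multiplication, not the authors' method. It assumes square matrices of equal size stored as nested Python lists. All function names and the `cutoff` parameter are hypothetical and chosen for illustration. The eight block products at each level are independent of one another, which is the kind of structure a task-parallel realization could map onto groups of processors.

```python
def matmul_naive(A, B):
    """Plain triple-loop product, used as the base case of the recursion."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def split(M):
    """Split a square matrix of even size into four equal quadrants."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


def hierarchical_matmul(A, B, cutoff=2):
    """Recursive block product (assumption: A and B are square, same size).

    Falls back to the naive product when the block is small or has odd size.
    """
    n = len(A)
    if n <= cutoff or n % 2 != 0:
        return matmul_naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Eight independent sub-products; in a task-parallel setting each one
    # could be assigned to its own group of processors.
    C11 = add(hierarchical_matmul(A11, B11, cutoff),
              hierarchical_matmul(A12, B21, cutoff))
    C12 = add(hierarchical_matmul(A11, B12, cutoff),
              hierarchical_matmul(A12, B22, cutoff))
    C21 = add(hierarchical_matmul(A21, B11, cutoff),
              hierarchical_matmul(A22, B21, cutoff))
    C22 = add(hierarchical_matmul(A21, B12, cutoff),
              hierarchical_matmul(A22, B22, cutoff))
    return join(C11, C12, C21, C22)
```

Working on sub-blocks keeps each sub-product confined to a small region of the operands, which illustrates the locality the abstract refers to. A distributed-memory version would additionally have to decide where blocks live and when they are communicated, which this sketch does not address.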