Productive Parallel Linear Algebra Programming with Unstructured Topology Adaption
2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Sparse linear algebra is a key component of many scientific computations, such as computational fluid dynamics, mechanical engineering, or the design of new materials, to mention only a few. The discretization of complex geometries into unstructured meshes leads to sparse matrices with irregular patterns. Distributing these matrices in turn results in irregular communication patterns within parallel operations. In this paper, we show how sparse linear algebra can be implemented effortlessly on distributed
... y architectures. We demonstrate how simple it is to incorporate advanced partitioning, network topology mapping, and data migration techniques into parallel HPC programs by establishing novel abstractions. For this purpose, we developed a linear algebra library, the Parallel Matrix Template Library 4, based on generic programming and meta-programming, which introduces a new paradigm: meta-tuning. The library establishes its own domain-specific language embedded in C++. The simplicity of software development does not come at the cost of lower performance. Moreover, incorporating topology mapping yielded performance improvements of up to 29%.

node communication and are connected by multi-dimensional high-performance network topologies. Manual optimization for such architectures is difficult because parameters and topologies change for each system and even from run to run, depending on the current allocation of nodes. Even if the network topology is understood, mapping the unstructured, input-dependent application topology onto it is a daunting task. Thus, an automated and generic technique for mapping application communication topologies to network topologies at runtime is needed. This need is acknowledged in several parallel programming frameworks: CHARM++ provides transparent support for topology mapping by process migration, and the Message Passing Interface (MPI) [8, §7] allows users to specify the communication relations among the processes of a parallel program, enabling the MPI implementation to renumber processes for efficient mapping.

Contributions: In this work, we propose an abstract library interface for parallel sparse matrix computations and effective mapping schemes. We show how the MPI-2.2 graph interface and the topology mapping library can be integrated into parallel applications without increasing code complexity. We show in §II-B to §II-D that using our mapping in the Parallel Matrix Template Library 4 (PMTL4) is as easy as writing Matlab code.
Partitioning and topology mapping are entirely orthogonal to the distributed data layout and hidden from the user. The two transformations can be specified by a single object that applies universally to matrices and vectors. All information needed for partitioning and mapping can be extracted from a sparse matrix, so the reorganization can be written in a single statement.
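To illustrate how a sparse matrix alone can supply the communication topology, the following is a minimal sketch and not PMTL4's actual interface. It assumes a simple block-row distribution in which each process owns a contiguous range of rows and the matching vector entries; the names `CsrPattern`, `owner`, `communication_graph`, and `tridiagonal_pattern` are hypothetical and introduced only for this example. For every process, the sketch collects the set of other processes whose vector entries are referenced by its rows in a matrix-vector product. Neighbor lists of this kind are what an application could hand to the MPI-2.2 graph interface (for example `MPI_Dist_graph_create_adjacent` with reordering enabled) so that the MPI implementation may renumber processes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Sparsity pattern of a square matrix in CSR (compressed sparse row) form.
// Values are omitted: only the structure determines the communication.
struct CsrPattern {
    std::size_t n = 0;                 // number of rows and columns
    std::vector<std::size_t> row_ptr;  // n + 1 offsets into col_idx
    std::vector<std::size_t> col_idx;  // column index of each nonzero
};

// Assumed block distribution (hypothetical, not taken from the paper):
// process p owns indices [p * block, (p + 1) * block), block = ceil(n / nprocs).
std::size_t owner(std::size_t index, std::size_t n, std::size_t nprocs) {
    const std::size_t block = (n + nprocs - 1) / nprocs;
    return index / block;
}

// For each process p, collect the processes q != p that own vector entries
// referenced by the rows of p, i.e. the processes p must receive data from.
std::vector<std::set<std::size_t>> communication_graph(const CsrPattern& a,
                                                       std::size_t nprocs) {
    assert(nprocs > 0);
    assert(a.row_ptr.size() == a.n + 1);
    std::vector<std::set<std::size_t>> sources(nprocs);
    for (std::size_t row = 0; row < a.n; ++row) {
        const std::size_t p = owner(row, a.n, nprocs);
        for (std::size_t k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
            const std::size_t q = owner(a.col_idx[k], a.n, nprocs);
            if (q != p) sources[p].insert(q);
        }
    }
    return sources;
}

// Helper for experiments: pattern of an n x n tridiagonal matrix.
CsrPattern tridiagonal_pattern(std::size_t n) {
    CsrPattern a;
    a.n = n;
    a.row_ptr.push_back(0);
    for (std::size_t row = 0; row < n; ++row) {
        if (row > 0) a.col_idx.push_back(row - 1);
        a.col_idx.push_back(row);
        if (row + 1 < n) a.col_idx.push_back(row + 1);
        a.row_ptr.push_back(a.col_idx.size());
    }
    return a;
}
```

For a tridiagonal pattern the resulting graph is a chain in which each process exchanges data only with its immediate neighbors. A matrix arising from an unstructured mesh would instead produce an irregular graph, which is the case where topology mapping matters.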