Characterization And Optimization Of Sparse Computations On Intel Xeon Phi
In this paper, we propose a lightweight optimization methodology for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel on Intel Xeon Phi manycore processors. The large number of cores in this platform exposes the inherent structural weaknesses of different sparse matrices, intensifying performance issues beyond the traditionally reported memory bandwidth limitation. We therefore advocate an input-adaptive optimization approach and present a method that identifies the major performance bottleneck of the kernel for every instance of the problem and selects a suitable optimization to tackle it. We describe two models for identifying the bottleneck: the first requires performance bounds to be determined for the input matrix during an online profiling phase, while the second uses only structural features of the sparse matrix. Our optimizations are based on the widely used Compressed Sparse Row (CSR) storage format and have low preprocessing overheads, making our overall approach practical even in the context of iterative solvers that converge in a small number of iterations. We evaluate our methodology on the Intel Xeon Phi coprocessor, codenamed Knights Corner (KNC), and demonstrate that it is able to distinguish and appropriately optimize the great majority of matrices in a large and diverse test suite, leading to an average speedup of 2.2× over the Intel MKL library.
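
For readers unfamiliar with the CSR format, the following is a minimal reference sketch of the SpMV kernel over CSR arrays, written in plain Python for clarity rather than performance. It is not the optimized kernel evaluated in this work. The names `csr_spmv`, `row_length_stats`, `row_ptr`, `col_idx`, and `values` are illustrative assumptions, and the row-length statistics are only one plausible example of a structural feature, not necessarily a feature used by our second model.

```python
from typing import List, Sequence, Tuple


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR format.

    row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (nonzero values).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x through col_idx: irregular memory traffic.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_length_stats(row_ptr: Sequence[int]) -> Tuple[float, int]:
    """Hypothetical structural feature: mean and maximum nonzeros per row."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return sum(lengths) / len(lengths), max(lengths)


# Example 3x3 matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
mean_len, max_len = row_length_stats(row_ptr)
```

The inner loop makes the kernel's structure visible: each row's work is proportional to its number of nonzeros, and the input vector is read through an indirection, so matrices with highly uneven row lengths or scattered column indices can stress a manycore processor in ways other than raw memory bandwidth.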