An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

Andrew Davidson, Yao Zhang, John D. Owens
2011 IEEE International Parallel & Distributed Processing Symposium
We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage character of our method, together with varying workloads and GPUs of different capabilities, calls for an auto-tuning strategy to carefully select the switch points between ... on stages. In particular, we show two ways to effectively prune the tuning space and thus avoid an impractical exhaustive search: (1) applying algorithmic knowledge to decouple tuning parameters, and (2) estimating search starting points from GPU architecture parameters. We demonstrate that auto-tuning is a powerful tool: it improves performance by up to 5x, saves 17% and 32% of execution time on average over static and dynamic tuning, respectively, and enables our multi-stage solver to outperform the Intel MKL tridiagonal solver on many parallel tridiagonal systems by 6-11x.
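The abstract does not say how the split is performed. One standard way to break a tridiagonal system into independent smaller ones is parallel cyclic reduction (PCR): each PCR step eliminates every unknown's immediate neighbours, after which the even- and odd-indexed unknowns form two independent tridiagonal systems. The Python sketch below is an illustration under that assumption, not the authors' GPU implementation. It applies a chosen number of PCR steps and solves each resulting subsystem with the Thomas algorithm, which stands in for the on-chip solver. It also includes a `split_levels` heuristic that picks a starting number of splitting steps from an assumed on-chip capacity, loosely echoing the idea of deriving search starting points from hardware parameters. All function names (`thomas`, `pcr_split`, `solve_split`, `split_levels`) and the capacity value are hypothetical.

```python
from typing import List, Tuple

Vec = List[float]
System = Tuple[Vec, Vec, Vec, Vec]  # (a, b, c, d); convention: a[0] == 0, c[-1] == 0


def thomas(a: Vec, b: Vec, c: Vec, d: Vec) -> Vec:
    """Sequential Thomas algorithm (stand-in for the on-chip subsystem solver)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr_split(a: Vec, b: Vec, c: Vec, d: Vec) -> Tuple[System, System]:
    """One PCR step, then split into the even- and odd-indexed subsystems."""
    n = len(b)
    na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):  # each i is independent (one GPU thread per equation)
        lo, hi = i > 0, i < n - 1
        k1 = a[i] / b[i - 1] if lo else 0.0  # multiplier for equation i-1
        k2 = c[i] / b[i + 1] if hi else 0.0  # multiplier for equation i+1
        na[i] = -a[i - 1] * k1 if lo else 0.0  # now couples to x[i-2]
        nc[i] = -c[i + 1] * k2 if hi else 0.0  # now couples to x[i+2]
        nb[i] = b[i] - (c[i - 1] * k1 if lo else 0.0) - (a[i + 1] * k2 if hi else 0.0)
        nd[i] = d[i] - (d[i - 1] * k1 if lo else 0.0) - (d[i + 1] * k2 if hi else 0.0)
    even = (na[0::2], nb[0::2], nc[0::2], nd[0::2])
    odd = (na[1::2], nb[1::2], nc[1::2], nd[1::2])
    return even, odd


def solve_split(a: Vec, b: Vec, c: Vec, d: Vec, levels: int) -> Vec:
    """Split with up to `levels` PCR steps, solve each piece, interleave results."""
    if not b:
        return []
    if levels == 0 or len(b) == 1:
        return thomas(a, b, c, d)
    even, odd = pcr_split(a, b, c, d)
    x = [0.0] * len(b)
    x[0::2] = solve_split(*even, levels - 1)
    x[1::2] = solve_split(*odd, levels - 1)
    return x


def split_levels(n: int, onchip_capacity: int) -> int:
    """Hypothetical heuristic: fewest PCR steps so each subsystem,
    of size ceil(n / 2**k), fits an assumed on-chip capacity."""
    k = 0
    while -(-n // (1 << k)) > onchip_capacity:
        k += 1
    return k


if __name__ == "__main__":
    n = 1000
    x_true = [float(i % 7) - 3.0 for i in range(n)]
    a = [0.0] + [1.0] * (n - 1)
    b = [4.0] * n  # diagonally dominant, so no pivoting is needed
    c = [1.0] * (n - 1) + [0.0]
    d = [b[i] * x_true[i]
         + (a[i] * x_true[i - 1] if i > 0 else 0.0)
         + (c[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
    k = split_levels(n, 256)  # assumed capacity of 256 equations per subsystem
    x = solve_split(a, b, c, d, k)
    print(k, max(abs(u - v) for u, v in zip(x, x_true)))
```

In this sketch the number of PCR levels plays the role of a switch point between a splitting stage and an on-chip solving stage. An auto-tuner could start from the `split_levels` estimate and time nearby values instead of searching every level. Because the splitting loop has no dependence between iterations, it maps naturally onto one GPU thread per equation, while each small subsystem can then be handled independently.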
doi:10.1109/ipdps.2011.92 dblp:conf/ipps/DavidsonZO11 fatcat:tpvrna62fvas5lpmbjdntqtj7u