Exploring parallelism in volume ray casting
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '12
Direct volume rendering of irregular 3D datasets demands high computational power and memory bandwidth. Recent research on optimizing volume rendering algorithms is exploring the high processing power offered by a new trend in hardware design: multithreaded accelerator devices. Accelerators such as Graphics Processing Units (GPUs) and the Cell Broadband Engine processor (Cell BE) are used as integrated coprocessors, and off-loading the application from the CPU to the accelerator offers promising speedups. The difficulty in using these devices, however, lies in programming them efficiently, since their architectural features may be completely distinct. In this paper, we present new architecture-aware algorithms for irregular grid rendering based on the ray casting method, designed for the Cell BE and the GPU. We investigate ray traversal inside each accelerator in terms of data access, load balancing, and code divergence, and find new opportunities for performance optimization based on the specific needs of ray casting. Our results show that squeezing these architectures for performance reveals their limitations and can significantly improve ray casting efficiency.