Estimating Performance of a Ray-Tracing ASIC Design

Sven Woop, Erik Brunvand, Philipp Slusallek
2006 IEEE Symposium on Interactive Ray Tracing
Figure 1: Test scenes used to evaluate the DRPU ASIC: Conference (282k triangles), Mafia (15k triangles), Skeleton (16k triangles), Helix (78k triangles), and DynGael (85k triangles). For more test scenes see Figure 6.

ABSTRACT

Recursive ray tracing is a powerful rendering technique used to compute realistic images by simulating the global light transport in a scene. Algorithmic improvements and FPGA-based hardware implementations of ray tracing have demonstrated realtime performance but ... ware that achieves performance levels comparable to commodity rasterization graphics chips is still not available. This paper describes the architecture and ASIC implementations of the DRPU (Dynamic Ray Processing Unit) design, which closes this performance gap. The DRPU supports fully programmable shading and most kinds of dynamic scenes, and thus provides capabilities similar to those of current GPUs. It achieves high efficiency through SIMD processing of floating-point vectors, massive multithreading, synchronous execution of packets of threads, and careful management of caches for scene data. To support dynamic scenes, B-KD trees are used as spatial index structures; they are processed by a custom traversal and intersection unit and modified by an Update Processor when the scene changes. The DRPU architecture is specified as a high-level structural description in a functional language and mapped to both FPGA and ASIC implementations. Our FPGA prototype, clocked at 66 MHz, achieves higher ray tracing performance than CPU-based ray tracers, even on a modern multi-GHz CPU. We provide performance results for two 130nm ASIC versions and estimate the performance achievable with a 90nm CMOS process. For a 90nm version with a 196 mm^2 die, we conservatively estimate clock rates of 400 MHz and ray tracing performance of 80 to 290 fps at 1024x768 resolution in our test scenes. This estimated performance is 70 times what is achievable with standard multi-GHz desktop CPUs.
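To put the 90nm estimate in perspective, the quoted frame rates can be converted into ray throughput. The sketch below is only back-of-the-envelope arithmetic on the figures in the abstract (400 MHz, 80 to 290 fps, 1024x768). It assumes exactly one primary ray per pixel and ignores shadow and secondary rays, which the abstract does not quantify; the function names are illustrative and do not come from the paper.

```python
# Rough throughput implied by the abstract's 90nm estimate.
# Assumption (not from the paper): one primary ray per pixel,
# no shadow or secondary rays counted.

WIDTH, HEIGHT = 1024, 768      # resolution quoted in the abstract
CLOCK_HZ = 400_000_000         # conservatively estimated 90nm clock rate
FPS_RANGE = (80, 290)          # estimated frame-rate range for the test scenes


def primary_rays_per_second(fps, width=WIDTH, height=HEIGHT, rays_per_pixel=1):
    """Primary rays needed per second to sustain `fps` at the given resolution."""
    return fps * width * height * rays_per_pixel


def cycles_per_primary_ray(fps, clock_hz=CLOCK_HZ):
    """Clock cycles available per primary ray, aggregated over the whole chip."""
    return clock_hz / primary_rays_per_second(fps)


if __name__ == "__main__":
    for fps in FPS_RANGE:
        rays = primary_rays_per_second(fps)
        cycles = cycles_per_primary_ray(fps)
        print(f"{fps} fps: {rays / 1e6:.1f} M primary rays/s, "
              f"{cycles:.2f} clock cycles per primary ray")
```

Under these assumptions the estimate corresponds to roughly 63 to 228 million primary rays per second, or only a few clock cycles per ray for the chip as a whole, which suggests why the design depends on SIMD processing, multithreading, and packet execution rather than on clock rate alone.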
doi:10.1109/rt.2006.280209 fatcat:gwn2zb66hves5nh2bpi33teg3m