In the era of multicores, many applications that tend to require substantial compute power and data crunching (aka Throughput Computing Applications) can now be run on desktop PCs. However, to achieve the best possible performance, applications need to be written in a way that exploits both parallelism and cache locality. In this paper, we propose one such approach for x86-based architectures. Our approach uses cache-oblivious techniques to divide a large problem into smaller subproblems which ...

doi:10.1109/ms.2011.2 fatcat:3ysms4aeebarpfhdgbzprloyxi
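
As a rough illustration of the general idea of cache-oblivious divide and conquer, the sketch below transposes a row-major matrix by recursively halving its longer dimension until the block is small, then copying it with a plain loop. This is a generic textbook-style example, not the method proposed in the paper: the choice of matrix transpose as the problem, the function names `transpose` and `transpose_rec`, and the `BASE_CASE` cutoff of 16 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: blocks this small are copied with a plain loop.
   It only limits recursion overhead; it is not tuned to any cache size. */
#define BASE_CASE 16

/* Transpose rows [r0, r1) and columns [c0, c1) of src (n_rows x n_cols,
   row-major) into dst (n_cols x n_rows, row-major). */
static void transpose_rec(const double *src, double *dst,
                          size_t n_rows, size_t n_cols,
                          size_t r0, size_t r1, size_t c0, size_t c1)
{
    size_t dr = r1 - r0;
    size_t dc = c1 - c0;

    if (dr <= BASE_CASE && dc <= BASE_CASE) {
        /* Small block: direct element-by-element copy. */
        for (size_t i = r0; i < r1; i++)
            for (size_t j = c0; j < c1; j++)
                dst[j * n_rows + i] = src[i * n_cols + j];
    } else if (dr >= dc) {
        /* Split the longer dimension (rows) into two subproblems. */
        size_t rm = r0 + dr / 2;
        transpose_rec(src, dst, n_rows, n_cols, r0, rm, c0, c1);
        transpose_rec(src, dst, n_rows, n_cols, rm, r1, c0, c1);
    } else {
        /* Split the longer dimension (columns) into two subproblems. */
        size_t cm = c0 + dc / 2;
        transpose_rec(src, dst, n_rows, n_cols, r0, r1, c0, cm);
        transpose_rec(src, dst, n_rows, n_cols, r0, r1, cm, c1);
    }
}

/* Out-of-place transpose; src and dst must not be the same buffer. */
void transpose(const double *src, double *dst, size_t n_rows, size_t n_cols)
{
    assert(src != dst);
    transpose_rec(src, dst, n_rows, n_cols, 0, n_rows, 0, n_cols);
}
```

Calling `transpose(a, b, rows, cols)` fills `b` with the transpose of `a`. Because the recursion keeps halving the problem, at some depth each subproblem fits in each level of the cache hierarchy without the code naming a cache size. The two recursive calls in each branch write to disjoint parts of `dst`, so in principle they could also be run as independent parallel tasks.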