A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2012; you can also visit the original URL.
The file type is application/pdf
.
Hardware transactional memory for GPU architectures
2011
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11
Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism (TLP), multiplexing execution of 1000s of concurrent threads on a relatively smaller set of single-instruction, multiple-thread (SIMT) cores to hide various long latency operations. While threads within a CUDA block/OpenCL workgroup can communicate efficiently through an intra-core scratchpad memory, threads in different blocks can only communicate via global memory accesses. Programmers wishing to
doi:10.1145/2155620.2155655
dblp:conf/micro/FungSBA11
fatcat:dbp27jwhabddfgeih6rqc7fube