A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2016; you can also visit the original URL.
MRPB: Memory request prioritization for massively parallel processors
2014
2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
This hardware structure improves the caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a ...
We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. ...
We thank Kevin Skadron, Daniel Lustig, and the anonymous reviewers for their feedback. ...
doi:10.1109/hpca.2014.6835938
dblp:conf/hpca/JiaSM14
fatcat:uxa46ufshver3jdjmn4axsd35e
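The MRPB abstract above names two mechanisms: reordering pending memory requests and bypassing the cache for some of them. A minimal sketch of that idea, assuming a simple priority-queue model (the class name, priority encoding, and threshold scheme are illustrative, not the paper's hardware design):

```python
import heapq

class RequestBuffer:
    """Illustrative model of an MRPB-style buffer: reorder pending memory
    requests by priority, and bypass the cache for low-priority ones.
    Lower numbers mean higher priority (an assumption of this sketch)."""

    def __init__(self, bypass_threshold):
        self.bypass_threshold = bypass_threshold
        self.heap = []
        self.seq = 0  # tie-breaker so equal-priority requests keep FIFO order

    def enqueue(self, priority, request):
        heapq.heappush(self.heap, (priority, self.seq, request))
        self.seq += 1

    def drain(self):
        """Yield (request, use_cache) pairs in priority order; requests
        above the threshold bypass the cache entirely."""
        while self.heap:
            priority, _, request = heapq.heappop(self.heap)
            yield request, priority <= self.bypass_threshold
```

For example, with `bypass_threshold=1`, a priority-0 request drains first and is cached, while a priority-2 request drains later and bypasses the cache.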
Adaptive Cache Management for Energy-Efficient GPU Computing
2014
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
The massive amount of memory requests generated by GPUs cause cache contention and resource congestion. ...
With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. ...
We also thank the anonymous reviewers for their insightful comments and suggestions, and Wenhao Jia from Princeton University for generously sharing his source code. This work is partly ...
doi:10.1109/micro.2014.11
dblp:conf/micro/ChenCRLWH14
fatcat:vgfsmescnvb7jfwk3vpyjwqdnm
Efficient utilization of GPGPU cache hierarchy
2015
Proceedings of the 8th Workshop on General Purpose Processing using GPUs - GPGPU 2015
Compared to prior work, it achieves 1.7X and 1.5X performance improvement over Cache-Conscious Wavefront Scheduler and Memory Request Prioritization Buffer respectively. ...
However, due to the massive multithreading, GPGPU caches suffer from severe resource contention and low data sharing, which may degrade performance instead. ...
We also thank Wenhao Jia and Tim Rogers for generously sharing the source code of MRPB and CCWS respectively. Special thanks go to Ahmed ElTantawy for his assistance with GPGPU-sim tool. ...
doi:10.1145/2716282.2716291
dblp:conf/ppopp/KhairyZW15
fatcat:l5jwwrzbyzaqtmljn7h7yssueq
This fine-grained insertion policy is extended to prioritize coherent loads over divergent loads so that coherent loads are less vulnerable to both inter- and intra-warp thrashing. ...
However, there is a lack of cache mechanisms that recognize the need to manage GPU cache blocks at the warp level in order to increase the number of warps ready for execution. ...
The authors are very thankful to anonymous reviewers for their invaluable feedback. ...
doi:10.1145/2751205.2751239
dblp:conf/ics/WangYSW15
fatcat:bz3db7ty3fac7hgak6jilvfurm
Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance
2019
IEEE transactions on computers
To support the massive amount of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming more and more complex, and the Last Level Cache (LLC) size considerably increases ...
The proposal improves the LLC hit ratio and memory-level parallelism, and reduces miss latency, compared to much larger conventional caches. ...
MRPB [13] is a memory-request prioritization buffer that allows reordering and bypassing memory requests before they access the L1 cache. ...
doi:10.1109/tc.2019.2907591
fatcat:wa5fxox64nculcccw736evzaru
A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization
A Block-Latency-Based Warp Scheduling Technique for Improving GPGPU Resource Utilization (GPGPU 자원 활용 개선을 위한 블록 지연시간 기반 워프 스케줄링 기법)
2017
KIPS Transactions on Computer and Communication Systems
General-Purpose Graphics Processing Units (GPGPUs) build on a massively parallel architecture and apply multithreading to exploit parallelism. ...
Unfortunately, modern GPGPUs cannot efficiently utilize their available hardware resources for many general-purpose applications. ...
Jia et al. [32] proposed a hardware structure called the memory request prioritization buffer (MRPB), which employs request reordering and cache bypassing to avoid a system bottleneck in GPU caches. ...
doi:10.3745/ktccs.2017.6.5.219
fatcat:q6l2q3yt6bhnrbf4fjvwo5p56i
A Reused Distance Based Analysis and Optimization for GPU Cache
2016
They also observe the status of the memory system to make warp-throttling decisions. Jia et al. [38] illustrate the Memory Request Prioritization Buffer (MRPB). ...
Unlike CPUs, GPUs are designed to deliver tremendous throughput by launching massive numbers of threads in parallel. ...
doi:10.25772/9jsy-jc83
fatcat:j7cqueozjve45c2etqkzx22t4a
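The thesis above analyzes GPU caches via reuse distance: the number of distinct addresses touched between two consecutive accesses to the same address. A minimal sketch of that metric over a flat address trace (the function name and trace format are illustrative, not taken from the thesis):

```python
def reuse_distances(trace):
    """For each access in the trace, return the number of distinct
    addresses touched since the previous access to the same address,
    or None for a first-time access (a cold miss)."""
    last_seen = {}   # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses strictly between the two accesses.
            window = set(trace[last_seen[addr] + 1 : i])
            distances.append(len(window))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances
```

On the trace `["a", "b", "c", "a"]` this yields `[None, None, None, 2]`: the second access to `a` saw two distinct intervening addresses, so it would hit only in a cache holding at least that many blocks. This quadratic sketch favors clarity; production tools typically use a tree-based stack-distance algorithm instead.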