7 Hits in 3.2 sec

MRPB: Memory request prioritization for massively parallel processors

Wenhao Jia, Kelly A. Shaw, Margaret Martonosi
2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
This hardware structure improves the caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a ... We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. ... We thank Kevin Skadron, Daniel Lustig, and the anonymous reviewers for their feedback. ...
doi:10.1109/hpca.2014.6835938 dblp:conf/hpca/JiaSM14 fatcat:uxa46ufshver3jdjmn4axsd35e
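
For context on the technique the later hits keep citing, below is a minimal software sketch of an MRPB-style prioritization buffer: incoming memory requests are binned into parallel FIFO queues, drained so that related requests reach the cache back to back, and routed around the L1 when a queue backs up. This is only an illustration under assumptions, not the hardware design from Jia et al.; the per-set queue keying, fullest-queue drain order, 128 B line size, and occupancy-based bypass threshold are all invented for the example.

```cpp
// Minimal software sketch of an MRPB-style prioritization buffer.
// NOT the hardware design from Jia et al. (HPCA 2014); the per-set queues
// and the simple occupancy-based bypass threshold are illustrative assumptions.
#include <cstdint>
#include <deque>
#include <iostream>
#include <vector>

struct MemRequest {
    uint64_t addr;
    int      warp_id;
};

class MRPBSketch {
public:
    MRPBSketch(size_t num_queues, size_t bypass_threshold)
        : queues_(num_queues), bypass_threshold_(bypass_threshold) {}

    // Reordering: requests are binned into per-signature FIFO queues
    // (here keyed by cache set) instead of a single in-order stream.
    void enqueue(const MemRequest& req) {
        size_t q = (req.addr / 128) % queues_.size();  // 128 B lines (assumption)
        queues_[q].push_back(req);
    }

    // Drain policy: pick the fullest queue first so that requests mapping
    // to the same set reach the cache back to back and can reuse the line.
    bool drain(MemRequest& out, bool& bypass_l1) {
        size_t best = 0, best_size = 0;
        for (size_t i = 0; i < queues_.size(); ++i)
            if (queues_[i].size() > best_size) { best = i; best_size = queues_[i].size(); }
        if (best_size == 0) return false;
        out = queues_[best].front();
        queues_[best].pop_front();
        // Bypassing: if a queue is badly backed up, send its requests around
        // the L1 to avoid thrashing (the threshold value is an assumption).
        bypass_l1 = best_size > bypass_threshold_;
        return true;
    }

private:
    std::vector<std::deque<MemRequest>> queues_;
    size_t bypass_threshold_;
};

int main() {
    MRPBSketch mrpb(4, 8);
    for (int w = 0; w < 3; ++w)
        for (uint64_t a = 0; a < 4; ++a)
            mrpb.enqueue({a * 128 + static_cast<uint64_t>(w) * 4096, w});
    MemRequest r; bool bypass;
    while (mrpb.drain(r, bypass))
        std::cout << "addr=" << r.addr << " warp=" << r.warp_id
                  << (bypass ? " (bypass L1)" : "") << "\n";
    return 0;
}
```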

Adaptive Cache Management for Energy-Efficient GPU Computing

Xuhao Chen, Li-Wen Chang, Christopher I. Rodrigues, Jie Lv, Zhiying Wang, Wen-Mei Hwu
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
The massive number of memory requests generated by GPUs causes cache contention and resource congestion. ... With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. ... We also thank the anonymous reviewers for their insightful comments and suggestions, and Wenhao Jia from Princeton University for generously sharing his source code. This work is partly ...
doi:10.1109/micro.2014.11 dblp:conf/micro/ChenCRLWH14 fatcat:vgfsmescnvb7jfwk3vpyjwqdnm

Efficient utilization of GPGPU cache hierarchy

Mahmoud Khairy, Mohamed Zahran, Amr G. Wassal
2015 Proceedings of the 8th Workshop on General Purpose Processing using GPUs - GPGPU 2015  
Compared to prior work, it achieves 1.7X and 1.5X performance improvements over the Cache-Conscious Wavefront Scheduler and the Memory Request Prioritization Buffer, respectively. ... However, due to the massive multithreading, GPGPU caches suffer from severe resource contention and low data sharing, which may instead degrade performance. ... We also thank Wenhao Jia and Tim Rogers for generously sharing the source code of MRPB and CCWS, respectively. Special thanks go to Ahmed ElTantawy for his assistance with the GPGPU-sim tool. ...
doi:10.1145/2716282.2716291 dblp:conf/ppopp/KhairyZW15 fatcat:l5jwwrzbyzaqtmljn7h7yssueq

DaCache

Bin Wang, Weikuan Yu, Xian-He Sun, Xinning Wang
2015 Proceedings of the 29th ACM International Conference on Supercomputing - ICS '15
This fine-grained insertion policy is extended to prioritize coherent loads over divergent loads, so that coherent loads are less vulnerable to both inter- and intra-warp thrashing. ... However, there is a lack of salient cache mechanisms that can recognize the need to manage GPU cache blocks at the warp level to increase the number of warps ready for execution. ... The authors are very thankful to the anonymous reviewers for their invaluable feedback. ...
doi:10.1145/2751205.2751239 dblp:conf/ics/WangYSW15 fatcat:bz3db7ty3fac7hgak6jilvfurm
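
As a rough illustration of what a divergence-aware insertion policy can look like, here is a small sketch in the spirit of the DaCache snippet above: coherent loads are inserted near the MRU end of an LRU stack and divergent loads near the LRU end, so coherent data survives thrashing longer. The two-position insertion scheme and the list-based LRU model are assumptions for illustration, not the paper's actual design.

```cpp
// Minimal sketch of a divergence-aware cache insertion policy in the spirit
// of DaCache: coherent loads insert near the MRU end of the LRU stack,
// divergent loads near the LRU end, so coherent data is evicted last.
// The two insertion points and the list-based LRU stack are assumptions.
#include <cstdint>
#include <iostream>
#include <list>
#include <unordered_map>

class DivergenceAwareSet {
public:
    explicit DivergenceAwareSet(size_t ways) : ways_(ways) {}

    // 'coherent' means the warp's lanes touch few distinct cache lines.
    // Returns true on a hit, false on a miss.
    bool access(uint64_t tag, bool coherent) {
        auto it = pos_.find(tag);
        if (it != pos_.end()) {                 // hit: promote to MRU
            stack_.erase(it->second);
            stack_.push_front(tag);
            pos_[tag] = stack_.begin();
            return true;
        }
        if (stack_.size() == ways_) {           // miss: evict the LRU victim
            pos_.erase(stack_.back());
            stack_.pop_back();
        }
        // Insertion point depends on the divergence of the missing load.
        if (coherent) {
            stack_.push_front(tag);             // near-MRU insertion
            pos_[tag] = stack_.begin();
        } else {
            pos_[tag] = stack_.insert(stack_.end(), tag);  // near-LRU insertion
        }
        return false;
    }

private:
    size_t ways_;
    std::list<uint64_t> stack_;                 // MRU at front, LRU at back
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> pos_;
};

int main() {
    DivergenceAwareSet set(4);
    set.access(0x10, true);    // coherent miss, inserted at the MRU end
    set.access(0x20, false);   // divergent miss, inserted at the LRU end
    std::cout << (set.access(0x10, true) ? "hit" : "miss") << "\n";  // prints "hit"
    return 0;
}
```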

Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance

Francisco Candel, Alejandro Valero, Salvador Petit, Julio Sahuquillo
2019 IEEE Transactions on Computers
To support the massive number of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming more and more complex, and the Last Level Cache (LLC) size considerably increases ... The proposal improves the LLC hit ratio and memory-level parallelism, and reduces the miss latency compared to much larger conventional caches. ... MRPB [13] is a memory request prioritization buffer that allows reordering and bypassing memory requests before they access the L1 cache. ...
doi:10.1109/tc.2019.2907591 fatcat:wa5fxox64nculcccw736evzaru

A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization
Korean title: Block Latency-Based Warp Scheduling Technique for Improving GPGPU Resource Utilization

Do Cong Thuan, Yong Choi, Jong Myon Kim, Cheol Hong Kim
2017 KIPS Transactions on Computer and Communication Systems  
General-Purpose Graphics Processing Units (GPGPUs) build on a massively parallel architecture and apply multithreading to exploit parallelism. ... Unfortunately, modern GPGPUs cannot efficiently utilize their available hardware resources for numerous general-purpose applications. ... Jia et al. [32] proposed a hardware structure called the memory request prioritization buffer (MRPB), which employs request reordering and cache bypassing to avoid bottlenecks in GPU caches. ...
doi:10.3745/ktccs.2017.6.5.219 fatcat:q6l2q3yt6bhnrbf4fjvwo5p56i

A REUSE DISTANCE BASED ANALYSIS AND OPTIMIZATION FOR GPU CACHE

Dongwei Wang
2016
They also observe the status of the memory system to make warp throttling decisions. Jia et al. [38] introduce the Memory Request Prioritization Buffer (MRPB). ... Unlike CPUs, GPUs are designed to deliver tremendous throughput by launching massive numbers of threads in parallel. ...
doi:10.25772/9jsy-jc83 fatcat:j7cqueozjve45c2etqkzx22t4a