Future of GPGPU Micro-Architectural Parameters

Cedric Nugteren, Gert-Jan van den Braak, Henk Corporaal
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013  
A key-enabler is dynamism and workload-adaptiveness, enabling among others: dynamic register file sizing, latency aware scheduling, roofline-aware DVFS, runtime cluster fusion, and dynamic warp sizing.  ...  For each parameter, we propose changes to improve GPU design, keeping in mind trends such as dark silicon and the increasing popularity of GPGPU architectures.  ...  Outlook As an outlook towards the future of GPGPU architectures, we propose two techniques to improve performance and energy efficiency with respect to the number of active threads: dynamic register file  ... 
doi:10.7873/date.2013.089 dblp:conf/date/NugterenBC13 fatcat:dss6qu4t25cmzo722inw4qs6iq

A Write-Aware STTRAM-Based Register File Architecture for GPGPU

Jue Wang, Yuan Xie
2015 ACM Journal on Emerging Technologies in Computing Systems  
The massively parallel processing capacity of GPGPUs requires a large register file (RF), and its size keeps increasing to support more concurrent threads from generation to generation.  ...  A power efficient register file is evaluated by aggressively moving a register into drowsy state [Abdel-Majeed and Annavaram 2013].  ... 
doi:10.1145/2700230 fatcat:p3mhxhxeivd4lac6w7lozgoxa4

An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Naifeng Jing, Yao Shen, Yao Lu, Shrikanth Ganapathy, Zhigang Mao, Minyi Guo, Ramon Canal, Xiaoyao Liang
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF).  ...  The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.  ...  GPGPU Pipeline and the Banked Register File Architecture Typically, modern GPGPUs consist of many small cores called stream multiprocessors.  ... 
doi:10.1145/2485922.2485952 dblp:conf/isca/JingSLGMGCL13 fatcat:niswlskhwbgxvc27zdnc5bnq54

An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Naifeng Jing, Yao Shen, Yao Lu, Shrikanth Ganapathy, Zhigang Mao, Minyi Guo, Ramon Canal, Xiaoyao Liang
2013 SIGARCH Computer Architecture News  
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF).  ...  The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.  ...  GPGPU Pipeline and the Banked Register File Architecture Typically, modern GPGPUs consist of many small cores called stream multiprocessors.  ... 
doi:10.1145/2508148.2485952 fatcat:nbaghtk2rvhfri6wkpah4t27ta

SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading

Wing-kei S. Yu, Ruirui Huang, Sarah Q. Xu, Sung-En Wang, Edwin Kan, G. Edward Suh
2011 Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11  
Large register files are common in highly multi-threaded architectures such as GPUs.  ...  Circuit and architecture simulations of GPU benchmarks suites show significant savings in register file area (38%) and energy (68%) over the traditional SRAM implementation, with minimal (1.4%) performance  ...  For GPGPU performance estimates and counting events for energy estimates, we modified GPGPU-Sim v2.1.1.b [1] to model the new scheduler and the hybrid register file.  ... 
doi:10.1145/2000064.2000094 dblp:conf/isca/YuHXWKS11 fatcat:qdj774nwnjb53nu25gamdfptta

SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading

Wing-kei S. Yu, Ruirui Huang, Sarah Q. Xu, Sung-En Wang, Edwin Kan, G. Edward Suh
2011 SIGARCH Computer Architecture News  
Large register files are common in highly multi-threaded architectures such as GPUs.  ...  Circuit and architecture simulations of GPU benchmarks suites show significant savings in register file area (38%) and energy (68%) over the traditional SRAM implementation, with minimal (1.4%) performance  ...  For GPGPU performance estimates and counting events for energy estimates, we modified GPGPU-Sim v2.1.1.b [1] to model the new scheduler and the hybrid register file.  ... 
doi:10.1145/2024723.2000094 fatcat:i2trkix455hyfa5l7bibjgfxz4

A STT-RAM-based low-power hybrid register file for GPGPUs

Gushu Li, Xiaoming Chen, Guangyu Sun, Henry Hoffmann, Yongpan Liu, Yu Wang, Huazhong Yang
2015 Proceedings of the 52nd Annual Design Automation Conference - DAC '15  
Thus, hybrid memory system, which combines SRAM and the emerging non-volatile memory (NVM), has been employed for register file design on GPUs.  ...  Due to high leakage power of SRAM, the register file consumes 20% to 40% of the total GPU power consumption.  ...  Register File Cache. Gebhart has proposed register file cache on GPU to achieve more energy-efficiency [16].  ... 
doi:10.1145/2744769.2744785 dblp:conf/dac/LiCSHLWY15 fatcat:t4yvgp6fgvdwdcbjqbez2neg4y

Energy-efficient GPGPU architectures via collaborative compilation and memristive memory-based computing

Abbas Rahimi, Amirali Ghofrani, Miguel Angel Lastras-Montano, Kwang-Ting Cheng, Luca Benini, Rajesh K. Gupta
2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)  
Energy-efficiency techniques employ voltage overscaling that increases timing sensitivity to variations and hence aggravating the energy use issues.  ...  Our simulation results show high hit rates with 32-entry AMM modules that enable 36% reduction in average energy use by the kernel codes.  ...  After fetch and decode stages, the source operands for each instruction are read that can come from the register file or local memory.  ... 
doi:10.1109/dac.2014.6881522 fatcat:hw5e3uj4zfa77dhytiqboj5ceq

Energy-Efficient GPGPU Architectures via Collaborative Compilation and Memristive Memory-Based Computing

Abbas Rahimi, Amirali Ghofrani, Miguel Angel Lastras-Montano, Kwang-Ting Cheng, Luca Benini, Rajesh K. Gupta
2014 Proceedings of the 51st Annual Design Automation Conference - DAC '14  
Energy-efficiency techniques employ voltage overscaling that increases timing sensitivity to variations and hence aggravating the energy use issues.  ...  Our simulation results show high hit rates with 32-entry AMM modules that enable 36% reduction in average energy use by the kernel codes.  ...  After fetch and decode stages, the source operands for each instruction are read that can come from the register file or local memory.  ... 
doi:10.1145/2593069.2593132 dblp:conf/dac/RahimiGLCBG14 fatcat:ssnbhugtebe55bq4cb24wc2kei

GPUWattch

Jingwen Leng, Tayler Hetherington, Ahmed ElTantawy, Syed Gilani, Nam Sung Kim, Tor M. Aamodt, Vijay Janapa Reddi
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
As such, GPU architects require robust tools that will enable them to quickly explore new ways to optimize GPGPUs for energy efficiency.  ...  General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance.  ...  from the register file.  ... 
doi:10.1145/2485922.2485964 dblp:conf/isca/LengHEGKAR13 fatcat:bkfi476bf5ed5lalls522mmd64

Vortex: OpenCL Compatible RISC-V GPGPU [article]

Fares Elsabbagh, Blaise Tine, Priyadarshini Roshan, Ethan Lyons, Euna Kim, Da Eun Shim, Lingjun Zhu, Sung Kyu Lim, Hyesoon Kim
2020 arXiv   pre-print
We evaluate this design using 15nm technology. We also show the performance and energy numbers of running them with a subset of benchmarks from the Rodinia Benchmark suite.  ...  Vortex implements a SIMT architecture with a minimal ISA extension to RISC-V that enables the execution of OpenCL programs. We also extended OpenCL runtime framework to use the new ISA.  ...  INTRODUCTION The emergence of data parallel architectures and general purpose graphics processing units (GPGPUs) have enabled new opportunities to address the power limitations and scalability of multi-core  ... 
arXiv:2002.12151v1 fatcat:uvuhcu7hbfbkneh3iph5v7cpvm

Cost-effective soft-error protection for SRAM-based structures in GPGPUs

Jingweijia Tan, Zhi Li, Xin Fu
2013 Proceedings of the ACM International Conference on Computing Frontiers - CF '13  
We leverage the GPGPU microarchitecture characteristics, and propose energy-efficient protection mechanisms for two typical SRAM-based structures (i.e. instruction buffer and registers) which suffer high  ...  registers.  ...  In this paper, we explore reliable GPGPU microarchitecture designs to efficiently combat soft errors in light of small-scale processing technology.  ... 
doi:10.1145/2482767.2482804 dblp:conf/cf/TanLF13 fatcat:bft5p6jnszbejhpv6micpaqo6u

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory

Bin Wang, Yizheng Jiao, Weikuan Yu, Xipeng Shen, Dong Li, Jeffrey S. Vetter
2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems  
It can enable further research on the design of GPU global memory for performance and energy tradeoffs.  ...  SIMT cores execute distinct threads, operate on scalar registers and progress in lockstep. SIMT cores in an SM share the per-SM register file as well as the configurable shared memory and L1 cache.  ...  Another purpose of this dual-mode is to maintain the flexibility of simulating various emerging memory technologies.  ... 
doi:10.1109/mascots.2013.39 dblp:conf/mascots/WangJYSLV13 fatcat:muvotcf4k5gzvhu74apdomtvh4

DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files

Alejandro Valero, Dario Suarez-Gracia, Ruben Gran-Tejero
2020 IEEE Access  
Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files.  ...  The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations.  ...  DATA COMPRESSION IN GPU REGISTER FILES Zhang et al. implement a register file with spin-transfer torque magnetic RAM technology [50].  ... 
doi:10.1109/access.2020.3025899 fatcat:bvwhzkmssjhodd7ji3zour3cc4

Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation [article]

Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, Onur Mutlu
2020 arXiv   pre-print
As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8X larger capacity and improving overall GPU performance by 34%.  ...  Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations.  ...  An example evaluation result shows that LTRF combined with register renumbering technique enables us to implement the main register file with emerging high-density high-latency memory technologies, enabling  ... 
arXiv:2010.09330v1 fatcat:cczrbwnshzggdhu5elvgycbr4q