Future of GPGPU Micro-Architectural Parameters
2013
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013
A key-enabler is dynamism and workload-adaptiveness, enabling among others: dynamic register file sizing, latency aware scheduling, roofline-aware DVFS, runtime cluster fusion, and dynamic warp sizing. ...
For each parameter, we propose changes to improve GPU design, keeping in mind trends such as dark silicon and the increasing popularity of GPGPU architectures. ...
Outlook As an outlook towards the future of GPGPU architectures, we propose two techniques to improve performance and energy efficiency with respect to the number of active threads: dynamic register file ...
doi:10.7873/date.2013.089
dblp:conf/date/NugterenBC13
fatcat:dss6qu4t25cmzo722inw4qs6iq
A Write-Aware STTRAM-Based Register File Architecture for GPGPU
2015
ACM Journal on Emerging Technologies in Computing Systems
A write-aware STTRAM-based register file architecture for GPGPU. ACM J. ...
The massively parallel processing capacity of GPGPUs requires a large register file (RF), and its size keeps increasing to support more concurrent threads from generation to generation. ...
A power efficient register file is evaluated by aggressively moving a register into drowsy state [Abdel-Majeed and Annavaram 2013]. ...
doi:10.1145/2700230
fatcat:p3mhxhxeivd4lac6w7lozgoxa4
An energy-efficient and scalable eDRAM-based register file architecture for GPGPU
2013
Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). ...
The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs. ...
GPGPU Pipeline and the Banked Register File Architecture Typically, modern GPGPUs consist of many small cores called stream multiprocessors. ...
doi:10.1145/2485922.2485952
dblp:conf/isca/JingSLGMGCL13
fatcat:niswlskhwbgxvc27zdnc5bnq54
An energy-efficient and scalable eDRAM-based register file architecture for GPGPU
2013
SIGARCH Computer Architecture News
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). ...
The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs. ...
GPGPU Pipeline and the Banked Register File Architecture Typically, modern GPGPUs consist of many small cores called stream multiprocessors. ...
doi:10.1145/2508148.2485952
fatcat:nbaghtk2rvhfri6wkpah4t27ta
SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading
2011
Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11
Large register files are common in highly multi-threaded architectures such as GPUs. ...
Circuit and architecture simulations of GPU benchmarks suites show significant savings in register file area (38%) and energy (68%) over the traditional SRAM implementation, with minimal (1.4%) performance ...
For GPGPU performance estimates and counting events for energy estimates, we modified GPGPU-Sim v2.1.1.b [1] to model the new scheduler and the hybrid register file. ...
doi:10.1145/2000064.2000094
dblp:conf/isca/YuHXWKS11
fatcat:qdj774nwnjb53nu25gamdfptta
SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading
2011
SIGARCH Computer Architecture News
Large register files are common in highly multi-threaded architectures such as GPUs. ...
Circuit and architecture simulations of GPU benchmarks suites show significant savings in register file area (38%) and energy (68%) over the traditional SRAM implementation, with minimal (1.4%) performance ...
For GPGPU performance estimates and counting events for energy estimates, we modified GPGPU-Sim v2.1.1.b [1] to model the new scheduler and the hybrid register file. ...
doi:10.1145/2024723.2000094
fatcat:i2trkix455hyfa5l7bibjgfxz4
A STT-RAM-based low-power hybrid register file for GPGPUs
2015
Proceedings of the 52nd Annual Design Automation Conference - DAC '15
Thus, hybrid memory system, which combines SRAM and the emerging non-volatile memory (NVM), has been employed for register file design on GPUs. ...
Due to high leakage power of SRAM, the register file consumes 20% to 40% of the total GPU power consumption. ...
Register File Cache. Gebhart has proposed register file cache on GPU to achieve more energy-efficiency [16]. ...
doi:10.1145/2744769.2744785
dblp:conf/dac/LiCSHLWY15
fatcat:t4yvgp6fgvdwdcbjqbez2neg4y
Energy-efficient GPGPU architectures via collaborative compilation and memristive memory-based computing
2014
2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)
Energy-efficiency techniques employ voltage overscaling that increases timing sensitivity to variations and hence aggravating the energy use issues. ...
Our simulation results show high hit rates with 32-entry AMM modules that enable 36% reduction in average energy use by the kernel codes. ...
After fetch and decode stages, the source operands for each instruction are read that can come from the register file or local memory. ...
doi:10.1109/dac.2014.6881522
fatcat:hw5e3uj4zfa77dhytiqboj5ceq
Energy-Efficient GPGPU Architectures via Collaborative Compilation and Memristive Memory-Based Computing
2014
Proceedings of the 51st Annual Design Automation Conference - DAC '14
Energy-efficiency techniques employ voltage overscaling that increases timing sensitivity to variations and hence aggravating the energy use issues. ...
Our simulation results show high hit rates with 32-entry AMM modules that enable 36% reduction in average energy use by the kernel codes. ...
After fetch and decode stages, the source operands for each instruction are read that can come from the register file or local memory. ...
doi:10.1145/2593069.2593132
dblp:conf/dac/RahimiGLCBG14
fatcat:ssnbhugtebe55bq4cb24wc2kei
GPUWattch
2013
Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13
As such, GPU architects require robust tools that will enable them to quickly explore new ways to optimize GPGPUs for energy efficiency. ...
General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. ...
from the register file. ...
doi:10.1145/2485922.2485964
dblp:conf/isca/LengHEGKAR13
fatcat:bkfi476bf5ed5lalls522mmd64
Vortex: OpenCL Compatible RISC-V GPGPU
[article]
2020
arXiv
pre-print
We evaluate this design using 15nm technology. We also show the performance and energy numbers of running them with a subset of benchmarks from the Rodinia Benchmark suite. ...
Vortex implements a SIMT architecture with a minimal ISA extension to RISC-V that enables the execution of OpenCL programs. We also extended OpenCL runtime framework to use the new ISA. ...
INTRODUCTION The emergence of data parallel architectures and general purpose graphics processing units (GPGPUs) have enabled new opportunities to address the power limitations and scalability of multi-core ...
arXiv:2002.12151v1
fatcat:uvuhcu7hbfbkneh3iph5v7cpvm
Cost-effective soft-error protection for SRAM-based structures in GPGPUs
2013
Proceedings of the ACM International Conference on Computing Frontiers - CF '13
We leverage the GPGPU microarchitecture characteristics, and propose energy-efficient protection mechanisms for two typical SRAM-based structures (i.e. instruction buffer and registers) which suffer high ...
registers. ...
In this paper, we explore reliable GPGPU microarchitecture designs to efficiently combat soft errors in light of small-scale processing technology. ...
doi:10.1145/2482767.2482804
dblp:conf/cf/TanLF13
fatcat:bft5p6jnszbejhpv6micpaqo6u
A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory
2013
2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems
It can enable further research on the design of GPU global memory for performance and energy tradeoffs. ...
SIMT cores execute distinct threads, operate on scalar registers and progress in lockstep. SIMT cores in an SM share the per-SM register file as well as the configurable shared memory and L1 cache. ...
Another purpose of this dual-mode is to maintain the flexibility of simulating various emerging memory technologies. ...
doi:10.1109/mascots.2013.39
dblp:conf/mascots/WangJYSLV13
fatcat:muvotcf4k5gzvhu74apdomtvh4
DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files
2020
IEEE Access
Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files. ...
The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. ...
DATA COMPRESSION IN GPU REGISTER FILES: Zhang et al. implement a register file with spin-transfer torque magnetic RAM technology [50]. ...
doi:10.1109/access.2020.3025899
fatcat:bvwhzkmssjhodd7ji3zour3cc4
Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation
[article]
2020
arXiv
pre-print
As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8X larger capacity and improving overall GPU performance by 34%. ...
Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. ...
An example evaluation result shows that LTRF combined with register renumbering technique enables us to implement the main register file with emerging high-density high-latency memory technologies, enabling ...
arXiv:2010.09330v1
fatcat:cczrbwnshzggdhu5elvgycbr4q