An energy-efficient and scalable eDRAM-based register file architecture for GPGPU
2013
Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13
The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs. ...
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). ...
GPGPU Pipeline and the Banked Register File Architecture Typically, modern GPGPUs consist of many small cores called stream multiprocessors. ...
doi:10.1145/2485922.2485952
dblp:conf/isca/JingSLGMGCL13
fatcat:niswlskhwbgxvc27zdnc5bnq54
An energy-efficient and scalable eDRAM-based register file architecture for GPGPU
2013
SIGARCH Computer Architecture News
The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs. ...
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). ...
GPGPU Pipeline and the Banked Register File Architecture Typically, modern GPGPUs consist of many small cores called stream multiprocessors. ...
doi:10.1145/2508148.2485952
fatcat:nbaghtk2rvhfri6wkpah4t27ta
An eDRAM-Based Approximate Register File for GPUs
2016
IEEE design & test
This paper presents a high-density and energy-efficient approximate register file architecture based on embedded DRAM with lowered refresh rate to the low-order bits of register entries. ...
With strong demands for energy-efficient performance the number of processing elements on a GPU will continue to scale up in the foreseeable future. ...
Jing and X. Liang for their help with energy modeling of eDRAM-based register files. ...
doi:10.1109/mdat.2015.2500185
fatcat:fracsmy7g5gg7elxvcfpilesee
A Write-Aware STTRAM-Based Register File Architecture for GPGPU
2015
ACM Journal on Emerging Technologies in Computing Systems
A write-aware STTRAM-based register file architecture for GPGPU. ACM J. ...
The massively parallel processing capacity of GPGPUs requires a large register file (RF), and its size keeps increasing to support more concurrent threads from generation to generation. ...
BACKGROUND
GPGPU and Register File Architecture GPGPUs usually consist of many small cores, and each core includes multiple data processing lanes (e.g., 32 in Fermi), an L1 data/instruction cache, a ...
doi:10.1145/2700230
fatcat:p3mhxhxeivd4lac6w7lozgoxa4
Haswell: The Fourth-Generation Intel Core Processor
2014
IEEE Micro
team, and Patty Kummrow and the SDG Org for their significant contributions to the success of eDRAM and the Intel Iris Pro Graphics product. ...
Disclaimer for Figures 6, 7, and 9: Software and workloads used in performance tests may ...
The media functions are woven into the graphics architecture and provide a scalable and programmable option for video encoding, decoding, and postprocessing. ...
doi:10.1109/mm.2014.10
fatcat:gc6fvakpm5brln3fdlq7pfjdve
FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads
[article]
2019
arXiv
pre-print
reducing energy cost by 53%. ...
To further reduce the off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating the associativity with the limited number of tag comparators and ...
Wang and Xie [58] propose an STT-MRAM based GPU register file architecture. ...
arXiv:1903.01776v2
fatcat:7ahl2bxwp5axvinzfccixwbvqu
Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance
2019
IEEE transactions on computers
to 67% for a modern baseline GPU card, and from 32% to 118% for a larger GPU. ...
To support the massive amount of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming more and more complex, and the Last Level Cache (LLC) size considerably increases ...
R (AEI/ERDF, EU), by the Universitat Politècnica de València under Grant SP20190169, and by the gaZ: T58 17R research group (Aragon Gov. and European ESF). ...
doi:10.1109/tc.2019.2907591
fatcat:wa5fxox64nculcccw736evzaru
Hardware Developments I - A Survey Of State-Of-The-Art Hardware And Software
2016
Zenodo
Review of actual hardware and software solutions and recommendations to software vendors ...
of many-core processors, and presently evolving this architecture to address two significant Exascale computing challenges: highly scalable and efficient parallel I/O and system resiliency, 2) the Mont-Blanc ...
Systems of this type are currently the most energy-efficient supercomputers according to the Green 500 List. For example JUQUEEN at JSC has an overall peak performance of 5.9 Petaflop. ...
doi:10.5281/zenodo.929532
fatcat:cpuc7mplurcqtkunarlbitvdqu
Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges
[article]
2017
arXiv
pre-print
In the end, we prospect the development tendency of accelerator architectures in the future, hoping to provide a reference for computer architecture researchers. ...
Nowadays, in top-tier conferences of computer architecture, emerging a batch of accelerating works based on FPGA or other reconfigurable architectures. ...
RESPARC [77] is a reconfigurable and energy-efficient architecture built-on Memristive Crossbar Arrays (MCA) for deep Spiking Neural Networks (SNNs), which utilizes the energy-efficiency of MCAs for ...
arXiv:1712.04771v1
fatcat:3lxv45qb4zaqpagtn3eghrmroe
RNNFast: An Accelerator for Recurrent Neural Networks Using Domain Wall Memory
[article]
2018
arXiv
pre-print
The basic hardware primitive, the RNN processing element (PE) includes custom DWM-based multiplication, sigmoid and tanh units for high density and low-energy. ...
RNNFast is very efficient and highly scalable, with flexible mapping of logical neurons to RNN hardware blocks. ...
Neurocube [18] proposed a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing. ...
arXiv:1812.07609v1
fatcat:2x2b3iqahrffrma5lc2ut2ivi4
A Survey of Near-Data Processing Architectures for Neural Networks
[article]
2021
arXiv
pre-print
Emerging memory technologies, such as ReRAM and 3D-stacked, are promising for efficiently architecting NDP-based accelerators for NN due to their capabilities to work as both: High-density/low-energy storage ...
In this paper, we present a survey of techniques for designing NDP architectures for NN. ...
PE register file for local data reuse, and orchestrates the 2D Discussion. ...
arXiv:2112.12630v1
fatcat:drkwrztkazd3hlblxc7i4kgn2a
A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores
2014
Journal of Signal Processing Systems
The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that can be contained in on-chip SRAM. ...
Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor. ...
Acknowledgements Authors wish to thank John Brunhaver for providing synthesis results for the raw components of the Transposer. ...
doi:10.1007/s11265-014-0896-x
fatcat:ce5vw2a4dne5bmlkwbjuxdr75q
A Survey of Coarse-Grained Reconfigurable Architecture and Design
2019
ACM Computing Surveys
and industry, because they offer the performance and energy efficiency of hardware with the flexibility of software. ...
This article reviews the architecture and design of CGRAs thoroughly for the purpose of exploiting their full potential. First, a novel multidimensional taxonomy is proposed. ...
First, CGRAs provide distributed interconnect, which is much more energy-efficient than the multiport register files in CPUs, GPUs, DSPs, and so on, resulting in a much smaller power overhead. ...
doi:10.1145/3357375
fatcat:pqi4d33i6bg45a6llswhwd44qi
High-performance computing systems: Status and outlook
2012
Acta Numerica
In addition, we discuss the requirements for software that can take advantage of existing and future architectures. ...
We review the different ways devised to speed them up, both with regard to components and their architecture. ...
Dongarra and A. J. van der Steen
For x86 instructions, 16 registers in a flat register file are present instead of the register stack typical of Intel architectures. ...
doi:10.1017/s0962492912000050
fatcat:n6yodkox5zb6xmlep6gvayud2m
Ultra-Low-Power Design and Hardware Security Using Emerging Technologies for Internet of Things
2017
Electronics
Asynchronous circuits connect multiple components effectively across a large die for energy efficiency. ...
Clearly, energy efficient mobile computing requires an ultra-low-power system design [18]. Achieving a very low average power for a wireless system typically makes extensive use of duty cycling. ...
Acknowledgments: The authors wish to thank Yu Bi for his early contribution on silicon nanowire camouflage, KATAN light-weight encryption and correlation power analysis. ...
doi:10.3390/electronics6030067
fatcat:ozssarlb2ng5pcdsupo2hljyna