22 hits

An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Naifeng Jing, Yao Shen, Yao Lu, Shrikanth Ganapathy, Zhigang Mao, Minyi Guo, Ramon Canal, Xiaoyao Liang
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.  ...  The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF).  ...  GPGPU Pipeline and the Banked Register File Architecture. Typically, modern GPGPUs consist of many small cores called stream multiprocessors.  ... 
doi:10.1145/2485922.2485952 dblp:conf/isca/JingSLGMGCL13 fatcat:niswlskhwbgxvc27zdnc5bnq54

An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Naifeng Jing, Yao Shen, Yao Lu, Shrikanth Ganapathy, Zhigang Mao, Minyi Guo, Ramon Canal, Xiaoyao Liang
2013 SIGARCH Computer Architecture News  
The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.  ...  The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF).  ...  GPGPU Pipeline and the Banked Register File Architecture. Typically, modern GPGPUs consist of many small cores called stream multiprocessors.  ... 
doi:10.1145/2508148.2485952 fatcat:nbaghtk2rvhfri6wkpah4t27ta

An eDRAM-Based Approximate Register File for GPUs

Donghwan Jeong, Young H. Oh, Jae W. Lee, Yongjun Park
2016 IEEE Design & Test
This paper presents a high-density and energy-efficient approximate register file architecture based on embedded DRAM with lowered refresh rate to the low-order bits of register entries.  ...  With strong demands for energy-efficient performance the number of processing elements on a GPU will continue to scale up in the foreseeable future.  ...  Jing and X. Liang for their help with energy modeling of eDRAM-based register files.  ... 
doi:10.1109/mdat.2015.2500185 fatcat:fracsmy7g5gg7elxvcfpilesee

A Write-Aware STTRAM-Based Register File Architecture for GPGPU

Jue Wang, Yuan Xie
2015 ACM Journal on Emerging Technologies in Computing Systems  
A write-aware STTRAM-based register file architecture for GPGPU. ACM J.  ...  The massively parallel processing capacity of GPGPUs requires a large register file (RF), and its size keeps increasing to support more concurrent threads from generation to generation.  ...  BACKGROUND: GPGPU and Register File Architecture. GPGPUs usually consist of many small cores, and each core includes multiple data processing lanes (e.g., 32 in Fermi), an L1 data/instruction cache, a  ... 
doi:10.1145/2700230 fatcat:p3mhxhxeivd4lac6w7lozgoxa4

Haswell: The Fourth-Generation Intel Core Processor

Per Hammarlund, Alberto J. Martinez, Atiq A. Bajwa, David L. Hill, Erik Hallnor, Hong Jiang, Martin Dixon, Michael Derr, Mikal Hunsaker, Rajesh Kumar, Randy B. Osborne, Ravi Rajwar (+9 others)
2014 IEEE Micro  
team, and Patty Kummrow and the SDG Org for their significant contributions to the success of eDRAM and the Intel Iris Pro Graphics product.  ...  Disclaimer for Figures 6, 7, and 9: Software and workloads used in performance tests may  ...  The media functions are woven into the graphics architecture and provide a scalable and programmable option for video encoding, decoding, and postprocessing.  ... 
doi:10.1109/mm.2014.10 fatcat:gc6fvakpm5brln3fdlq7pfjdve

FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads [article]

Jie Zhang, Myoungsoo Jung, Mahmut Taylan Kandemir
2019 arXiv pre-print
reducing energy cost by 53%.  ...  To further reduce the off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating the associativity with the limited number of tag comparators and  ...  Wang and Xie [58] propose an STT-MRAM based GPU register file architecture.  ... 
arXiv:1903.01776v2 fatcat:7ahl2bxwp5axvinzfccixwbvqu

Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance

Francisco Candel, Alejandro Valero, Salvador Petit, Julio Sahuquillo
2019 IEEE Transactions on Computers
to 67% for a modern baseline GPU card, and from 32% to 118% for a larger GPU.  ...  To support the massive amount of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming more and more complex, and the Last Level Cache (LLC) size considerably increases  ...  R (AEI/ERDF, EU), by the Universitat Politècnica de València under Grant SP20190169, and by the gaZ: T58_17R research group (Aragon Gov. and European ESF).  ... 
doi:10.1109/tc.2019.2907591 fatcat:wa5fxox64nculcccw736evzaru

Hardware Developments I - A Survey Of State-Of-The-Art Hardware And Software

Daniel Borgis, Liang Liang, Leon Petit, Michael Lysaght, Alan O'Cais
2016 Zenodo  
Review of actual hardware and software solutions and recommendations to software vendors  ...  of many-core processors, and presently evolving this architecture to address two significant Exascale computing challenges: highly scalable and efficient parallel I/O and system resiliency, 2) the Mont-Blanc  ...  Systems of this type are currently the most energy-efficient supercomputers according to the Green 500 List. For example JUQUEEN at JSC has an overall peak performance of 5.9 Petaflop.  ... 
doi:10.5281/zenodo.929532 fatcat:cpuc7mplurcqtkunarlbitvdqu

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges [article]

Chao Wang, Wenqi Lou, Lei Gong, Lihui Jin, Luchao Tan, Yahui Hu, Xi Li, Xuehai Zhou
2017 arXiv pre-print
In the end, we prospect the development tendency of accelerator architectures in the future, hoping to provide a reference for computer architecture researchers.  ...  Nowadays, in top-tier conferences of computer architecture, emerging a batch of accelerating works based on FPGA or other reconfigurable architectures.  ...  RESPARC [77] is a reconfigurable and energy-efficient architecture built-on Memristive Crossbar Arrays (MCA) for deep Spiking Neural Networks (SNNs), which utilizes the energy-efficiency of MCAs for  ... 
arXiv:1712.04771v1 fatcat:3lxv45qb4zaqpagtn3eghrmroe

RNNFast: An Accelerator for Recurrent Neural Networks Using Domain Wall Memory [article]

Mohammad Hossein Samavatian, Anys Bacha, Li Zhou, Radu Teodorescu
2018 arXiv pre-print
The basic hardware primitive, the RNN processing element (PE) includes custom DWM-based multiplication, sigmoid and tanh units for high density and low-energy.  ...  RNNFast is very efficient and highly scalable, with flexible mapping of logical neurons to RNN hardware blocks.  ...  Neurocube [18] proposed a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing.  ... 
arXiv:1812.07609v1 fatcat:2x2b3iqahrffrma5lc2ut2ivi4

A Survey of Near-Data Processing Architectures for Neural Networks [article]

Mehdi Hassanpour, Marc Riera, Antonio González
2021 arXiv pre-print
Emerging memory technologies, such as ReRAM and 3D-stacked, are promising for efficiently architecting NDP-based accelerators for NN due to their capabilities to work as both: High-density/low-energy storage  ...  In this paper, we present a survey of techniques for designing NDP architectures for NN.  ...  PE register file for local data reuse, and orchestrates the 2D Discussion.  ... 
arXiv:2112.12630v1 fatcat:drkwrztkazd3hlblxc7i4kgn2a

A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores

Ardavan Pedram, John D. McCalpin, Andreas Gerstlauer
2014 Journal of Signal Processing Systems  
The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that can be contained in on-chip SRAM.  ...  Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor.  ...  Acknowledgements: Authors wish to thank John Brunhaver for providing synthesis results for the raw components of the Transposer.  ... 
doi:10.1007/s11265-014-0896-x fatcat:ce5vw2a4dne5bmlkwbjuxdr75q

A Survey of Coarse-Grained Reconfigurable Architecture and Design

Leibo Liu, Jianfeng Zhu, Zhaoshi Li, Yanan Lu, Yangdong Deng, Jie Han, Shouyi Yin, Shaojun Wei
2019 ACM Computing Surveys  
and industry, because they offer the performance and energy efficiency of hardware with the flexibility of software.  ...  This article reviews the architecture and design of CGRAs thoroughly for the purpose of exploiting their full potential. First, a novel multidimensional taxonomy is proposed.  ...  First, CGRAs provide distributed interconnect, which is much more energy-efficient than the multiport register files in CPUs, GPUs, DSPs, and so on, resulting in a much smaller power overhead.  ... 
doi:10.1145/3357375 fatcat:pqi4d33i6bg45a6llswhwd44qi

High-performance computing systems: Status and outlook

J. J. Dongarra, A. J. van der Steen
2012 Acta Numerica  
In addition, we discuss the requirements for software that can take advantage of existing and future architectures.  ...  We review the different ways devised to speed them up, both with regard to components and their architecture.  ...  For x86 instructions, 16 registers in a flat register file are present instead of the register stack typical of Intel architectures.  ... 
doi:10.1017/s0962492912000050 fatcat:n6yodkox5zb6xmlep6gvayud2m

Ultra-Low-Power Design and Hardware Security Using Emerging Technologies for Internet of Things

2017 Electronics  
Asynchronous circuits connect multiple components effectively across a large die for energy efficiency.  ...  Clearly, energy efficient mobile computing requires an ultra-low-power system design [18] . Achieving a very low average power for a wireless system typically makes extensive use of duty cycling.  ...  Acknowledgments: The authors wish to thank Yu Bi for his early contribution on silicon nanowire camouflage, KATAN light-weight encryption and correlation power analysis.  ... 
doi:10.3390/electronics6030067 fatcat:ozssarlb2ng5pcdsupo2hljyna
Showing results 1–15 of 22