Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy

Niti Madan, Li Zhao, Naveen Muralimanohar, Aniruddha Udipi, Rajeev Balasubramonian, Ravishankar Iyer, Srihari Makineni, Donald Newell
2009 2009 IEEE 15th International Symposium on High Performance Computer Architecture  
In this paper, we postulate a 3D chip design that stacks SRAM and DRAM upon processing cores and employs OS-based page coloring to minimize horizontal communication of cache data.  ...  Cache hierarchies in future many-core processors are expected to grow in size and contribute a large fraction of overall processor power and performance.  ...  This results in a heterogeneous reconfigurable cache space, an artifact made possible by 3D die stacking.  ... 
doi:10.1109/hpca.2009.4798261 dblp:conf/hpca/MadanZMUBIMN09 fatcat:gdd3ryerbvdcvbhqnkrih7xu74

Optimization-based power and thermal management for dark silicon aware 3D chip multiprocessors using heterogeneous cache hierarchy

Arghavan Asad, Ozcan Ozturk, Mahmood Fathy, Mohammad Reza Jahed-Motlagh
2017 Microprocessors and microsystems  
The proposed approach dynamically (1) predicts the changing program behavior on each core; (2) re-determines frequency/voltage, cache capacity and technology in each level of the cache hierarchy based  ...  In the proposed architecture, for future chip-multiprocessors (CMPs), we exploit emerging technologies such as non-volatile memories (NVMs) and 3D techniques to combat dark silicon.  ...  Experimental results Experimental results for the proposed runtime optimization-based reconfiguration techniques In this sub-section, we evaluate a 3D CMP with stacked cache hierarchy as shown in Fig  ... 
doi:10.1016/j.micpro.2017.03.011 fatcat:f5mn5xmyxbeevjcaa4h7b26cdq

A Survey Of Techniques for Architecting DRAM Caches

Sparsh Mittal, Jeffrey S. Vetter
2016 IEEE Transactions on Parallel and Distributed Systems  
In this paper, we present a survey of techniques for architecting DRAM caches. Also, by classifying these techniques across several dimensions, we underscore their similarities and differences.  ...  In face of increasing cache capacity demands, researchers have now explored DRAM, which was conventionally considered synonymous to main memory, for designing large last level caches.  ...  They note that die-stacking enables the latency of a 3D DRAM cache to approach that of a 2D SRAM cache.  ... 
doi:10.1109/tpds.2015.2461155 fatcat:tqg5hgv64bfnbf6m5c6v4mh5sa

Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies

Po-An Tsai, Changping Chen, Daniel Sanchez
2018 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)  
Recent advances in die stacking have enabled near-data processing (NDP) systems that reduce data movement by placing cores close to memory.  ...  NDP cores enjoy cheaper memory accesses and are more area-constrained, so they use shallow cache hierarchies instead.  ...  This work was supported in part by NSF grant CAREER-1452994 and by a grant from the Qatar Computing Research Institute.  ... 
doi:10.1109/micro.2018.00058 dblp:conf/micro/TsaiCS18 fatcat:cnbwt3d23rdk7jxzf64mnzduum

Survey on Near-Data Processing: Applications and Architectures

Paulo Cesar Santos, Francis Birck Moreira, Aline Santana Cordeiro, Sairo Raoní Santos, Tiago Rodrigo Kepe, Luigi Carro, Marco Antonio Zanata Alves
2021 Journal of Integrated Circuits and Systems  
It occurred together with the appearance of 3D-stacked chips with logic and memory stacked layers.  ...  This survey presents a brief history of these accelerators, focusing on the applications domains migrated to near-data and the proposed architectures.  ...  ACKNOWLEDGEMENTS This work was partially supported by FAPERGS, CAPES, CNPq and Serrapilheira Institute (grant number Serra-1709-16621).  ... 
doi:10.29292/jics.v16i2.502 fatcat:3uiswd6z65djpjgvsxclutthxu

Integration Challenges and Tradeoffs for Terascale Architectures

Mani Azimi
2007 Intel Technology Journal  
Limited off-chip memory bandwidth requires innovations in the cache hierarchy, memory subsystem, and coherence protocol.  ...  a scalable and energy-efficient interconnect.  ...  to material presented in this paper.  ... 
doi:10.1535/itj.1103.01 fatcat:2foqn3s4nrexxezbqvkjpvxcqy

Performance and energy limits of a processor-integrated FFT accelerator

Tung Thanh-Hoang, Amirali Shambayati, Calvin Deutschbein, Henry Hoffmann, Andrew A. Chien
2014 2014 IEEE High Performance Extreme Computing Conference (HPEC)  
Such a step would increase the accelerator benefit at least 10-fold in energy for DDR3 and more than 100-fold in 3D-stacked memory system.  ...  Second, since memory performance is a critical constraint, we evaluate system configuration with 3D-stacked DRAM systems.  ...  The controller supports 8 DRAM devices (2 Gb/device) for a system capacity of 2GB and a peak memory bandwidth of 10.6 GB/s. 3D-Stacked Slow.  ... 
doi:10.1109/hpec.2014.7040951 dblp:conf/hpec/HoangSDHC14 fatcat:p2asjuoofzcslpcygzwlzusf5q

Monarch: A Durable Polymorphic Memory For Data Intensive Applications [article]

Ananth Krishna Prasad, Mahdi Nazm Bojnordi
2021 arXiv   pre-print
This paper examines Monarch, a resistive 3D stacked memory based on a novel reconfigurable crosspoint array called XAM.  ...  3D die stacking has often been proposed to build large-scale DRAM-based caches. Unfortunately, the power and performance overheads of DRAM limit the efficiency of high-bandwidth memories.  ...  S-Cache, the reconfigurable 3D stacked CMOS, provides a faster access but significantly less capacity compared to the counterpart technologies.  ... 
arXiv:2108.08497v1 fatcat:ns5ex7ozmzgu7axwm75pbbwmra

Moving Processing to Data: On the Influence of Processing in Memory on Data Management [article]

Tobias Vincon, Andreas Koch, Ilia Petrov
2019 arXiv   pre-print
Processing-in-Memory is a sub-class of Near-Data processing that targets data processing directly within memory (DRAM) chips.  ...  Near-Data Processing refers to an architectural hardware and software paradigm, based on the co-location of storage and compute units.  ...  Acknowledgements This work has been supported by the project grant HAW Promotion of the Ministry of culture youth and sports, state of Baden-Württemberg, Germany.  ... 
arXiv:1905.04767v1 fatcat:xksczeu5jjfxhd4bzvaqpuivna

Achieving energy efficiency by HW/SW co-design

Shekhar Borkar
2013 2013 Third Berkeley Symposium on Energy Efficient Electronic Systems (E3S)  
Jaussi, RESS004, Schrom et al, "A 100MHz 8-Phase Buck Converter Delivering 12A in 25mm2 Using Air-Core Inductors", APEC 2007 3D Integration provides best of both worlds 1Tb/s HMC DRAM Prototype • 3D  ...  of interconnects Sensors for introspection and circuits for energy efficiency O'Mahony et al, "A 47x10Gb/s 1.4mW/(Gb/s) Parallel Interface in 45nm CMOS ", ISSCC 2010; and J.  ... 
doi:10.1109/e3s.2013.6705856 fatcat:i4bz6azgdvcrzmrjm4cgl5vsdm

A Survey of Techniques for Architecting and Managing Asymmetric Multicore Processors

Sparsh Mittal
2016 ACM Computing Surveys  
In this paper, we present a survey of architectural and system-level techniques proposed for designing and managing AMPs.  ...  However, given the diversity inherent in their design and application scenarios, several challenges need to be addressed to effectively architect AMPs and leverage their potential in optimizing both sequential  ...  This allows resource-sharing at fine-granularity to exploit both ILP and TLP and achieving smaller communication delays due to 3D stacking.  ... 
doi:10.1145/2856125 fatcat:3hda47vtl5fznfvbskwcm2cbo4

Beyond the socket

Ugljesa Milic, Oreste Villa, Evgeny Bolotin, Akhil Arunkumar, Eiman Ebrahimi, Aamer Jaleel, Alex Ramirez, David Nellans
2017 Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17  
We show that application phase effects can be exploited allowing GPU sockets to dynamically optimize their individual interconnect and cache policies, minimizing the impact of NUMA effects.  ...  In this work we investigate multi-socket non-uniform memory access (NUMA) GPU designs and show that significant changes are needed to both the GPU interconnect and cache architectures to achieve performance  ...  ACKNOWLEDGEMENTS We would like to thank anonymous reviewers and Steve Keckler for their help in improving this paper.  ... 
doi:10.1145/3123939.3124534 dblp:conf/micro/MilicVBAEJRN17 fatcat:nv5nbokyefhehgjgvsdlmbfadi

Near-Memory Computing: Past, Present, and Future [article]

Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra
2019 arXiv   pre-print
In this paper, we survey the prior art on NMC across various dimensions (architecture, applications, tools, etc.) and identify the key challenges and open issues with future research directions.  ...  At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory --- called near-memory computing (NMC) --- more viable.  ...  ACKNOWLEDGMENT This work was performed in the framework of Horizon 2020 program for the project "Near-Memory Computing (Ne-MeCo)" and is funded by European Commission under Marie Sklodowska-Curie Innovative  ... 
arXiv:1908.02640v1 fatcat:nvppe5zx2vbb5k4phfi4o5qoqq

Guest Editorial: IEEE Transactions on Computers Special Section on Emerging Non-Volatile Memory Technologies: From Devices to Architectures and Systems

Yuan-Hao Chang, Jingtong Hu, Mehdi B. Tahoori, Ronald F. DeMara
2019 IEEE transactions on computers  
., 3D NAND flash, STT-MRAM, ReRAM, PCM, and FeRAM) have attracted significant interest in recent years because of the fast-growing performance and capacity demands on memory and storage in the big data  ...  PCM) can be used as part of the CPU cache (resp. main memory) to provide a large memory capacity at low cost and low leakage power.  ... 
doi:10.1109/tc.2019.2923033 fatcat:qsyqmworwjghxpjo4xzafbc6da

A 3D SoC design for H.264 application with on-chip DRAM stacking

Tao Zhang, Kui Wang, Yi Feng, Yan Chen, Qun Li, Bing Shao, Jing Xie, Xiaodi Song, Lian Duan, Yuan Xie, Xu Cheng, Youn-Long Lin
2010 2010 IEEE International 3D Systems Integration Conference (3DIC)  
The stacked memory tiers leverage through-silicon-vias (TSVs) to communicate with logic tiers, and thus dramatically reduce the access latency and improve the data bandwidth without the constraint of I  ...  Three-dimensional (3D) on-chip memory stacking has been proposed as a promising solution to the "memory wall" challenge with the benefits of low access latency, high data bandwidth, and low power consumption  ...  Finally, we also thank other colleagues involved in this 3D IC MPW run for their constructive suggestions.  ... 
doi:10.1109/3dic.2010.5751446 dblp:conf/3dic/ZhangWFCLSXSDXCL10 fatcat:yd7dsapi3jc6vbh2mii4u7bc7m