168 Hits in 11.2 sec

Die-stacked DRAM caches for servers

Djordje Jevdjic, Stavros Volos, Babak Falsafi
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors.  ...  Cycle-accurate simulation results of a 16-core server with up to 512MB Footprint Cache indicate a 57% performance improvement over a baseline chip without a die-stacked cache.  ...  performance improvement with high-bandwidth and low-latency die-stacked DRAM.  ... 
doi:10.1145/2485922.2485957 dblp:conf/isca/JevdjicVF13 fatcat:bl2twnnncjhpvmby7khhoj3lwy

Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

Gabriel H. Loh, Mark D. Hill
2011 Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11  
A promising use of stacked DRAM is as a cache, since its capacity is insufficient to be all of main memory (for all but some embedded systems).  ...  This work efficiently enables conventional block sizes for very large die-stacked DRAM caches with two innovations.  ...  INTRODUCTION Die-stacking technologies provide a way to tightly integrate multiple disparate silicon die with high-bandwidth, low-latency interconnects.  ... 
doi:10.1145/2155620.2155673 dblp:conf/micro/LohH11 fatcat:favabxesmrgvldza6k6fq76as4

Fat Caches for Scale-Out Servers

Stavros Volos, Djordje Jevdjic, Babak Falsafi, Boris Grot
2017 IEEE Micro  
On-chip stacked DRAM caches have been proposed to provide the required bandwidth for manycore servers through caching of secondary data working sets.  ...  Emerging scale-out servers are characterized by massive memory footprints and bandwidth requirements.  ...  all systems due to its ability to provide high bandwidth at low latency.  ... 
doi:10.1109/mm.2017.32 fatcat:ueqitx4l4rc5zcu2d34v2wwszy

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

Djordje Jevdjic, Gabriel H. Loh, Cansu Kaynak, Babak Falsafi
2014 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture  
Recent research advocates large die-stacked DRAM caches in manycore servers to break the memory latency and bandwidth wall.  ...  In doing so, the Footprint Cache achieves high hit rates with moderate on-chip tag storage and reasonable lookup latency.  ...  ACKNOWLEDGMENT The authors would like to thank Alexandros Daglis, Onur Kocberber, Nooshin Mirzadeh, Javier Picorel, Georgios Psaropoulos, Stavros Volos, and the anonymous reviewers for their insightful  ... 
doi:10.1109/micro.2014.51 dblp:conf/micro/JevdjicLKF14 fatcat:od75rrrhhjebrezmmblgv4tvna

Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems

Yuhwan Ro, Minchul Sung, Yongjun Park, Jung Ho Ahn
2017 IEICE Electronics Express  
This paper proposes an OBYST (On hit BYpass to STeal bandwidth) technique to improve memory bandwidth by selectively sending read requests that hit on DRAM cache to NVM instead of busy DRAM.  ...  Conventional solutions are reaching those limits; instead, DRAM/NVM hybrid main memory systems which consist of emerging Non-Volatile Memory for large capacity and DRAM last-level cache for high access  ...  For the memory systems in which processor-integrated die-stacked DRAM (on-chip DRAM) works as a cache of off-chip DRAM, SBD compares predicted latencies (the number of requests waiting for the same bank  ... 
doi:10.1587/elex.14.20170437 fatcat:4nk2wxep2bfwbgaa65npptp4cy

Thin servers with smart pipes

Kevin Lim, David Meisner, Ali G. Saidi, Parthasarathy Ranganathan, Thomas F. Wenisch
2013 SIGARCH Computer Architecture News  
Hence, we argue for an alternate architecture-Thin Servers with Smart Pipes (TSSP)-for cost-effective high-performance memcached deployment.  ...  Current deployments use commodity servers with high-end processors.  ...  The authors would like to thank Sai Rahul Chalamalasetti and Alvin AuYoung for their suggestions and contributions to the paper. This work was partially supported by NSF CCF-0815457.  ... 
doi:10.1145/2508148.2485926 fatcat:luyeegst5nfgdiek7bjvmvxxyu

A Survey Of Techniques for Architecting DRAM Caches

Sparsh Mittal, Jeffrey S. Vetter
2016 IEEE Transactions on Parallel and Distributed Systems  
In face of increasing cache capacity demands, researchers have now explored DRAM, which was conventionally considered synonymous to main memory, for designing large last level caches.  ...  Recent trends of increasing core-count and memory/bandwidth-wall have led to major overhauls in chip architecture.  ...  Hence, DRAM cache should be optimized first for latency and then for hit rate.  ... 
doi:10.1109/tpds.2015.2461155 fatcat:tqg5hgv64bfnbf6m5c6v4mh5sa

A fully associative, tagless DRAM cache

Yongjun Lee, Jongwon Kim, Hakbeom Jang, Hyunggyun Yang, Jangwoo Kim, Jinkyu Jeong, Jae W. Lee
2015 SIGARCH Computer Architecture News  
By completely eliminating data structures for cache tag management, from either on-die SRAM or in-package DRAM, the proposed DRAM cache achieves best scalability and hit latency, while maintaining high  ...  The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively.  ...  Acknowledgments We would like to thank Jung Ho Ahn and Young Hoon Son for their help with die-stacked DRAM modeling.  ... 
doi:10.1145/2872887.2750383 fatcat:td3aznb73zfanmsbh774wxndji

A fully associative, tagless DRAM cache

Yongjun Lee, Jongwon Kim, Hakbeom Jang, Hyunggyun Yang, Jangwoo Kim, Jinkyu Jeong, Jae W. Lee
2015 Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15  
By completely eliminating data structures for cache tag management, from either on-die SRAM or in-package DRAM, the proposed DRAM cache achieves best scalability and hit latency, while maintaining high  ...  The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively.  ...  Acknowledgments We would like to thank Jung Ho Ahn and Young Hoon Son for their help with die-stacked DRAM modeling.  ... 
doi:10.1145/2749469.2750383 dblp:conf/isca/LeeKJYKJL15 fatcat:xtukcfzepnh65m6jx4jm2tweo4

Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories

Mitesh R. Meswani, Sergey Blagodurov, David Roberts, John Slice, Mike Ignatowski, Gabriel H. Loh
2015 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)  
Die-stacked DRAM is a technology that will soon be integrated in high-performance systems.  ...  Recent studies have focused on hardware caching techniques to make use of the stacked memory, but these approaches require complex changes to the processor and also cannot leverage the stacked memory to  ...  Jillella for their help with the tracing tools and for providing us with some of the memory traces used for HMA simulation.  ... 
doi:10.1109/hpca.2015.7056027 dblp:conf/hpca/MeswaniBRSIL15 fatcat:askhggskjvhctoyvll2kquxtmm

Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation

Tao Zhang, Ke Chen, Cong Xu, Guangyu Sun, Tao Wang, Yuan Xie
2014 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)  
DRAM memory is a major contributor for the total power consumption in modern computing systems. Consequently, power reduction for DRAM memory is critical to improve system-level power efficiency.  ...  The experimental results show that Half-DRAM can achieve both significant performance improvement and power reduction, with negligible design overhead.  ...  Similarly, Reduced-latency DRAM (RLDRAM) [38] was introduced with a smaller bank size for low access latency.  ... 
doi:10.1109/isca.2014.6853217 dblp:conf/isca/ZhangCXSWX14 fatcat:5tuwxsky4bag3cw5efap2ke3ni

A Software-Managed Approach to Die-Stacked DRAM

Mark Oskin, Gabriel H. Loh
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
While much recent effort has focused on hardware-based techniques for using die-stacked memory (e.g., caching), in this paper we explore what it takes for a software-driven approach to be effective.  ...  Advances in die-stacking (3D) technology have enabled the tight integration of significant quantities of DRAM with high-performance computation logic.  ...  Performance is 48% slower with an OS-managed die-stacked DRAM cache than having no stacked DRAM at all.  ... 
doi:10.1109/pact.2015.30 dblp:conf/IEEEpact/OskinL15 fatcat:fn4mpetyhfhp5fven24s3iort4

Twin-Load: Building a Scalable Memory System over the Non-Scalable Interface [article]

Zehan Cui, Tianyue Lu, Haiyang Pan, Sally A. Mckee, Mingyu Chen
2015 arXiv   pre-print
Synchronous DRAM protocols require data to be returned within a fixed latency, and thus memory extension methods over commodity DDRx interfaces fail to support scalable topologies.  ...  Commodity memory interfaces have difficulty in scaling memory capacity to meet the needs of modern multicore and big data systems.  ...  We have two processors, but use only one for program execution for all systems. We vary the cut-off point to emulate different local:swapped ratios.  ... 
arXiv:1505.03476v1 fatcat:c24akpxcojdlvmqpehb7nrcmsy

High-performance DRAMs in workstation environments

V. Cuppu, B. Jacob, B. Davis, T. Mudge
2001 IEEE transactions on computers  
Our simulations reveal several things: 1) Current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; 2) bus transmission speed will soon become a primary factor limiting memory-system performance.  ...  These small-system organizations correspond to workstation-class computers and use only a handful of DRAM chips (~10, as opposed to ~1 or ~100).  ...  They would also like to thank Sally McKee for her detailed comments on and suggestions for the paper, as well as the anonymous reviewers of the earlier version of this paper that appeared in the Proceedings  ... 
doi:10.1109/12.966491 fatcat:r4glk3j7unerpkkmuetfwl5yeq

A scalable processing-in-memory accelerator for parallel graph processing

Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, Kiyoung Choi
2015 Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15  
The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important.  ...  The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept  ...  Acknowledgments We thank the anonymous reviewers for their valuable feedback.  ... 
doi:10.1145/2749469.2750386 dblp:conf/isca/AhnHYMC15 fatcat:yxk4mj22h5bkdozvbfxloj3qyi
Showing results 1 — 15 out of 168 results