Die-stacked DRAM caches for servers
2013
Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13
This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors. ...
Cycle-accurate simulation results of a 16-core server with up to 512MB Footprint Cache indicate a 57% performance improvement over a baseline chip without a die-stacked cache. ...
performance improvement with high-bandwidth and low-latency die-stacked DRAM. ...
doi:10.1145/2485922.2485957
dblp:conf/isca/JevdjicVF13
fatcat:bl2twnnncjhpvmby7khhoj3lwy
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
2011
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11
A promising use of stacked DRAM is as a cache, since its capacity is insufficient to be all of main memory (for all but some embedded systems). ...
This work efficiently enables conventional block sizes for very large die-stacked DRAM caches with two innovations. ...
Die-stacking technologies provide a way to tightly integrate multiple disparate silicon die with high-bandwidth, low-latency interconnects. ...
doi:10.1145/2155620.2155673
dblp:conf/micro/LohH11
fatcat:favabxesmrgvldza6k6fq76as4
Fat Caches for Scale-Out Servers
2017
IEEE Micro
On-chip stacked DRAM caches have been proposed to provide the required bandwidth for manycore servers through caching of secondary data working sets. ...
Emerging scale-out servers are characterized by massive memory footprints and bandwidth requirements. ...
all systems due to its ability to provide high bandwidth at low latency. ...
doi:10.1109/mm.2017.32
fatcat:ueqitx4l4rc5zcu2d34v2wwszy
Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache
2014
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
Recent research advocates large die-stacked DRAM caches in manycore servers to break the memory latency and bandwidth wall. ...
In doing so, the Footprint Cache achieves high hit rates with moderate on-chip tag storage and reasonable lookup latency. ...
ACKNOWLEDGMENT The authors would like to thank Alexandros Daglis, Onur Kocberber, Nooshin Mirzadeh, Javier Picorel, Georgios Psaropoulos, Stavros Volos, and the anonymous reviewers for their insightful ...
doi:10.1109/micro.2014.51
dblp:conf/micro/JevdjicLKF14
fatcat:od75rrrhhjebrezmmblgv4tvna
Selective DRAM cache bypassing for improving bandwidth on DRAM/NVM hybrid main memory systems
2017
IEICE Electronics Express
This paper proposes OBYST (On-hit BYpass to STeal bandwidth), a technique that improves memory bandwidth by selectively redirecting read requests that hit in the DRAM cache to NVM instead of the busy DRAM. ...
Conventional solutions are reaching those limits; instead, DRAM/NVM hybrid main memory systems which consist of emerging Non-Volatile Memory for large capacity and DRAM last-level cache for high access ...
For the memory systems in which processor-integrated die-stacked DRAM (on-chip DRAM) works as a cache of off-chip DRAM, SBD compares predicted latencies (the number of requests waiting for the same bank ...
doi:10.1587/elex.14.20170437
fatcat:4nk2wxep2bfwbgaa65npptp4cy
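The bypass decision sketched in the snippets above (compare predicted latencies, measured by the number of requests waiting for the same bank, and steal NVM bandwidth when the DRAM cache bank is busy) can be illustrated roughly as follows. All names and latency constants here are hypothetical, a minimal sketch of the idea rather than the paper's implementation:

```python
# Sketch of a selective-bypass decision for a DRAM/NVM hybrid memory.
# Constants and names are illustrative, not taken from the paper.

DRAM_ACCESS_NS = 50   # assumed per-request DRAM cache service time
NVM_READ_NS = 150     # assumed per-request NVM read service time

def predicted_latency(bank_queue_len: int, service_ns: int) -> int:
    """Estimate wait time as (requests ahead in the same bank) * service time."""
    return (bank_queue_len + 1) * service_ns

def route_read(hits_in_dram_cache: bool,
               dram_bank_queue: int,
               nvm_bank_queue: int) -> str:
    """Send a read that hits in the DRAM cache to NVM when NVM would answer sooner."""
    if not hits_in_dram_cache:
        return "nvm"  # a cache miss must go to NVM anyway
    dram_ns = predicted_latency(dram_bank_queue, DRAM_ACCESS_NS)
    nvm_ns = predicted_latency(nvm_bank_queue, NVM_READ_NS)
    return "nvm" if nvm_ns < dram_ns else "dram"

# A hit on an idle DRAM bank stays in DRAM; a hit queued behind a long
# DRAM bank queue is "stolen" by an idle NVM channel.
print(route_read(True, 0, 0))   # -> dram
print(route_read(True, 10, 0))  # -> nvm
```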
Thin servers with smart pipes
2013
SIGARCH Computer Architecture News
Hence, we argue for an alternate architecture, Thin Servers with Smart Pipes (TSSP), for cost-effective high-performance memcached deployment. ...
Current deployments use commodity servers with high-end processors. ...
The authors would like to thank Sai Rahul Chalamalasetti and Alvin AuYoung for their suggestions and contributions to the paper. This work was partially supported by NSF CCF-0815457. ...
doi:10.1145/2508148.2485926
fatcat:luyeegst5nfgdiek7bjvmvxxyu
A Survey Of Techniques for Architecting DRAM Caches
2016
IEEE Transactions on Parallel and Distributed Systems
In the face of increasing cache capacity demands, researchers have now explored DRAM, which was conventionally considered synonymous with main memory, for designing large last level caches. ...
Recent trends of increasing core-count and memory/bandwidth-wall have led to major overhauls in chip architecture. ...
Hence, DRAM cache should be optimized first for latency and then for hit rate. ...
doi:10.1109/tpds.2015.2461155
fatcat:tqg5hgv64bfnbf6m5c6v4mh5sa
A fully associative, tagless DRAM cache
2015
SIGARCH Computer Architecture News
By completely eliminating data structures for cache tag management, from either on-die SRAM or in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency, while maintaining high ...
The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. ...
Acknowledgments We would like to thank Jung Ho Ahn and Young Hoon Son for their help with die-stacked DRAM modeling. ...
doi:10.1145/2872887.2750383
fatcat:td3aznb73zfanmsbh774wxndji
A fully associative, tagless DRAM cache
2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15
By completely eliminating data structures for cache tag management, from either on-die SRAM or in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency, while maintaining high ...
The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. ...
Acknowledgments We would like to thank Jung Ho Ahn and Young Hoon Son for their help with die-stacked DRAM modeling. ...
doi:10.1145/2749469.2750383
dblp:conf/isca/LeeKJYKJL15
fatcat:xtukcfzepnh65m6jx4jm2tweo4
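The idea described in the snippet above, folding the cache tag array into the TLB so one lookup yields both translation and cache location, can be sketched roughly as follows. The `ctlb` structure and its field names are hypothetical, an illustration of the concept rather than the paper's design:

```python
# Sketch of a tagless DRAM cache lookup: the TLB maps virtual pages
# directly to DRAM-cache locations, so no separate tag array is probed.
# Structure and names are illustrative only.

PAGE_SHIFT = 12  # 4 KiB pages

# cTLB entry: virtual page number -> physical page number plus an
# optional DRAM-cache slot (None means the page is not cached).
ctlb = {
    0x1000 >> PAGE_SHIFT: {"ppn": 0x7000 >> PAGE_SHIFT, "cache_slot": 3},
    0x2000 >> PAGE_SHIFT: {"ppn": 0x8000 >> PAGE_SHIFT, "cache_slot": None},
}

def lookup(vaddr: int):
    """One combined lookup: virtual address -> (hit/miss/tlb_miss, target address)."""
    vpn = vaddr >> PAGE_SHIFT
    off = vaddr & ((1 << PAGE_SHIFT) - 1)
    entry = ctlb.get(vpn)
    if entry is None:
        return ("tlb_miss", None)  # walk the page table, then refill the cTLB
    if entry["cache_slot"] is not None:
        # Hit: go straight to the DRAM cache, no tag check needed.
        return ("hit", (entry["cache_slot"] << PAGE_SHIFT) | off)
    # Miss: use the physical address to access main memory.
    return ("miss", (entry["ppn"] << PAGE_SHIFT) | off)

print(lookup(0x1004))  # -> ('hit', 12292), i.e. cache slot 3, offset 4
```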
Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories
2015
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)
Die-stacked DRAM is a technology that will soon be integrated in high-performance systems. ...
Recent studies have focused on hardware caching techniques to make use of the stacked memory, but these approaches require complex changes to the processor and also cannot leverage the stacked memory to ...
Jillella for their help with the tracing tools and for providing us with some of the memory traces used for HMA simulation. ...
doi:10.1109/hpca.2015.7056027
dblp:conf/hpca/MeswaniBRSIL15
fatcat:askhggskjvhctoyvll2kquxtmm
Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation
2014
2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
DRAM memory is a major contributor for the total power consumption in modern computing systems. Consequently, power reduction for DRAM memory is critical to improve system-level power efficiency. ...
The experimental results show that Half-DRAM can achieve both significant performance improvement and power reduction, with negligible design overhead. ...
Similarly, Reduced-latency DRAM (RLDRAM) [38] was introduced with a smaller bank size for low access latency. ...
doi:10.1109/isca.2014.6853217
dblp:conf/isca/ZhangCXSWX14
fatcat:5tuwxsky4bag3cw5efap2ke3ni
A Software-Managed Approach to Die-Stacked DRAM
2015
2015 International Conference on Parallel Architecture and Compilation (PACT)
While much recent effort has focused on hardware-based techniques for using die-stacked memory (e.g., caching), in this paper we explore what it takes for a software-driven approach to be effective. ...
Advances in die-stacking (3D) technology have enabled the tight integration of significant quantities of DRAM with high-performance computation logic. ...
Performance is 48% slower with an OS-managed die-stacked DRAM cache than having no stacked DRAM at all. ...
doi:10.1109/pact.2015.30
dblp:conf/IEEEpact/OskinL15
fatcat:fn4mpetyhfhp5fven24s3iort4
Twin-Load: Building a Scalable Memory System over the Non-Scalable Interface
2015
arXiv pre-print
Synchronous DRAM protocols require data to be returned within a fixed latency, and thus memory extension methods over commodity DDRx interfaces fail to support scalable topologies. ...
Commodity memory interfaces have difficulty in scaling memory capacity to meet the needs of modern multicore and big data systems. ...
We have two processors, but use only one for program execution for all systems. 2 We vary the cut-off point to emulate different local:swapped ratios. ...
arXiv:1505.03476v1
fatcat:c24akpxcojdlvmqpehb7nrcmsy
High-performance DRAMs in workstation environments
2001
IEEE transactions on computers
Our simulations reveal several things: 1) Current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; 2) bus transmission speed will soon become a primary ...
These small-system organizations correspond to workstation-class computers and use only a handful of DRAM chips (~10, as opposed to ~1 or ~100). ...
They would also like to thank Sally McKee for her detailed comments on and suggestions for the paper, as well as the anonymous reviewers of the earlier version of this paper that appeared in the Proceedings ...
doi:10.1109/12.966491
fatcat:r4glk3j7unerpkkmuetfwl5yeq
A scalable processing-in-memory accelerator for parallel graph processing
2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15
The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. ...
The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept ...
Acknowledgments We thank the anonymous reviewers for their valuable feedback. ...
doi:10.1145/2749469.2750386
dblp:conf/isca/AhnHYMC15
fatcat:yxk4mj22h5bkdozvbfxloj3qyi
Showing results 1 — 15 out of 168 results