Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling

Brinda Ganesh, Aamer Jaleel, David Wang, Bruce Jacob
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
This paper examines how traditional DDRx based memory controller policies for scheduling and row buffer management perform on a Fully-Buffered DIMM memory architecture.  ...  The diminishing returns of such techniques have led to the proposal of an alternate architecture, the Fully-Buffered DIMM.  ...  Acknowledgements The authors would like to thank Sadagopan Srinivasan, Jessica Tseng, Nuengwong Tuaycharoen and Ankush Varma for their feedback and comments.  ... 
doi:10.1109/hpca.2007.346190 dblp:conf/hpca/GaneshJWJ07 fatcat:ibrfqbstbzbgfprmhze72kfuhu

BOOM

Doe Hyun Yoon, Jichuan Chang, Naveen Muralimanohar, Parthasarathy Ranganathan
2012 SIGARCH Computer Architecture News  
In this paper, we exploit the low-power nature of another high-volume memory component, mobile DRAM, while improving its bandwidth and reliability shortcomings with a new DIMM architecture.  ...  We propose Buffered Output On Module (BOOM) that buffers the data outputs from multiple ranks of low-frequency mobile DRAM devices, which in aggregation provide high bandwidth and achieve chipkill-correct  ...  We thank the anonymous reviewers for their comments and suggestions. This research was partially supported by the Department of Energy under Award Number DE-SC0005026.  ... 
doi:10.1145/2366231.2337163 fatcat:qgufjw7lazadbn7uqn4rogpzkq

BOOM: Enabling mobile memory based low-power server DIMMs

Doe Hyun Yoon, Jichuan Chang, Naveen Muralimanohar, Parthasarathy Ranganathan
2012 2012 39th Annual International Symposium on Computer Architecture (ISCA)  
In this paper, we exploit the low-power nature of another high-volume memory component, mobile DRAM, while improving its bandwidth and reliability shortcomings with a new DIMM architecture.  ...  We propose Buffered Output On Module (BOOM) that buffers the data outputs from multiple ranks of low-frequency mobile DRAM devices, which in aggregation provide high bandwidth and achieve chipkill-correct  ...  We thank the anonymous reviewers for their comments and suggestions. This research was partially supported by the Department of Energy under Award Number DE-SC0005026.  ... 
doi:10.1109/isca.2012.6237003 dblp:conf/isca/YoonCMR12 fatcat:d2qmjq2z3rg3zcdccewf5l22wi

ArchShield

Prashant J. Nair, Dae-Hyun Kim, Moinuddin K. Qureshi
2013 SIGARCH Computer Architecture News  
Both Fault Map and SWLR are integrated in reserved area in DRAM memory.  ...  DRAM scaling has been the prime driver for increasing the capacity of main memory system over the past three decades.  ...  Acknowledgments Thanks to Saibal Mukhopadhyay for discussions on DRAM scaling. Moinuddin Qureshi is supported by NetApp Faculty Fellowship and Intel Early Career Award.  ... 
doi:10.1145/2508148.2485929 fatcat:qxchykvokvf37grfmyeixane2e

Micro-pages

Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian, Al Davis
2010 SIGARCH Computer Architecture News  
Thus, the colocation of chunks (from different OS pages) in a row-buffer will improve the overall utilization of the row buffer contents, and consequently reduce memory energy consumption and access time  ...  We explore these mechanisms and discuss the trade-offs involved along with energy and performance improvements from each scheme.  ...  We also thank the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1145/1735970.1736045 fatcat:r5i7xv4wz5a5zfwe6mapxpog34

GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing [article]

Zhe Zhou and Cong Li and Xuechao Wei and Guangyu Sun
2021 arXiv   pre-print
This paper presents GCNear, a hybrid architecture to tackle these challenges. Specifically, GCNear adopts a DIMM-based memory system to provide easy-to-scale memory capacity.  ...  Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory to buffer the intermediate data for back-propagation. 2) GCN training involves both memory-intensive data  ...  As we can see, the main components of NME incur moderate area (0.63 mm²) and power (218.3 mW) overhead, given that a single DIMM usually consumes several watts and the buffer chip takes up about 100 mm  ... 
arXiv:2111.00680v1 fatcat:3inhvousrvdkbcxtcrxfxiplwm

A survey of architectural techniques for DRAM power management

Sparsh Mittal
2012 International Journal of High Performance Systems Architecture  
Recent trends of CMOS technology scaling and wide-spread use of multicore processors have dramatically increased the power consumption of main memory.  ...  The focus of this paper is to survey several architectural techniques designed for improving power efficiency of main memory systems, specifically DRAM systems.  ...  Their technique adds a small buffer called "mini-rank buffer" between each DIMM and the memory bus.  ... 
doi:10.1504/ijhpsa.2012.050990 fatcat:r2ch5bvlx5gtznbaktep7zjvju

Architectural Techniques to Enable Reliable and Scalable Memory Systems [article]

Prashant J. Nair
2017 arXiv   pre-print
This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily.  ...  Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum.  ...  Each channel may be fully contained in each DRAM die in the stack. A complete set of TSVs and buffers connect each channel to the external interface.  ... 
arXiv:1704.03991v1 fatcat:e4i5pbtuujagnprene3wwv3un4

Micro-pages

Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian, Al Davis
2010 Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems - ASPLOS '10  
Thus, the colocation of chunks (from different OS pages) in a row-buffer will improve the overall utilization of the row buffer contents, and consequently reduce memory energy consumption and access time  ...  We explore these mechanisms and discuss the trade-offs involved along with energy and performance improvements from each scheme.  ...  We also thank the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1145/1736020.1736045 dblp:conf/asplos/SudanCNABD10 fatcat:gahfs7ticbdhbbnvq4ft2zofza

Thermal modeling and management of DRAM memory systems

Jiang Lin, Hongzhong Zheng, Zhichun Zhu, Howard David, Zhao Zhang
2007 SIGARCH Computer Architecture News  
We propose two new schemes and evaluate their effectiveness on systems with multicore processors and Fully Buffered DIMM (FBDIMM) memories [11].  ...  Fourth, we can extend our DRAM thermal model for fully buffered DIMM (FBDIMM) to other types of memory subsystems, such as DDR2 and DDR3 DRAM memory.  ... 
doi:10.1145/1273440.1250701 fatcat:n37ci7qjgnbfhc3fu7lcmoulwq

Thermal modeling and management of DRAM memory systems

Jiang Lin, Hongzhong Zheng, Zhichun Zhu, Howard David, Zhao Zhang
2007 Proceedings of the 34th annual international symposium on Computer architecture - ISCA '07  
We propose two new schemes and evaluate their effectiveness on systems with multicore processors and Fully Buffered DIMM (FBDIMM) memories [11].  ...  Fourth, we can extend our DRAM thermal model for fully buffered DIMM (FBDIMM) to other types of memory subsystems, such as DDR2 and DDR3 DRAM memory.  ... 
doi:10.1145/1250662.1250701 dblp:conf/isca/LinZZDZ07 fatcat:ilaa36j4kjhwxe5veje5k4oafq

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture [article]

Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu
2021 arXiv   pre-print
For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy.  ...  First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new  ...  ACKNOWLEDGMENTS We thank UPMEM's Fabrice Devaux, Rémy Cimadomo, Romaric Jodin, and Vincent Palatin for their valuable support.  ... 
arXiv:2105.03814v5 fatcat:yvcv4rxh5vbthjszmmokviv3fa

The PowerNap Server Architecture

David Meisner, Brian T. Gold, Thomas F. Wenisch
2011 ACM Transactions on Computer Systems  
Based on the PowerNap concept, we develop requirements and outline mechanisms to eliminate idle power waste in enterprise blade servers.  ...  Rather than requiring fine-grained power-performance states and complex load-proportional operation from individual system components, PowerNap instead calls for minimizing idle power and transition time  ...  ACKNOWLEDGMENTS The authors would like to thank Partha Ranganathan and HP Labs for the real-world data center utilization traces, Andrew Caird and the staff at the Michigan Academic Computer Center for  ... 
doi:10.1145/1925109.1925112 fatcat:pku3zhsd65fydjdosq3e4pwuai

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency

Kunle Olukotun, Lance Hammond, James Laudon
2007 Synthesis Lectures on Computer Architecture  
A move from DDR2 SDRAM to Fully buffered DIMMs (FB-DIMM), a memory technology change already underway, is the key that will enable future Niagara chips to continue to increase the number of on-chip threads  ...  Given this memory system scaling and the fact that we are primarily interested in the performance effects of the processor core architecture differences, the results from this scaled-down simulation example  ...  At Afara Websystems he managed the architecture and performance team.  ... 
doi:10.2200/s00093ed1v01y200707cac003 fatcat:qyjilavdhfcmlnc46l5sxg7ssq

Characterizing and Understanding PDES Behavior on Tilera Architecture

Deepak Jagtap, Ketan Bahulkar, Dmitry Ponomarev, Nael Abu-Ghazaleh
2012 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation  
The emergence of manycore architectures with shifting balance between computation and communication overhead can have a tremendous impact on performance and scalability of fine-grained parallel applications  ...  Finally, we explore the issues of object placement and model partitioning on Tilera architecture.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air  ... 
doi:10.1109/pads.2012.10 dblp:conf/pads/JagtapBPA12 fatcat:nfrxqav7cbcctcrhclhhbxiu2y