Access order and effective bandwidth for streams on a Direct Rambus memory

S.I. Hong, S.A. McKee, M.H. Salinas, R.H. Klenke, J.H. Aylor, W.A. Wulf
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
Direct RDRAM device, and that accessing streams via a streaming mechanism with a simple access ordering scheme can improve performance by factors of 1.18 to 2.25.  ...  For our benchmarks, we find that accessing unit-stride streams in cacheline bursts in the natural order of the computation exploits from 44-76% of the peak bandwidth of a memory system composed of a single  ...  Thanks also go to the Oregon Graduate Institute Department of Computer Science and Engineering for providing resources to conduct a portion of this work while the second author was in residence as a visiting  ... 
doi:10.1109/hpca.1999.744337 dblp:conf/hpca/HongMSKAW99 fatcat:tvza4ju6pngd7ieex4txtanleu

A performance comparison of contemporary DRAM architectures

Vinodh Cuppu, Bruce Jacob, Brian Davis, Trevor Mudge
1999 SIGARCH Computer Architecture News  
These small-system organizations correspond to workstation-class computers and use on the order of 10 DRAM chips.  ...  The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Synchronous Link, Rambus, and Direct Rambus designs.  ...  We would also like to thank Sally McKee for her detailed comments on and suggestions for the paper, as well as the anonymous reviewers of the first draft.  ... 
doi:10.1145/307338.300998 fatcat:mbcw6ph6mnb65b27khnsaca4zi

High-performance DRAMs in workstation environments

V. Cuppu, B. Jacob, B. Davis, T. Mudge
2001 IEEE transactions on computers  
The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Double Data Rate, Synchronous Link, Rambus, and Direct Rambus designs.  ...  for low- and medium-speed CPUs (1 GHz and below); and 5) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques to exploit the available locality  ...  They would also like to thank Sally McKee for her detailed comments on and suggestions for the paper, as well as the anonymous reviewers of the earlier version of this paper that appeared in the Proceedings  ... 
doi:10.1109/12.966491 fatcat:r4glk3j7unerpkkmuetfwl5yeq

Hardware-only stream prefetching and dynamic access ordering

Chengqiang Zhang, Sally A. McKee
2000 Proceedings of the 14th international conference on Supercomputing - ICS '00  
This study builds on these results, combining a stride-based reference prediction table, a mechanism that prefetches L2 cache lines, and a memory controller that dynamically schedules accesses to a Direct  ...  Rambus memory subsystem.  ...  The authors thank Steve Reinhardt and Wei-fen Lin for providing the initial Rambus model. Discussions with John Carter, Lixin Zhang, and Mike Parker helped shape this study.  ... 
doi:10.1145/335231.335247 dblp:conf/ics/ZhangM00 fatcat:ipbfsbscoff3rdfydk7swwgisq

Designing a modern memory hierarchy with hardware prefetching

Wei-Fen Lin, S.K. Reinhardt, D. Burger
2001 IEEE transactions on computers  
We show that, even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half its time stalling  ...  Using those results, we evaluate a hardware prefetch unit integrated with the L2 cache and memory controllers.  ...  a Direct Rambus memory system with four 1.6 GB/s channels  ... 
doi:10.1109/12.966495 fatcat:xsfr5snle5ho3liqmt3i7v2p24

Dynamic access ordering for streamed computations

D.A.B. Weikle, S.I. Hong, M.H. Salinas, R.H. Klenke, J.H. Aylor, W.A. Wulf, S.A. McKee
2000 IEEE transactions on computers  
We describe a Stream Memory Controller (SMC) system that combines compile-time detection of streams with execution-time selection of the access order and issue.  ...  The SMC effectively prefetches read-streams, buffers write-streams, and reorders the accesses to exploit the existing memory bandwidth as much as possible.  ...  ACKNOWLEDGMENTS This work was supported in part by US National Science Foundation awards MIP-9114110 and MIP-9307626 and by a grant from Intel Corporation.  ... 
doi:10.1109/12.895941 fatcat:gj5dhemidrgv3ewc5nwlht2vzy

Another Trip to the Wall

Milan Radulovic, Darko Zivanovic, Daniel Ruiz, Bronis R. de Supinski, Sally A. McKee, Petar Radojković, Eduard Ayguadé
2015 Proceedings of the 2015 International Symposium on Memory Systems - MEMSYS '15  
Here we summarize our analysis and expectations of how such 3D-stacked DRAMs will affect the memory wall for a set of representative HPC applications.  ...  First defined two decades ago, the memory wall remains a fundamental limitation to system performance.  ...  We repeat the experiments with hyperthreading enabled in order to understand the impact on effective memory bandwidth 3 .  ... 
doi:10.1145/2818950.2818955 dblp:conf/memsys/RadulovicZRSMRA15 fatcat:zkurubdr6naz5cdg7thwmqle7m

A case for studying DRAM issues at the system level

B. Jacob
2003 IEEE Micro  
It refers to data access granularity; for example, Direct Rambus has a packetized DRAM interface, rather than burst-mode DRAMs such as SDRAM or enhanced SDRAM (ESDRAM).  ...  We based the framework on a model that defines a continuum of design choices covering most contemporary DRAM architectures, such as Rambus, Direct Rambus, SDRAM, and DDR SDRAM.  ...  [figure legend: 1 chan x 2 bytes, 2 chan x 1 byte, 1 chan x 1 byte] results with significantly less cost and engineering effort.  ... 
doi:10.1109/mm.2003.1225969 fatcat:iogn63ygpjakdjiph7bmrkog2q

Performance of the Complex Streamed Instruction Set on Image Processing Kernels [chapter]

Dmitri Tcheressiz, Ben Juurlink, Stamatis Vassiliadis, Harry Wijshoff
2001 Lecture Notes in Computer Science  
CSI instructions operate on two-dimensional data streams in a SIMD fashion and are able to process streams of arbitrary length.  ...  We also analyze the scalability of VIS and CSI with respect to memory bandwidth. The results show that CSI scales much better than VIS with increasing bandwidth.  ...  Contemporary PCs have a memory bandwidth of 0.8 GB/s using a 64-bit wide bus and DDR SDRAM [14] and Direct Rambus [5] already provide 1.6 GB/s of bandwidth.  ... 
doi:10.1007/3-540-44681-8_97 fatcat:f7iql353ibhcvibkxf6jsgtor4

Concurrency, latency, or system overhead

Vinodh Cuppu, Bruce Jacob
2001 Proceedings of the 28th annual international symposium on Computer architecture - ISCA '01  
In this design space, we see a wide variation in application execution times; for example, execution times for SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with  ...  Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, memory-controller page protocol, algorithms for assigning  ...  Bruce Jacob is supported in part by these awards, NSF grant EIA-0000439, and by Compaq and IBM.  ... 
doi:10.1145/379240.379252 dblp:conf/isca/CuppuJ01 fatcat:fwmahtqk7rdvfjh3wddzfjq7om

International Symposium on Computer Architecture (ISCA 2004)

Wolfgang Karl
2004 it - Information Technology  
Bruce Jacob is supported in part by these awards, NSF grant EIA-0000439, and by Compaq and IBM.  ...  ACKNOWLEDGMENTS Vinodh Cuppu is supported in part by NSF grant EIA-9806645 and NSF CAREER Award CCR-9983618.  ...  Direct Rambus uses a 400 MHz 3-byte channel (2 for data, 1 for addresses/commands). Direct Rambus parts transfer on both clock edges, implying a maximum bandwidth of 1.6 Gbytes/s.  ... 
doi:10.1524/itit.46.2.103.29083 fatcat:d77r2ylbvve7tniqnodko363ye

Tarantula

Roger Espasa, Matthew Mattina, André Seznec, Federico Ardanaz, Joel Emer, Stephen Felix, Julio Gago, Roger Gramunt, Isaac Hernandez, Toni Juan, Geoff Lowney
2002 SIGARCH Computer Architecture News  
Salient features of the architecture and implementation are: (1) it fully integrates into a virtual-memory cache-coherent system without changes to its coherency protocol (2) provides high bandwidth for  ...  The whole chip is backed by a memory controller capable of delivering over 64 GBytes/s of raw bandwidth.  ...  We would like to give very special thanks to the ASIM team, for a terrific modeling environment that made this project so much easier: J.  ... 
doi:10.1145/545214.545247 fatcat:udqrrag6ybbnfbst5bup5755gy

Concurrency, latency, or system overhead

Vinodh Cuppu, Bruce Jacob
2001 SIGARCH Computer Architecture News  
In this design space, we see a wide variation in application execution times; for example, execution times for SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with  ...  Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, memory-controller page protocol, algorithms for assigning  ...  Bruce Jacob is supported in part by these awards, NSF grant EIA-0000439, and by Compaq and IBM.  ... 
doi:10.1145/384285.379252 fatcat:bsx4e3mj2vaetlwigulznzgmd4

The New DRAM Interfaces: SDRAM, RDRAM and Variants [chapter]

Brian Davis, Bruce Jacob, Trevor Mudge
2000 Lecture Notes in Computer Science  
For the past two decades, developments in DRAM technology, the primary technology for the main memory of computers, have been directed towards increasing density.  ...  As a result 256 M-bit memory chips are now commonplace, and we can expect to see systems shipping in volume with 1 G-bit memory chips within the next two years.  ...  Direct Rambus (DRDRAM) Direct Rambus DRAM (DRDRAM) devices use a 400 MHz 3-byte-wide channel (2 for data, 1 for addresses/commands).  ... 
doi:10.1007/3-540-39999-2_3 fatcat:jvb6az66ejdwbbkif7vemplw5i

Limited bandwidth to affect processor design

D. Burger, J.R. Goodman, A. Kagi
1997 IEEE Micro  
An execution-driven simulation measures the time that several SPEC95 benchmarks spend stalled for memory latency, limited memory bandwidth, and computing.  ...  These factors will force architectural and system-level change.  ...  Acknowledgments We thank Ken Sakamura for including this revised work in his special issue, and Steve Diamond and Marie English at IEEE Micro for giving us the opportunity to submit our work.  ... 
doi:10.1109/40.641597 fatcat:nfsuhud2yzefrk7at6kxlwvbla