Data prefetch mechanisms

Steven P. Vanderwiel, David J. Lilja
2000 ACM Computing Surveys  
Rather than waiting for a cache miss to initiate a memory fetch, data prefetching anticipates such misses and issues a fetch to the memory system in advance of the actual memory reference.  ...  To be effective, prefetching must be implemented in such a way that prefetches are timely, useful, and introduce little overhead.  ...  Memory delays tend to be high in multiprocessors due to added contention for shared resources such as a shared bus and memory modules in a symmetric multiprocessor.  ... 
doi:10.1145/358923.358939 fatcat:2eiquo3icnalnpfej3badyzfjm
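The mechanism this survey describes — fetching a line into the cache before the demand reference that would otherwise miss — can be illustrated with a toy simulation. The sketch below is not taken from the paper: the direct-mapped cache model, 64-byte line size, and next-line policy are illustrative assumptions, standing in for the simplest "sequential" hardware scheme the literature above evaluates.

```python
# Toy model: a direct-mapped cache with optional next-line sequential
# prefetching. On a demand miss, the prefetcher also installs the
# following cache line, anticipating a sequential access pattern.
LINE = 64  # assumed bytes per cache line

def simulate(addresses, n_lines=256, prefetch=False):
    """Return the number of demand misses for an address trace."""
    cache = [None] * n_lines  # slot -> line number currently resident

    def install(line):
        cache[line % n_lines] = line

    misses = 0
    for addr in addresses:
        line = addr // LINE
        if cache[line % n_lines] != line:
            misses += 1
            install(line)
            if prefetch:
                install(line + 1)  # sequential (next-line) prefetch
    return misses

# A sequential stride-8 walk over 64 KiB: next-line prefetching
# halves the demand misses on this trace.
trace = list(range(0, 64 * 1024, 8))
```

On this purely sequential trace every prefetch is timely and useful; the survey's point is that real schemes must also avoid polluting the cache and wasting bandwidth when the prediction is wrong.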

Sequential hardware prefetching in shared-memory multiprocessors

F. Dahlgren, M. Dubois, P. Stenstrom
1995 IEEE Transactions on Parallel and Distributed Systems  
To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed.  ...  Index Terms-Hardware-controlled prefetching, latency tolerance, memory consistency models, performance evaluation, sequential prefetching, shared-memory multiprocessors.  ...  ACKNOWLEDGMENTS We are indebted to our colleagues Mats Brorsson, Håkan Grahn, and Jonas Skeppstedt of Lund University and to the anonymous reviewers for helpful comments on earlier drafts of this paper  ... 
doi:10.1109/71.395402 fatcat:ag2u4cppb5fuzgtazzf7qg43sq

When caches aren't enough: data prefetching techniques

S.P. Vander Wiel, D.J. Lilja
1997 Computer  
...  initiated prefetching, and prefetching via reference prediction tables.  ...  Acknowledgments This work was supported in part by the US Army Intelligence Center and Fort Huachuca under Contract No. DABT63-95-C-0127 and ARPA Order No. D 346r.  ...  VanderWiel's work was supported in part by an IBM Graduate Fellowship.  ... 
doi:10.1109/2.596622 fatcat:cuqo3rou2vgytmorxbbwevuube

Analyzing the impact of data prefetching on Chip MultiProcessors

Naoto Fukumoto, Tomonobu Mihara, Koji Inoue, Kazuaki Murakami
2008 2008 13th Asia-Pacific Computer Systems Architecture Conference  
In Chip MultiProcessor (or CMP) chips, there are some shared resources such as L2 caches, buses, and so on.  ...  Data prefetching is a well known approach to compensating for poor memory performance, and has been employed in commercial processor chips.  ...  This work has been supported by the Grant-in-Aid for Creative Scientific Research (KAKENHI) No.19200004, and Matsushita Electric Industrial Co. Ltd.  ... 
doi:10.1109/apcsac.2008.4625454 dblp:conf/aPcsac/FukumotoMIM08 fatcat:5nnxyw2rgrda7cqoes7cxux3cm

Comparative Evaluation of Latency-Tolerating and -Reducing Techniques for Hardware-Only and Software-Only Directory Protocols

Håkan Grahn, Per Stenström
2000 Journal of Parallel and Distributed Computing  
We study in this paper how effective latency-tolerating and -reducing techniques are at cutting the memory access times for shared-memory multiprocessors with directory cache protocols managed by hardware  ...  or does not change the number of protocol operations at the memory module.  ...  ACKNOWLEDGMENTS This research was supported in part by the Swedish National Board for Industrial and Technical Development (NUTEK) under Contract P855.  ... 
doi:10.1006/jpdc.1999.1606 fatcat:7ggydlihm5dftkum6prqzvyhfy

Interactions Between Compression and Prefetching in Chip Multiprocessors

Alaa R. Alameldeen, David A. Wood
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Stride-based hardware prefetching increases demand for these resources, causing contention that can degrade performance (up to 35% for one of our benchmarks).  ...  In chip multiprocessors (CMPs), multiple cores compete for shared resources such as on-chip caches and off-chip pin bandwidth.  ...  This work is supported in part by the National Science Foundation with grants CCR-0324878, EIA-0205286, and EIA-9971256, a Wisconsin Romnes Fellowship (Wood) and donations from IBM, Intel and Sun Microsystems  ... 
doi:10.1109/hpca.2007.346200 dblp:conf/hpca/AlameldeenW07 fatcat:ekcsuohp4ffhpppztvrptaua3a

Achieving high performance in bus-based shared-memory multiprocessors

A. Milenkovic
2000 IEEE Concurrency  
Bus-based shared-memory multiprocessors, or symmetric multiprocessors, are widely used in small- to medium-scale parallel machines of up to 30 processors.  ...  Although appropriate write buffers and relaxed memory consistency models can often hide  ...  In bus-based shared-memory multiprocessors, several techniques reduce cache misses and bus traffic, the key obstacles  ...  Much can be gained from exploring the effectiveness of these combined techniques both in state-of-the-art and future bus-based SMPs.  ... 
doi:10.1109/4434.865891 fatcat:e42udkwrhjdrfbi2rsjfhna5tm

The impact of parallel loop scheduling strategies on prefetching in a shared memory multiprocessor

D.J. Lilja
1994 IEEE Transactions on Parallel and Distributed Systems  
Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches  ...  The distribution of invalidations in both types of program sections is relatively insensitive to the prefetching and scheduling strategy.  ...  The comments and suggestions provided by the anonymous reviewers helped to improve the quality of this paper, and their efforts are appreciated.  ... 
doi:10.1109/71.285604 fatcat:xgdco7egunbwfltpeu7xyd4kqu

Improving cache locality for thread-level speculation

S.L.C. Fung, J.G. Steffan
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
With the advent of chip-multiprocessors (CMPs), Thread-Level Speculation (TLS) remains a promising technique for exploiting this highly multithreaded hardware to improve the performance of an individual  ...  We break down the TLS cache locality problem into instruction and data cache, execution stages, and parallel access patterns, and propose methods to improve cache locality in each of these areas.  ...  This indicates that the read/write-based schemes were unable to reduce traffic enough for the strided prefetcher to be effective.  ... 
doi:10.1109/ipdps.2006.1639271 dblp:conf/ipps/FungS06 fatcat:gqshrhinrfds3m5pk7jtbvgqhq

A performance study of software and hardware data prefetching schemes

T.-F. Chen, J.-L. Baer
1994 SIGARCH Computer Architecture News  
In this paper, we evaluate approximations to these two schemes in the context of a shared-memory multiprocessor environment.  ...  Prefetching can be either hardware-based or software-directed or a combination of both.  ...  Tullsen and Eggers [18] have shown that the prefetching benefits are limited if memory bandwidth is a primary resource (e.g., in a bus-based shared memory multiprocessor).  ... 
doi:10.1145/192007.192030 fatcat:gzas56k445bvbp4pjhwl4nd3xy

Data prefetching for distributed shared memory systems

A.I.-C. Lai, Chin-Laung Lei
1996 Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences  
In distributed shared memory (DSM) systems, remote memory accesses take much longer than local ones and hence data prefetching should be effective for such systems.  ...  Our approach is to develop a new memory consistency semantic (MCS) model under which the prefetchable shared data objects, as well as the best moment to launch a prefetching operation, can be easily identified  ...  Acknowledgement The authors wish to express their appreciation for the valuable comments and encouragement of the anonymous referees.  ... 
doi:10.1109/hicss.1996.495453 dblp:conf/hicss/LaiL96 fatcat:3kx7s55tf5hrpci3pzdmk2wua4

Taxonomy of Data Prefetching for Multicore Processors

Surendra Byna, Yong Chen, Xian-He Sun
2009 Journal of Computer Science and Technology  
Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory.  ...  This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially  ... 
doi:10.1007/s11390-009-9233-4 fatcat:gebfnjl6lveerjqunmitbus2u4

Performance evaluation and cost analysis of cache protocol extensions for shared-memory multiprocessors

F. Dahlgren, M. Dubois, P. Stenstrom
1998 IEEE transactions on computers  
We evaluate three extensions to directory-based cache coherence protocols in shared-memory multiprocessors.  ...  These extensions are aimed at reducing the penalties associated with memory accesses and include a hardware prefetching scheme, a migratory sharing optimization, and a competitive-update mechanism.  ...  INTRODUCTION Private caches in conjunction with directory-based, write-invalidate protocols are essential, but not sufficient, to cope with the high memory latencies of large-scale shared-memory multiprocessors  ... 
doi:10.1109/12.729785 fatcat:utokng3a3rf2fbrtykvmltudtq

Shared last-level TLBs for chip multiprocessors

Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
Because of their benefits for parallel and sequential applications, and their readily-implementable hardware, SLL TLBs hold great promise for CMPs.  ...  Furthermore, unlike these prefetchers, SLL TLBs can aid sequential applications, eliminating 35-95% of the TLB misses for various multiprogrammed combinations of sequential applications.  ...  This material is based upon work supported by the National Science Foundation under Grant No. CNS-0627650 and CNS-07205661.  ... 
doi:10.1109/hpca.2011.5749717 dblp:conf/hpca/BhattacharjeeLM11 fatcat:fzxttzbsxfhalfxn7ajxuawxmm

Effective hardware-based data prefetching for high-performance processors

Tien-Fu Chen, Jean-Loup Baer
1995 IEEE transactions on computers  
In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden  ...  The three designs differ mostly on the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use.  ...  ACKNOWLEDGMENTS This work was supported in part by NSF Grants CCR-91-01541 and CDA 91-23308 and by Apple Computer, Inc.  ... 
doi:10.1109/12.381947 fatcat:ddvrcqazkbdcjbpqc32oexhy5u
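Chen and Baer's basic scheme issues prefetches one iteration ahead by tracking each load instruction in a reference prediction table indexed by its PC. The sketch below captures that idea only in outline: the table layout, the confirmation rule, and the function name are our illustrative assumptions, not the paper's exact state machine.

```python
# Illustrative reference-prediction-table (RPT) sketch: each load PC
# tracks its last address and observed stride; once the stride has
# repeated, the address one iteration ahead is predicted and prefetched.
def rpt_prefetches(trace):
    """trace: iterable of (pc, addr) pairs; returns predicted prefetch addresses."""
    table = {}       # pc -> (last_addr, stride, confirmed)
    prefetches = []
    for pc, addr in trace:
        if pc not in table:
            table[pc] = (addr, 0, False)      # first sighting: no stride yet
            continue
        last, stride, confirmed = table[pc]
        new_stride = addr - last
        if confirmed and new_stride == stride:
            prefetches.append(addr + stride)  # predict one iteration ahead
            table[pc] = (addr, stride, True)
        else:
            # learn the new stride; confirm it only once it repeats
            table[pc] = (addr, new_stride, new_stride == stride)
    return prefetches
```

For a single load walking an array with a fixed 8-byte stride, the table needs a few warm-up accesses to confirm the stride, after which every subsequent access triggers a prefetch of the next element.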