
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
and bandwidth impact of prefetching.  ...  Compared to a conventional stream-based data prefetcher configuration that consumes similar amount of memory bandwidth, feedback directed prefetching provides 13.6% higher performance.  ...  We gratefully acknowledge the support of the Cockrell Foundation, Intel Corporation and the Advanced Technology Program of the Texas Higher Education Coordinating Board.  ... 
doi:10.1109/hpca.2007.346185 dblp:conf/hpca/SrinathMKP07 fatcat:freouwwyfvf6ljqss2fnj7guii

Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems

Eiman Ebrahimi, Onur Mutlu, Yale N. Patt
2009 2009 IEEE 15th International Symposium on High Performance Computer Architecture  
Unfortunately, many LDS prefetching techniques 1) generate a large number of useless prefetches, thereby degrading performance and bandwidth efficiency, 2) require significant hardware or storage cost,  ...  This paper proposes a low-cost hardware/software cooperative technique that enables bandwidth-efficient prefetching of linked data structures.  ...  We gratefully acknowledge the support of the Cockrell Foundation, Microsoft Research, and Intel Corporation.  ... 
doi:10.1109/hpca.2009.4798232 dblp:conf/hpca/EbrahimiMP09 fatcat:5nrfndrvubgdnjj7flsqf7s4fe

To hardware prefetch or not to prefetch?

Hui Kang, Jennifer L. Wong
2013 Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '13  
We examine a wide variety of benchmarks on three types of chip-multiprocessors (CMPs) to analyze the hardware prefetching performance.  ...  of hardware prefetching.  ...  Examples include feedback-directed prefetching [32], dynamically coordinating the aggressiveness of multicore prefetching controllers [6], prefetching-aware memory controllers [19, 20], and hybrid  ... 
doi:10.1145/2451116.2451155 dblp:conf/asplos/KangW13 fatcat:ssjtqy2voveaph6yi32iqprhtq

Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers

Seth H Pugsley, Zeshan Chishti, Chris Wilkerson, Peng-fei Chuang, Robert L Scott, Aamer Jaleel, Shih-Lien Lu, Kingsum Chow, Rajeev Balasubramonian
2014 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)  
Sandbox Prefetching improves performance across the tested workloads by 47.6% compared to not using any prefetching, and by 18.7% compared to the Feedback Directed Prefetching technique.  ...  Overly aggressive prefetching can waste scarce resources such as memory bandwidth and cache capacity, limiting or even hurting performance.  ...  Acknowledgments We thank the anonymous reviewers for their many useful suggestions. This work was supported in part by NSF grant CNS-1302663.  ... 
doi:10.1109/hpca.2014.6835971 dblp:conf/hpca/PugsleyCWCSJLCB14 fatcat:2ijnlzqfb5ba3l4pfif6b7feam

To hardware prefetch or not to prefetch?

Hui Kang, Jennifer L. Wong
2013 SIGPLAN notices  
We examine a wide variety of benchmarks on three types of chip-multiprocessors (CMPs) to analyze the hardware prefetching performance.  ...  of hardware prefetching.  ...  Examples include feedback-directed prefetching [32], dynamically coordinating the aggressiveness of multicore prefetching controllers [6], prefetching-aware memory controllers [19, 20], and hybrid  ... 
doi:10.1145/2499368.2451155 fatcat:6cm666no2jh45o5gucsy27sfrm

Band-Pass Prefetching

Aswinkumar Sridharan, Biswabandan Panda, Andre Seznec
2017 ACM Transactions on Architecture and Code Optimization (TACO)  
In multi-core systems, an application's prefetcher can interfere with the memory requests of other applications using the shared resources, such as last level cache and memory bandwidth.  ...  For a 16-core system, Band-pass prefetching requires only a modest hardware cost of 239 bytes.  ...  The authors thank the anonymous reviewers and the ALF/PACAP team for its valuable feedback on this work.  ... 
doi:10.1145/3090635 fatcat:aeact4t3dveetdx65yeuyfxz3e

Hardware prefetchers for emerging parallel applications

Biswabandan Panda, Shankar Balachandran
2012 Proceedings of the 21st international conference on Parallel architectures and compilation techniques - PACT '12  
Reference [1] Srinath et al., "Feedback Directed Prefetching: Improving the Performance and Bandwidth-efficiency of Hardware Prefetchers", HPCA 2007, pp 63-74.  ...  GS and PCS prefetchers are shared by all the threads. Both hide miss latency (not miss rate).  ...  Indian Institute of Technology Madras, India {biswa, shankar}@cse.iitm.ac.in  ... 
doi:10.1145/2370816.2370909 dblp:conf/IEEEpact/PandaB12 fatcat:ivv6bbhnofbzfdqvhme4uwxa7u

A Thread-Aware Adaptive Data Prefetcher

Jiyang Yu, Peng Liu
2014 2014 IEEE 32nd International Conference on Computer Design (ICCD)  
On a set of parallel benchmarks, our thread-aware data prefetching mechanisms improve the overall performance of 64-core system by 11% and reduce the energy-delay product by 13% over a multi-mode prefetch  ...  We compare our approach to the feedback directed prefetching (FDP) technique and find that it provides better performance on multi-core systems, while reducing the energy delay product.  ...  Huang, and Mei Yang for their comments on early versions of this paper, and the anonymous referees for their careful reviews and suggestions.  ... 
doi:10.1109/iccd.2014.6974694 dblp:conf/iccd/Yu014 fatcat:bnfaitdngfc2ddziemna3rjc4u

Prefetch-Aware DRAM Controllers

Chang Joo Lee, Onur Mutlu, Veynu Narasiman, Yale N. Patt
2008 2008 41st IEEE/ACM International Symposium on Microarchitecture  
Across a wide range of multiprogrammed SPEC CPU 2000/2006 workloads, it improves system performance by 8.2% on a 4-core system and by 9.9% on an 8-core system while reducing DRAM bandwidth consumption  ...  If prefetch requests are useless, treating prefetches and demands equally can lead to significant performance loss and extra bandwidth consumption.  ...  This mechanism efficiently reduces the buffer, bandwidth, and cache space resources consumed by useless prefetches, thereby improving both performance and bandwidth-efficiency.  ... 
doi:10.1109/micro.2008.4771791 dblp:conf/micro/LeeMNP08 fatcat:r2rmrj643jcwpkjknto2ixr2xa

Prefetch-Aware Memory Controllers

Chang Joo Lee, Onur Mutlu, Veynu Narasiman, Yale N. Patt
2011 IEEE transactions on computers  
Across a wide range of multiprogrammed SPEC CPU 2000/2006 workloads, it improves system performance by 8.2 and 9.9 percent on four and eight-core systems while reducing DRAM bandwidth consumption by 10.7  ...  However, none of these rigid policies result in the best performance because they do not take into account the usefulness of prefetches.  ...  We gratefully acknowledge the support of the Cockrell Foundation, Intel, AMD, and Gigascale Systems Research Center. This research was partially supported by NSF CAREER Award CCF-0953246.  ... 
doi:10.1109/tc.2010.214 fatcat:m5bbeqxcjjdfxhcetpr46kzrpa

Making Address-Correlated Prefetching Practical

Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2010 IEEE Micro  
Prefetching improves throughput and response time by increasing memory-level parallelism 2,3 and remains an essential strategy to address the processor-memory performance gap.  ...  Address-correlating prefetchers succinctly capture pointer-chasing relationships, and thus substantially improve the performance of pointer-intensive commercial workloads. 3 Pairwise-correlating prefetchers  ... 
doi:10.1109/mm.2010.21 fatcat:4nnthxry2rbdvejylzswlmbcja

Thread-Aware Adaptive Prefetcher on Multicore Systems

Peng Liu, Jiyang Yu, Michael C. Huang
2016 ACM Transactions on Architecture and Code Optimization (TACO)  
We compare our approach with the feedback directed prefetching technique and find that it provides 9% performance improvement on multicore systems, while saving the memory bandwidth consumption.  ...  On a set of multithreaded parallel benchmarks, our thread-aware data prefetching mechanism improves the overall performance of 64-core system by 13% over a multimode prefetch baseline system with two-level  ...  ACKNOWLEDGMENTS The authors would like to thank the anonymous referees for their detailed comments and valuable suggestions, which have helped us to improve the quality of the article.  ... 
doi:10.1145/2890505 fatcat:tj7cjeszpfh5rkyzaq77iofoyu

Focused prefetching

R. Manikantan, R. Govindarajan
2008 Proceedings of the 22nd annual international conference on Supercomputing - ICS '08  
Another important impact of focused prefetching is a 61% improvement in the accuracy of prefetches.  ...  We demonstrate that the proposed classification criterion performs better than other existing criteria like criticality and delinquent loads.  ...  Also seeing only a part of the training stream, will allow the prefetcher to use its hardware resources efficiently and improve the accuracy of the prefetches.  ... 
doi:10.1145/1375527.1375576 dblp:conf/ics/ManikantanG08 fatcat:ax4fzxee6jffhgynegcyqvuupe

Using Cacheline Reuse Characteristics for Prefetcher Throttling

Hidetsugu IRIE, Takefumi MIYOSHI, Goki HONJO, Kei HIRAKI, Tsutomu YOSHINAGA
2012 IEICE transactions on information and systems  
Prefetching can greatly improve cache performance, but it has the drawback of cache pollution, unless its aggressiveness is properly set.  ...  Exploiting the characteristics of cache line reuse, we propose Cache-Convection-Control-based Prefetch Optimization Plus (CCCPO+), which enhances the feedback algorithm of our previous CCCPO.  ...  This technique improves the prefetch efficiency by canceling the prefetch overruns that occur at the end of each stream. Srinath et al.  ... 
doi:10.1587/transinf.e95.d.2928 fatcat:an77wr6cwfh7noxfhfwzvlvafi

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications

Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, Richard Vuduc
2010 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture  
We show that adaptation reduces the negative effects of prefetching and can even improve performance.  ...  Overall, compared to the state-of-the-art software and hardware prefetching, our MT-prefetching improves performance on average by 16% (software pref.) / 15% (hardware pref.) on our benchmarks.  ...  Kumar, Sangho Lee, Changhee Jung, Chang Joo Lee, Sunpyo Hong, Eiman Ebrahimi, and other HParch members and the anonymous reviewers for their suggestions and feedback on improving the paper.  ... 
doi:10.1109/micro.2010.44 dblp:conf/micro/LeeLKV10 fatcat:d5ly7nwmmvcfplcjtsrnjv53ym