Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications

L. Spracklen, Yuan Chou, S.G. Abraham
11th International Symposium on High-Performance Computer Architecture  
a prefetch is issued for the potential target. Prefetches are also issued for sequential lines following the target (up to N). Concluding Remarks: • Modern commercial applications have high instruction miss  ...  Proposed the Discontinuity prefetcher, which reduces the miss rate by ~90% • Need to consider the pollution effects of aggressive prefetchers (especially in CMPs) • Accelerated commercial apps by up to 38%  ... 
doi:10.1109/hpca.2005.13 dblp:conf/hpca/SpracklenCA05 fatcat:dv3b3iofindrlauptuh7jboy6i
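
The mechanism named in the snippet above (prefetch the potential target of a fetch discontinuity, plus up to N sequential lines after it) can be sketched as a small Python model. This is a hedged illustration, not the paper's hardware design: the 64-byte line size, the table size, the LRU-by-update dictionary standing in for the discontinuity table, and the names `DiscontinuityPrefetcher` and `on_fetch` are all assumptions, and prefetches fire when the source line is fetched rather than from a lookahead fetch engine.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed instruction-cache line size


class DiscontinuityPrefetcher:
    """Hypothetical model: remember non-sequential line transitions and replay them."""

    def __init__(self, table_entries=256, n_following=2):
        self.table = OrderedDict()  # source line -> target line, least recently updated first
        self.table_entries = table_entries
        self.n_following = n_following  # the "up to N" sequential lines after a target
        self.last_line = None

    def on_fetch(self, pc):
        """Record any discontinuity ending at pc; return line numbers to prefetch."""
        line = pc // LINE_BYTES
        last = self.last_line
        if last is not None and line not in (last, last + 1):
            # Non-sequential fetch: remember (or refresh) the transition last -> line.
            self.table[last] = line
            self.table.move_to_end(last)
            if len(self.table) > self.table_entries:
                self.table.popitem(last=False)  # evict the entry updated longest ago
        self.last_line = line
        target = self.table.get(line)
        if target is None:
            return []
        # Prefetch the potential target plus up to N sequential lines after it.
        return [target + i for i in range(self.n_following + 1)]
```

With `n_following=2`, fetching lines 0 and 1, jumping to line 10, and then returning to lines 0 and 1 makes the second visit to line 1 return `[10, 11, 12]`: the recorded 1 -> 10 discontinuity plus the two lines that follow its target.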

Spatio-temporal memory streaming

Stephen Somogyi, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi
2009 SIGARCH Computer Architecture News  
In this paper, we propose Spatio-Temporal Memory Streaming (STeMS) to exploit the synergy between spatial and temporal streaming.  ...  Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their feedback on this paper and members of the SimFlex team at Carnegie Mellon for contributions to our simulation infrastructure  ... 
doi:10.1145/1555815.1555766 fatcat:2a4fvni5yvgoraeiuhonszveju

Spatio-temporal memory streaming

Stephen Somogyi, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
In this paper, we propose Spatio-Temporal Memory Streaming (STeMS) to exploit the synergy between spatial and temporal streaming.  ...  Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their feedback on this paper and members of the SimFlex team at Carnegie Mellon for contributions to our simulation infrastructure  ... 
doi:10.1145/1555754.1555766 dblp:conf/isca/SomogyiWAF09 fatcat:tcclgwppjnf2fga3lxnoumji2a

Making Address-Correlated Prefetching Practical

Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2010 IEEE Micro  
To illustrate why memory accesses in commercial applications often occur in temporal streams, we present two motivating examples taken from actual behaviors we've observed in our commercial application  ...  Figure 1 illustrates the potential of address-correlated prefetching across a range of server and scientific applications running on a four-core chip multiprocessor [12]. The figure shows the fraction  ... 
doi:10.1109/mm.2010.21 fatcat:4nnthxry2rbdvejylzswlmbcja

A Case for Specialized Processors for Scale-Out Workloads

Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, Babak Falsafi
2014 IEEE Micro  
We thank the DSLab for their assistance with SAT Solver, and Aamer Jaleel and Carole-Jean Wu for their assistance with understanding the Intel prefetchers and configuration.  ...  We thank the PARSA lab for continual support and feedback, in particular Pejman Lotfi-Kamran and Javier Picorel for their assistance with the SPEC-web09 and SAT Solver benchmarks.  ...  His research interests include multiprocessor cache coherence and memory system design for commercial workloads.  ... 
doi:10.1109/mm.2014.41 fatcat:gowz5x2fjvbobhcm2p4qsy2nlu

Practical off-chip meta-data for temporal memory streaming

Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2009 IEEE 15th International Symposium on High Performance Computer Architecture
For maximum effectiveness, STMS needs 64MB of meta-data in main memory, a small fraction of memory in servers. • Latency efficiency.  ...  We evaluate our practical design, Sampled Temporal Memory Streaming (STMS), through cycle-accurate full-system simulation of scientific and commercial multiprocessor workloads, to demonstrate: • Performance  ...  Acknowledgements The authors would like to thank Brian Gold and the anonymous reviewers for their feedback.  ... 
doi:10.1109/hpca.2009.4798239 dblp:conf/hpca/WenischFAFM09 fatcat:qzies3ngwjaetpsnel7mbbclkq

Teaching old caches new tricks: RegionTracker and predictor virtualization

Ioana Burcea, Jason Zebchuk, Andreas Moshovos
2009 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing
On-chip last-level caches are increasing to tens of megabytes to accommodate applications with large memory footprints and to compensate for high memory latencies and limited off-chip bandwidth.  ...  storing program data and instructions.  ...  the L2 is not detrimental to overall performance even for applications with large instruction and data footprints that tax the on-chip memory hierarchy.  ... 
doi:10.1109/pacrim.2009.5291238 fatcat:f4abtxobmfampbndvamf2vvdoq

Spatial Memory Streaming

Stephen Somogyi, Thomas F. Wenisch, Anastassia Ailamaki, Babak Falsafi, Andreas Moshovos
2006 SIGARCH Computer Architecture News  
Prior research indicates that there is much spatial variation in applications' memory access patterns.  ...  Increasing the block size would not only prohibitively increase pin and interconnect bandwidth demands, but also increase the likelihood of false sharing in shared-memory multiprocessors.  ...  Using cycle-accurate full-system multiprocessor simulation running commercial and scientific applications, we demonstrated that SMS can on average predict 58% of L1 and 65% of off-chip misses, for an average  ... 
doi:10.1145/1150019.1136508 fatcat:fdm7dfccavfbppqys6ydnyvcxa
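
The SMS snippet only hints at the mechanism. As a hedged sketch, SMS-style prediction can be modeled as recording which blocks of a spatial region are touched during one "generation" and replaying that footprint the next time the same trigger (the PC and block offset of the first access to a region) recurs. The 2 KB region of 64-byte blocks, the dictionary-based pattern table, and the explicit `end_generation` call (standing in for the eviction or invalidation that ends a generation in hardware) are simplifying assumptions, and all class and method names are hypothetical.

```python
BLOCK_BYTES = 64    # assumed cache-block size
REGION_BLOCKS = 32  # assumed 2 KB spatial region (32 blocks of 64 bytes)


class SpatialPatternSketch:
    """Hypothetical model of SMS-style spatial footprint prediction."""

    def __init__(self):
        # (trigger PC, trigger offset) -> offsets touched during the last generation
        self.pattern_table = {}
        # region number -> (trigger key, set of offsets touched so far)
        self.active = {}

    def access(self, pc, addr):
        """Observe one data access; return block numbers to prefetch (possibly empty)."""
        block = addr // BLOCK_BYTES
        region, offset = divmod(block, REGION_BLOCKS)
        if region in self.active:
            self.active[region][1].add(offset)  # extend the current footprint
            return []
        # First access to the region is the trigger: start a new generation.
        key = (pc, offset)
        self.active[region] = (key, {offset})
        predicted = self.pattern_table.get(key, frozenset())
        base = region * REGION_BLOCKS
        return sorted(base + o for o in predicted if o != offset)

    def end_generation(self, region):
        """Caller signals the end of a generation (eviction/invalidation in hardware)."""
        key, offsets = self.active.pop(region)
        self.pattern_table[key] = frozenset(offsets)
```

If a generation in region 0 triggered by PC `0x400` at offset 0 touches offsets 0, 3 and 7, a later access by the same PC at offset 0 of region 5 (block 160) returns blocks 163 and 167 as prefetch candidates. Keying on PC plus offset, rather than on the address, is what lets a footprint learned in one region transfer to regions that have never been seen.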

Temporal streams in commercial server applications

Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2008 IEEE International Symposium on Workload Characterization
Commercial server applications remain memory bound on modern multiprocessor systems because of their large data footprints, frequent sharing, complex non-strided access patterns, and long chains of dependent  ...  In this paper, we perform an information-theoretic analysis of miss traces from single-chip and multi-chip multiprocessors to identify recurring temporal streams in web serving, online transaction processing  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their feedback on drafts of this paper.  ... 
doi:10.1109/iiswc.2008.4636095 dblp:conf/iiswc/WenischFAFM08 fatcat:lrbs7eel3naphk2o2vtae5etbq

Phantom-BTB

Ioana Burcea, Andreas Moshovos
2009 Proceeding of the 14th international conference on Architectural support for programming languages and operating systems - ASPLOS '09  
Ideally, BTBs would be sufficiently large to capture the entire working set of the application and sufficiently small for fast access and practical on-chip dedicated storage.  ...  Modern processors use branch target buffers (BTBs) to predict the target address of branches such that they can fetch ahead in the instruction stream, increasing concurrency and performance.  ...  This research was supported in part by an NSERC Discovery Grant, a Canada Foundation for Innovation New Opportunities Infrastructure Grant, and an Intel Research Council grant.  ... 
doi:10.1145/1508244.1508281 dblp:conf/asplos/BurceaM09 fatcat:sek45p4vzzdrrgx5wuwk7mu3zq

Phantom-BTB

Ioana Burcea, Andreas Moshovos
2009 SIGARCH Computer Architecture News  
Ideally, BTBs would be sufficiently large to capture the entire working set of the application and sufficiently small for fast access and practical on-chip dedicated storage.  ...  Modern processors use branch target buffers (BTBs) to predict the target address of branches such that they can fetch ahead in the instruction stream, increasing concurrency and performance.  ...  This research was supported in part by an NSERC Discovery Grant, a Canada Foundation for Innovation New Opportunities Infrastructure Grant, and an Intel Research Council grant.  ... 
doi:10.1145/2528521.1508281 fatcat:7tlzjigmfngizesygl2puv273a

Phantom-BTB

Ioana Burcea, Andreas Moshovos
2009 SIGPLAN notices  
Ideally, BTBs would be sufficiently large to capture the entire working set of the application and sufficiently small for fast access and practical on-chip dedicated storage.  ...  Modern processors use branch target buffers (BTBs) to predict the target address of branches such that they can fetch ahead in the instruction stream, increasing concurrency and performance.  ...  This research was supported in part by an NSERC Discovery Grant, a Canada Foundation for Innovation New Opportunities Infrastructure Grant, and an Intel Research Council grant.  ... 
doi:10.1145/1508284.1508281 fatcat:zdwgalohmbh5tmrzj33joj3fum

Performance of database workloads on shared-memory systems with out-of-order processors

Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, Luiz André Barroso
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within  ...  Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers.  ...  We would also like to thank Jef Kennedy from Oracle for reviewing this manuscript, Marco Annaratone from WRL for supporting this work, and Drew Kramer from WRL for technical support.  ... 
doi:10.1145/291069.291067 dblp:conf/asplos/RanganathanGAB98 fatcat:x5qbk25rdzg45gsfimyiwuxmy4
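
The instruction stream buffer credited above with a 17% execution-time reduction for OLTP is, in its classic form, a small FIFO of sequential next lines that is checked on an instruction-cache miss. The sketch below models a single stream buffer in that style. The depth of 4, head-only lookup, flush-on-buffer-miss policy, and the class and method names are assumptions and may differ from the configuration Ranganathan et al. simulated.

```python
from collections import deque


class InstructionStreamBuffer:
    """Hypothetical single stream buffer holding sequential instruction-line numbers."""

    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()    # prefetched line numbers, next expected line at the head
        self.next_line = None  # next sequential line to prefetch into the buffer

    def _refill(self):
        # Top the buffer up with the following sequential lines.
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_line)
            self.next_line += 1

    def on_icache_miss(self, line):
        """Return True if the stream buffer supplies the missing line."""
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()  # line moves from the buffer into the I-cache
            self._refill()
            return True
        # Buffer miss: the line comes from memory; restart the stream after it.
        self.fifo.clear()
        self.next_line = line + 1
        self._refill()
        return False
```

A miss on line 100 fills the buffer with lines 101 to 104; a following miss on line 101 is then served from the buffer, which slides forward to hold 102 to 105. A miss on an unrelated line flushes the buffer and starts a new stream, which is why this scheme helps mostly with long sequential code paths.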

B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors

David Kadjo, Jinchun Kim, Prabal Sharma, Reena Panda, Paul Gratz, Daniel Jimenez
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
Both approaches become more challenging in modern chip-multiprocessor (CMP) design.  ...  These trends also impact hardware budgets for prefetchers. Moreover, in the context of CMPs running multiple concurrent processes, prefetching accuracy is critical to prevent cache pollution effects.  ...  in modern chip-multiprocessor (CMP) design.  ... 
doi:10.1109/micro.2014.29 dblp:conf/micro/KadjoKSPGJ14 fatcat:rbk4bf4dfrfjnaxawb2f2yeqtu

Temporal instruction fetch streaming

Michael Ferdman, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2008 41st IEEE/ACM International Symposium on Microarchitecture
Then, we describe a practical mechanism to record these recurring sequences in the L2 cache and leverage them for instruction-cache prefetching.  ...  L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads.  ...  Gold, Nikolaos Hardavellas, Stephen Somogyi, and the anonymous reviewers for their feedback on drafts of this paper. This work was  ... 
doi:10.1109/micro.2008.4771774 dblp:conf/micro/FerdmanWAFM08 fatcat:ffdk7ljp6jbi5hqj2qrtfhrljm
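
The TIFS snippet describes recording recurring sequences of instruction-cache misses and replaying them for prefetching. A minimal sketch of that idea keeps a circular log of missing line addresses and an index from each line to its most recent log position; on a repeated miss, the lines that followed it last time are returned as prefetch candidates. In the paper this metadata is kept in the L2 cache; here plain Python containers stand in for it, the log size and lookahead depth are arbitrary, and the class and method names are hypothetical.

```python
class TemporalStreamSketch:
    """Hypothetical model: log instruction-cache misses and replay what followed."""

    def __init__(self, history_size=1024, lookahead=3):
        self.history_size = history_size
        self.lookahead = lookahead
        self.history = []  # circular log of missing line addresses
        self.index = {}    # line -> absolute position of its latest log entry
        self.pos = 0       # number of misses logged so far

    def on_miss(self, line):
        """Log a miss; return the lines that followed its previous occurrence."""
        prefetches = []
        prev = self.index.get(line)
        # Replay only if the old entry has not been overwritten in the circular log.
        if prev is not None and self.pos - prev < self.history_size:
            end = min(prev + 1 + self.lookahead, self.pos)
            prefetches = [self.history[p % self.history_size]
                          for p in range(prev + 1, end)]
        if len(self.history) < self.history_size:
            self.history.append(line)
        else:
            self.history[self.pos % self.history_size] = line
        self.index[line] = self.pos
        self.pos += 1
        return prefetches
```

With the default lookahead of 3, the miss sequence 1, 2, 3, 4 followed by a repeated miss on line 1 returns `[2, 3, 4]`. If the log is too small to still hold the earlier occurrence, the index entry is treated as stale and nothing is replayed.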
Showing results 1-15 out of 588 results