206 Hits in 2.6 sec

Call-chain Software Instruction Prefetching in J2EE Server Applications

Priya Nagpurkar, Harold W. Cain, Mauricio Serrano, Jong-Deok Choi, Chandra Krintz
2007 Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT)  
To mitigate this performance loss, we describe a new call-chain based algorithm for inserting software prefetch instructions, and evaluate its potential for improved instruction cache performance.  ...  The performance of this algorithm depends on the selection of several independent parameters which control the distance and number of prefetches inserted for a particular method.  ...  Acknowledgments We thank the anonymous reviewers for providing useful comments on this paper. This work was funded in part by IBM Research and NSF grants CCF-0444412 and CNS-0546737.  ... 
doi:10.1109/pact.2007.4336207 fatcat:j2mdpqenlnanhjxdoagmpg7wge

Kilo-instruction processors, runahead and prefetching

Tanausú Ramírez, Alex Pajuelo, Oliverio J. Santana, Mateo Valero
2006 Proceedings of the 3rd conference on Computing frontiers - CF '06  
This mechanism executes speculative instructions under an L2 miss, preventing the processor from being stalled when the reorder buffer completely fills, and thus allowing the generation of useful prefetches  ...  Another technique to alleviate the memory wall problem provides processors with large instruction windows, avoiding window stalls due to in-order commit and long latency loads.  ...  Now, we show the performance when both Runahead execution and the Kilo-instruction processor are enhanced with a stride-based prefetcher.  ... 
doi:10.1145/1128022.1128059 dblp:conf/cf/RamirezPSV06 fatcat:4qur6t4fdra7tntpiiozuth55y
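The stride-based prefetcher this abstract pairs with Runahead and Kilo-instruction execution follows a well-known pattern: a table indexed by load PC remembers the last address and last stride, and predicts the next address once the same stride repeats. A minimal sketch of that mechanism (the names `stride_entry`/`stride_observe` and the two-occurrence confidence rule are illustrative assumptions, not this paper's design):

```c
#include <stdint.h>

/* One table entry per (hashed) load PC: last address seen, last
 * stride seen, and whether the stride has repeated. */
typedef struct {
    uint64_t last_addr;
    int64_t  stride;
    int      confident; /* same stride observed twice in a row */
} stride_entry;

/* Observe a load at `addr`; return the predicted prefetch address
 * (addr + stride) once confident, or 0 while still training. */
uint64_t stride_observe(stride_entry *e, uint64_t addr) {
    int64_t s = (int64_t)(addr - e->last_addr);
    e->confident = (e->last_addr != 0 && s == e->stride);
    e->stride = s;
    e->last_addr = addr;
    return e->confident ? addr + (uint64_t)s : 0;
}
```

After two loads 64 bytes apart (e.g. at 100 and 164), a third load at 228 yields a prefetch prediction of 292.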

Efficient emulation of hardware prefetchers via event-driven helper threading

Ilya Ganusov, Martin Burtscher
2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06  
The advance of multi-core architectures provides significant benefits for parallel and throughput-oriented computing, but the performance of individual computation threads does not improve and may even  ...  This paper explores the idea of using available general-purpose cores in a CMP as helper engines for individual threads running on the active cores.  ...  and Markov prefetchers in commit order.  ... 
doi:10.1145/1152154.1152178 dblp:conf/IEEEpact/GanusovB06 fatcat:xbd5prrckjf3vlymoeipmmdcv4

Enhancing memory level parallelism via recovery-free value prediction

Huiyang Zhou, Thomas M. Conte
2003 Proceedings of the 17th annual international conference on Supercomputing - ICS '03  
We propose to use value prediction and value speculative execution only for prefetching so that the complex prediction validation and misprediction recovery mechanisms are avoided and only minor changes  ...  In this paper, we advocate value prediction in its capability to enhance MLP instead of ILP.  ...  Section 3 illustrates the performance potential of using value prediction to enhance MLP. Section 4 presents the details of our proposed approach.  ... 
doi:10.1145/782814.782859 dblp:conf/ics/ZhouC03 fatcat:zf6qsbmnhndphdvchigxhveihq

Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction

Huiyang Zhou, Thomas M. Conte
2005 IEEE transactions on computers  
We propose to use value prediction and value speculative execution only for prefetching so that the complex prediction validation and misprediction recovery mechanisms are avoided and only minor changes  ...  In this paper, we advocate value prediction in its capability to enhance MLP instead of ILP.  ...  Section 3 illustrates the performance potential of using value prediction to enhance MLP. Section 4 presents the details of our proposed approach.  ... 
doi:10.1109/tc.2005.117 fatcat:hvxfenoxyvf5tjalqizzx6xiqu
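The recovery-free idea in this paper and its conference version above can be caricatured in a few lines: the predicted value feeds only a prefetch, so a misprediction wastes one cache-line fetch but needs no validation or rollback machinery. A hypothetical last-value sketch under that assumption (table size and names are invented; `__builtin_prefetch` is the GCC/Clang builtin, which cannot fault even on a bad address):

```c
#include <stdint.h>

#define VP_ENTRIES 256

/* Last-value table, indexed by a hash of the load PC. */
static uintptr_t last_value[VP_ENTRIES];

/* On a long-latency pointer-producing load at `pc`: prefetch through
 * the last value seen for this PC.  The prediction is never written
 * to architectural state, so no recovery is needed if it is wrong. */
void *predict_and_prefetch(unsigned pc) {
    uintptr_t pred = last_value[pc % VP_ENTRIES];
    if (pred)
        __builtin_prefetch((void *)pred, 0, 1); /* read, low locality */
    return (void *)pred;
}

/* Once the real load completes, record its value for next time. */
void train(unsigned pc, void *actual) {
    last_value[pc % VP_ENTRIES] = (uintptr_t)actual;
}
```

The point of the design, as the abstract notes, is that dropping validation and recovery leaves only minor changes to a conventional pipeline.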


Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

Michael Ferdman, Babak Falsafi, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki
2012 ACM Transactions on Computer Systems  
We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor microarchitecture is inefficient for running these workloads.  ...  However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the  ...  We classify each cycle of execution as Committing if at least one instruction was committed during that cycle or as Stalled otherwise.  ... 
doi:10.1145/2382553.2382557 fatcat:huy2nlmwibftnbrk32z77noowq

A performance-correctness explicitly-decoupled architecture

Alok Garg, Michael C. Huang
2008 2008 41st IEEE/ACM International Symposium on Microarchitecture  
predictions and performing accurate prefetching.  ...  We propose to separate performance goals from the correctness goal using an explicitly-decoupled architecture.  ...  A large body of work focuses on enhancing the processor's capability to buffer more in-flight instructions so as to avoid stalling [33] - [38] , or to perform a special "runahead" execution during a  ... 
doi:10.1109/micro.2008.4771800 dblp:conf/micro/GargH08 fatcat:ax2epz6twffxhnfzjidpfje4zy

Overlapping dependent loads with addressless preload

Zhen Yang, Xudong Shi, Feiqi Su, Jih-Kwon Peir
2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06  
Performance evaluations based on SPEC2000 and Olden applications show that significant speedups up to 40% with an average of 16% are achievable using the Preload.  ...  RELATED WORK There have been many software and hardware oriented prefetching proposals for alleviating performance penalties on cache misses [14, 6, 18, 13, 30, 28, 4, 25, 9, 8, 29, 31, 12] .  ...  The identified pointers are used to initiate prefetching of the successor nodes.  ... 
doi:10.1145/1152154.1152196 dblp:conf/IEEEpact/YangSSP06 fatcat:sy5rbme53bhbxns6kdpjslrsq4

Exploiting the Role of Hardware Prefetchers in Multicore Processors

Hasina Khatoon, Shahid Hafeez, Talat Altaf
2013 International Journal of Advanced Computer Science and Applications  
A number of issues have emerged when prefetching is used aggressively in multicore processors.  ...  Another aspect that is investigated is the performance of multicore processors using a multiprogram workload as compared to a single program workload while varying the configuration of the built-in hardware  ...  Manikantan and Govindarajan [24] have proposed performance-oriented prefetching enhancements that include focused prefetching to avoid commit stalls.  ... 
doi:10.14569/ijacsa.2013.040622 fatcat:z2vik33z5rbnjdkuzaxpiptcxu

Microarchitecture Optimizations for Exploiting Memory-Level Parallelism

Yuan Chou, Brian Fahs, Santosh Abraham
2004 SIGARCH Computer Architecture News  
In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%.  ...  Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better  ...  All three workloads we used are transaction-oriented and do not exhibit phase changes.  ... 
doi:10.1145/1028176.1006708 fatcat:oqnkkj5w3zdcnbmf5jgjz66ti4

Autotuning Skeleton-Driven Optimizations for Transactional Worklist Applications

Luis Fabricio Wanderley Goes, Nikolas Ioannou, Polychronis Xekalakis, Murray Cole, Marcelo Cintra
2012 IEEE Transactions on Parallel and Distributed Systems  
Using a novel hierarchical autotuning mechanism, it dynamically selects the most suitable set of optimizations for each application and adjusts them accordingly.  ...  These performance improvements match or even exceed those obtained by a static exhaustive search of the optimization space.  ...  Góes performed this work while a PhD student at the University of Edinburgh.  ... 
doi:10.1109/tpds.2012.140 fatcat:2bwqz5hyyngybmljeuhhgktiky

A prefetching technique for object-oriented databases [chapter]

Nils Knafla
1997 Lecture Notes in Computer Science  
The performance of many object-oriented database applications suffers from the page fetch latency which is determined by the expense of disk access.  ...  The page probability is used for the prefetch decision and for the order of the disk queue.  ...  stalls for the prefetched page).  ... 
doi:10.1007/3-540-63263-8_19 fatcat:rwnxln4wrbbglnxq6tpyaatr7u

Effective jump-pointer prefetching for linked data structures

Amir Roth, Gurindar S. Sohi
1999 SIGARCH Computer Architecture News  
Jump-pointers, which provide direct access to non-adjacent nodes, can be used for prefetching when loop and recursive procedure bodies are small and do not have sufficient work to overlap a long latency  ...  On a suite of pointer-intensive programs, jump-pointer prefetching reduces memory stall time by 72% for software, 83% for cooperative and 55% for hardware, producing speedups of 15%, 20% and 22% respectively  ...  Introduction Linked data structures (LDS) are common in many applications, and their importance is growing with the spread of object-oriented programming.  ... 
doi:10.1145/307338.300989 fatcat:tt5txwocevehhdlxd67jkbp3i4
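The software variant of jump-pointer prefetching described in this abstract can be sketched concretely: each list node carries an extra pointer several links ahead, installed in one pass, and the traversal prefetches through it so the miss for a distant node overlaps the work on the intervening ones. A hypothetical illustration, not the paper's code (`JUMP_DIST`, the field names, and the one-pass installer are assumptions):

```c
#include <stddef.h>

#define JUMP_DIST 4 /* how many links ahead a jump-pointer points */

typedef struct node {
    int value;
    struct node *next;
    struct node *jump; /* node JUMP_DIST links ahead, or NULL near the tail */
} node;

/* Install jump pointers in one pass using a trailing pointer that
 * lags `lead` by exactly JUMP_DIST nodes. */
void install_jumps(node *head) {
    node *lead = head, *trail = head;
    int lag = 0;
    for (; lead; lead = lead->next) {
        if (lag == JUMP_DIST) {
            trail->jump = lead;
            trail = trail->next;
        } else {
            lag++;
        }
    }
}

/* Traverse the list, prefetching the node JUMP_DIST links ahead so
 * its cache miss overlaps the work on the current nodes. */
long sum_with_prefetch(node *head) {
    long sum = 0;
    for (node *n = head; n; n = n->next) {
        if (n->jump)
            __builtin_prefetch(n->jump, 0, 1);
        sum += n->value;
    }
    return sum;
}
```

This only pays off when, as the snippet says, the per-node work is too small to hide a miss by other means; the extra pointer also costs space and must be kept consistent under insertions.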

Targeted Data Prefetching [chapter]

Weng-Fai Wong
2005 Lecture Notes in Computer Science  
Our results show that our prefetch strategy can reduce up to 45% of stall cycles of benchmarks running on a simulated out-of-order superscalar processor with an overhead of 0.0005 prefetch per CPU cycle  ...  The success of any data prefetching scheme depends on three factors: timeliness, accuracy and overhead.  ...  The key contribution of this work is in pointing out that even with a simple scheme, prefetching can be targeted very specifically to the load instructions that matter, and this yields significant practical  ... 
doi:10.1007/11572961_63 fatcat:qdjnboounngotjfjjx6yww6kde
Showing results 1 — 15 out of 206 results