Call-chain Software Instruction Prefetching in J2EE Server Applications
2007
Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT)
To mitigate this performance loss, we describe a new call-chain based algorithm for inserting software prefetch instructions, and evaluate its potential for improved instruction cache performance. ...
The performance of this algorithm depends on the selection of several independent parameters which control the distance and number of prefetches inserted for a particular method. ...
Acknowledgments: We thank the anonymous reviewers for providing useful comments on this paper. This work was funded in part by IBM Research and NSF grants CCF-0444412 and CNS-0546737. ...
doi:10.1109/pact.2007.4336207
fatcat:j2mdpqenlnanhjxdoagmpg7wge
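The record above says the call-chain algorithm is controlled by parameters for the distance and the number of prefetches inserted per method. As a rough illustration of how two such knobs could interact, the Python sketch below scans a recorded method-call trace and, for each method, keeps up to `max_prefetches` of the methods most often called `distance` calls later. The trace format, the function `plan_prefetches`, and the frequency-based selection are hypothetical assumptions, not the algorithm from the paper.

```python
from collections import Counter, defaultdict

def plan_prefetches(trace, distance, max_prefetches):
    """Map each method to the methods whose code it should prefetch.

    Hypothetical sketch: `trace` is a list of method names in call order.
    The method called `distance` positions after each call is counted as a
    prefetch candidate for the caller; each method keeps its
    `max_prefetches` most frequent candidates.
    """
    candidates = defaultdict(Counter)
    for i in range(len(trace) - distance):
        caller, future = trace[i], trace[i + distance]
        if future != caller:  # prefetching the method's own code is pointless
            candidates[caller][future] += 1
    return {
        method: [target for target, _ in counts.most_common(max_prefetches)]
        for method, counts in candidates.items()
    }

trace = ["doGet", "parse", "lookup", "render", "doGet", "parse", "lookup", "log"]
plan = plan_prefetches(trace, distance=2, max_prefetches=1)
print(plan["doGet"])  # → ['lookup']
```

A larger `distance` gives each prefetch more time to complete but makes the predicted target less certain; `max_prefetches` bounds the code-size and bandwidth overhead per method.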
Kilo-instruction processors, runahead and prefetching
2006
Proceedings of the 3rd conference on Computing frontiers - CF '06
This mechanism executes speculative instructions under an L2 miss, preventing the processor from being stalled when the reorder buffer completely fills, and thus allowing the generation of useful prefetches ...
Another technique to alleviate the memory wall problem provides processors with large instruction windows, avoiding window stalls due to in-order commit and long latency loads. ...
Now, we show the performance when both Runahead execution and the Kilo-instruction processor are enhanced with a stride-based prefetcher. ...
doi:10.1145/1128022.1128059
dblp:conf/cf/RamirezPSV06
fatcat:4qur6t4fdra7tntpiiozuth55y
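A stride-based prefetcher, in its common textbook form, remembers the last address and stride seen by each load instruction (keyed by its PC) and, once the same non-zero stride repeats, prefetches a few strides ahead. The sketch below is a generic illustration of that idea; the table layout, the "stride seen twice" confirmation rule, and the `degree` parameter are assumptions, not the configuration evaluated in the paper.

```python
class StridePrefetcher:
    """Per-PC stride detector: prefetch once the same stride is seen twice."""

    def __init__(self, degree=2):
        self.degree = degree  # how many strides ahead to prefetch
        self.table = {}       # pc -> (last_addr, last_stride)

    def access(self, pc, addr):
        """Record a load and return the list of addresses to prefetch."""
        prefetches = []
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            # Only trust a stride once it has repeated and is non-zero.
            if stride != 0 and stride == last_stride:
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        return prefetches

pf = StridePrefetcher(degree=2)
for a in (0x1000, 0x1040, 0x1080):
    issued = pf.access(pc=0x400, addr=a)
print([hex(x) for x in issued])  # → ['0x10c0', '0x1100']
```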
Efficient emulation of hardware prefetchers via event-driven helper threading
2006
Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06
The advance of multi-core architectures provides significant benefits for parallel and throughput-oriented computing, but the performance of individual computation threads does not improve and may even ...
This paper explores the idea of using available general-purpose cores in a CMP as helper engines for individual threads running on the active cores. ...
and Markov prefetchers in commit order. ...
doi:10.1145/1152154.1152178
dblp:conf/IEEEpact/GanusovB06
fatcat:xbd5prrckjf3vlymoeipmmdcv4
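The record above mentions Markov prefetchers. In the usual formulation, a Markov prefetcher records which miss addresses have followed each miss address and, on a new miss, prefetches its most frequent successors. This Python sketch illustrates only that general idea; the unbounded transition table and the `width` parameter are simplifying assumptions (hardware designs bound the table), and it does not model the paper's helper-thread emulation.

```python
from collections import Counter, defaultdict

class MarkovPrefetcher:
    """Learn miss-to-miss transitions and prefetch the likeliest successors."""

    def __init__(self, width=2):
        self.width = width                      # successors prefetched per miss
        self.successors = defaultdict(Counter)  # miss address -> counts of next misses
        self.prev_miss = None

    def on_miss(self, addr):
        """Record the transition from the previous miss; return addresses to prefetch."""
        if self.prev_miss is not None:
            self.successors[self.prev_miss][addr] += 1
        self.prev_miss = addr
        return [nxt for nxt, _ in self.successors[addr].most_common(self.width)]

A, B, C = 0x100, 0x240, 0x380
pf = MarkovPrefetcher(width=1)
for miss in (A, B, C, A, B, A):
    last = pf.on_miss(miss)
print([hex(a) for a in last])  # → ['0x240']
```

Unlike a stride prefetcher, this approach can follow irregular but repeating miss sequences, at the cost of a large correlation table.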
Enhancing memory level parallelism via recovery-free value prediction
2003
Proceedings of the 17th annual international conference on Supercomputing - ICS '03
We propose to use value prediction and value speculative execution only for prefetching so that the complex prediction validation and misprediction recovery mechanisms are avoided and only minor changes ...
In this paper, we advocate value prediction in its capability to enhance MLP instead of ILP. ...
Section 3 illustrates the performance potential of using value prediction to enhance MLP. Section 4 presents the details of our proposed approach. ...
doi:10.1145/782814.782859
dblp:conf/ics/ZhouC03
fatcat:zf6qsbmnhndphdvchigxhveihq
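To make the recovery-free idea concrete: when a load misses, a value predictor guesses the value it will return, and if that value is a pointer, the address a dependent load would use can be prefetched. Because the guess only drives a prefetch and never reaches architectural state, a misprediction merely wastes one memory request. The sketch below assumes a simple per-PC stride value predictor and a hypothetical `field_offset` for the dependent load; it is not the predictor design from the paper.

```python
class StrideValuePredictor:
    """Predict a load's value as last_value + last_delta, per load PC."""

    def __init__(self):
        self.table = {}  # pc -> (last_value, last_delta)

    def predict(self, pc):
        if pc not in self.table:
            return None
        value, delta = self.table[pc]
        return value + delta

    def train(self, pc, actual):
        if pc in self.table:
            last_value, _ = self.table[pc]
            self.table[pc] = (actual, actual - last_value)
        else:
            self.table[pc] = (actual, 0)

def on_load_miss(predictor, pc, field_offset):
    """Return the address to prefetch for a dependent load, or None.

    No validation or recovery is needed: the prediction only warms the
    cache, so a wrong guess costs a single useless prefetch.
    """
    guess = predictor.predict(pc)
    return None if guess is None else guess + field_offset

vp = StrideValuePredictor()
for pointer_value in (0x8000, 0x8040, 0x8080):  # values the load returned so far
    vp.train(pc=0x400, actual=pointer_value)
print(hex(on_load_miss(vp, pc=0x400, field_offset=8)))  # → 0x80c8
```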
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
2005
IEEE Transactions on Computers
We propose to use value prediction and value speculative execution only for prefetching so that the complex prediction validation and misprediction recovery mechanisms are avoided and only minor changes ...
In this paper, we advocate value prediction in its capability to enhance MLP instead of ILP. ...
Section 3 illustrates the performance potential of using value prediction to enhance MLP. Section 4 presents the details of our proposed approach. ...
doi:10.1109/tc.2005.117
fatcat:hvxfenoxyvf5tjalqizzx6xiqu
Enhancing memory level parallelism via recovery-free value prediction
2003
Proceedings of the 17th annual international conference on Supercomputing - ICS '03
We propose to use value prediction and value speculative execution only for prefetching so that the complex prediction validation and misprediction recovery mechanisms are avoided and only minor changes ...
In this paper, we advocate value prediction in its capability to enhance MLP instead of ILP. ...
Section 3 illustrates the performance potential of using value prediction to enhance MLP. Section 4 presents the details of our proposed approach. ...
doi:10.1145/782856.782859
fatcat:el6lsep2qva4xplabkimx4wb6y
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
2012
ACM Transactions on Computer Systems
We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor microarchitecture is inefficient for running these workloads. ...
However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the ...
We classify each cycle of execution as Committing if at least one instruction was committed during that cycle or as Stalled otherwise. ...
doi:10.1145/2382553.2382557
fatcat:huy2nlmwibftnbrk32z77noowq
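The cycle classification quoted above can be stated precisely: a cycle is Committing if at least one instruction commits during it, and Stalled otherwise. Assuming a per-cycle list of commit counts (a hypothetical input format; the paper derives its breakdown from hardware performance counters), the breakdown is:

```python
def classify_cycles(commits_per_cycle):
    """Return (committing, stalled) cycle counts."""
    committing = sum(1 for n in commits_per_cycle if n >= 1)
    stalled = len(commits_per_cycle) - committing
    return committing, stalled

commits = [4, 0, 0, 1, 2, 0, 3, 0]
committing, stalled = classify_cycles(commits)
print(committing, stalled, f"{stalled / len(commits):.0%} stalled")  # → 4 4 50% stalled
```

Note that a Committing cycle may still commit far fewer instructions than the machine's width; the classification only separates forward progress from none.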
A performance-correctness explicitly-decoupled architecture
2008
2008 41st IEEE/ACM International Symposium on Microarchitecture
predictions and performing accurate prefetching. ...
We propose to separate performance goals from the correctness goal using an explicitly-decoupled architecture. ...
A large body of work focuses on enhancing the processor's capability to buffer more in-flight instructions so as to avoid stalling [33]-[38], or to perform a special "runahead" execution during a ...
doi:10.1109/micro.2008.4771800
dblp:conf/micro/GargH08
fatcat:ax2epz6twffxhnfzjidpfje4zy
Overlapping dependent loads with addressless preload
2006
Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06
Performance evaluations based on SPEC2000 and Olden applications show that significant speedups up to 40% with an average of 16% are achievable using the Preload. ...
Related Work: There have been many software- and hardware-oriented prefetching proposals for alleviating performance penalties on cache misses [14, 6, 18, 13, 30, 28, 4, 25, 9, 8, 29, 31, 12]. ...
The identified pointers are used to initiate prefetching of the successor nodes. ...
doi:10.1145/1152154.1152196
dblp:conf/IEEEpact/YangSSP06
fatcat:sy5rbme53bhbxns6kdpjslrsq4
Exploiting the Role of Hardware Prefetchers in Multicore Processors
2013
International Journal of Advanced Computer Science and Applications
A number of issues have emerged when prefetching is used aggressively in multicore processors. ...
Another aspect that is investigated is the performance of multicore processors using a multiprogram workload as compared to a single program workload while varying the configuration of the built-in hardware ...
Manikantan and Govindarajan [24] have proposed performance-oriented prefetching enhancements that include focused prefetching to avoid commit stalls. ...
doi:10.14569/ijacsa.2013.040622
fatcat:z2vik33z5rbnjdkuzaxpiptcxu
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
2004
SIGARCH Computer Architecture News
In addition, we demonstrate that runahead execution is highly effective in enhancing MLP, potentially improving the MLP of the database workload by 82% and its overall performance by 60%. ...
Finally, our limit study shows that there is considerable headroom in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better ...
All three workloads we used are transaction-oriented and do not exhibit phase changes. ...
doi:10.1145/1028176.1006708
fatcat:oqnkkj5w3zdcnbmf5jgjz66ti4
Autotuning Skeleton-Driven Optimizations for Transactional Worklist Applications
2012
IEEE Transactions on Parallel and Distributed Systems
Using a novel hierarchical autotuning mechanism, it dynamically selects the most suitable set of optimizations for each application and adjusts them accordingly. ...
These performance improvements match or even exceed those obtained by a static exhaustive search of the optimization space. ...
Góes performed while being a PhD student at the University of Edinburgh. ...
doi:10.1109/tpds.2012.140
fatcat:2bwqz5hyyngybmljeuhhgktiky
A prefetching technique for object-oriented databases
[chapter]
1997
Lecture Notes in Computer Science
The performance of many object-oriented database applications suffers from the page fetch latency which is determined by the expense of disk access. ...
The page probability is used for the prefetch decision and for the order of the disk queue. ...
stalls for the prefetched page). ...
doi:10.1007/3-540-63263-8_19
fatcat:rwnxln4wrbbglnxq6tpyaatr7u
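The abstract says a per-page access probability drives both the prefetch decision and the order of the disk queue. One plausible reading, sketched below with hypothetical names (`plan_disk_queue`, `threshold`), is to prefetch only pages whose probability exceeds a threshold and to serve queued reads in descending probability, treating demand fetches as certain. This is an illustration under those assumptions, not the chapter's actual policy.

```python
import heapq

def plan_disk_queue(demand_pages, page_probability, threshold=0.5):
    """Return page IDs in the order they should be read from disk.

    Demand fetches are treated as certain (probability 1.0). Other pages
    are prefetched only if their estimated probability exceeds `threshold`.
    """
    queue = []
    for page in demand_pages:
        heapq.heappush(queue, (-1.0, page))
    for page, p in page_probability.items():
        if page not in demand_pages and p > threshold:
            heapq.heappush(queue, (-p, page))  # max-heap via negated probability
    return [heapq.heappop(queue)[1] for _ in range(len(queue))]

order = plan_disk_queue({7}, {3: 0.9, 5: 0.2, 9: 0.6, 7: 0.4})
print(order)  # → [7, 3, 9]
```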
Effective jump-pointer prefetching for linked data structures
1999
SIGARCH Computer Architecture News
Jump-pointers, which provide direct access to non-adjacent nodes, can be used for prefetching when loop and recursive procedure bodies are small and do not have sufficient work to overlap a long-latency ...
On a suite of pointer-intensive programs, jump-pointer prefetching reduces memory stall time by 72% for software, 83% for cooperative and 55% for hardware, producing speedups of 15%, 20% and 22% respectively ...
Introduction: Linked data structures (LDS) are common in many applications, and their importance is growing with the spread of object-oriented programming. ...
doi:10.1145/307338.300989
fatcat:tt5txwocevehhdlxd67jkbp3i4
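In software jump-pointer prefetching, each node of a linked list carries an extra pointer to a node a fixed number of hops ahead, so a traversal can prefetch that distant node while it works on the current one. Python has no prefetch instruction, so the sketch below records the would-be prefetch targets instead; the `jump` field, `install_jump_pointers`, and the `interval` parameter are illustrative assumptions, not code from the paper.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.jump = None  # node `interval` hops ahead, used only for prefetching

def build_list(values):
    head = None
    for v in reversed(values):
        node = Node(v)
        node.next = head
        head = node
    return head

def install_jump_pointers(head, interval):
    """Point each node's `jump` field at the node `interval` hops ahead."""
    nodes = []
    node = head
    while node:
        nodes.append(node)
        node = node.next
    for i in range(len(nodes) - interval):
        nodes[i].jump = nodes[i + interval]

def traverse(head, prefetched):
    """Sum the list, recording jump-pointer targets as stand-in prefetches."""
    total = 0
    node = head
    while node:
        if node.jump is not None:
            prefetched.append(node.jump.value)  # a real system would issue a prefetch here
        total += node.value
        node = node.next
    return total

head = build_list([10, 20, 30, 40, 50])
install_jump_pointers(head, interval=3)
targets = []
print(traverse(head, targets), targets)  # → 150 [40, 50]
```

The `interval` plays the role of prefetch distance: it should cover the memory latency divided by the work done per node, which is why the technique pays off when loop bodies are small.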
Targeted Data Prefetching
[chapter]
2005
Lecture Notes in Computer Science
Our results show that our prefetch strategy can reduce up to 45% of stall cycles of benchmarks running on a simulated out-of-order superscalar processor with an overhead of 0.0005 prefetch per CPU cycle ...
The success of any data prefetching scheme depends on three factors: timeliness, accuracy and overhead. ...
The key contribution of this work is in pointing out that even with a simple scheme, prefetching can be targeted very specifically to the load instructions that matter, and this yields significant practical ...
doi:10.1007/11572961_63
fatcat:qdjnboounngotjfjjx6yww6kde
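The central point of the abstract is that prefetches can be aimed at the few load instructions responsible for most stall cycles. A simple way to pick such targets from a profile, sketched below, is to rank load PCs by attributed stall cycles and keep the smallest set that covers a chosen fraction of the total. The profile format and the `coverage` parameter are assumptions for illustration, not the selection scheme from the paper.

```python
def select_target_loads(stall_cycles_by_pc, coverage=0.8):
    """Return the fewest load PCs that together account for `coverage` of stalls."""
    total = sum(stall_cycles_by_pc.values())
    selected, covered = [], 0
    ranked = sorted(stall_cycles_by_pc.items(), key=lambda kv: kv[1], reverse=True)
    for pc, stalls in ranked:
        if total == 0 or covered >= coverage * total:
            break
        selected.append(pc)
        covered += stalls
    return selected

profile = {0x400: 9000, 0x420: 500, 0x4a0: 300, 0x510: 200}
print([hex(pc) for pc in select_target_loads(profile, coverage=0.8)])  # → ['0x400']
```

Restricting prefetches to the selected PCs is what keeps overhead low, in the spirit of the per-cycle prefetch rate the abstract reports.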