Speculative Prefetching of Induction Pointers [chapter]

Artour Stoutchinin, José Nelson Amaral, Guang R. Gao, James C. Dehnert, Suneel Jain, Alban Douillet
2001 Lecture Notes in Computer Science  
We integrate induction pointer prefetching with loop scheduling.  ...  This identification uses a surprisingly simple method that looks for induction pointers: pointers that are updated in each loop iteration by a load with a constant offset.  ...  When Pointer Prefetching Does Not Help: On the other hand, speculative induction pointer prefetching is not as effective in a number of programs.  ... 
doi:10.1007/3-540-45306-7_20 fatcat:afgmfhlxkfelxao6ep32lmw2ha

Effective jump-pointer prefetching for linked data structures

Amir Roth, Gurindar S. Sohi
1999 SIGARCH Computer Architecture News  
On a suite of pointer intensive programs, jump-pointer prefetching reduces memory stall time by 72% for software, 83% for cooperative and 55% for hardware, producing speedups of 15%, 20% and 22% respectively  ...  This paper describes a framework for jump-pointer prefetching (JPP) that supports four prefetching idioms: queue, full, chain, and root jumping and three implementations: software-only, hardware-only,  ...  The views and conclusions presented are those of the authors and do not necessarily represent the official policies or endorsements, either expressed or implied, of the U.S.  ... 
doi:10.1145/307338.300989 fatcat:tt5txwocevehhdlxd67jkbp3i4

A general framework for prefetch scheduling in linked data structures and its application to multi-chain prefetching

Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, Donald Yeung
2004 ACM Transactions on Computer Systems  
We also propose using speculation to identify independent pointer chains in dynamic traversals.  ...  While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism.  ...  For affine loops that traverse non-recursive ribs (i.e. the induction variable update is of the form i+=constant, but the loop body dereferences pointers derived from the induction variable), all pointer  ... 
doi:10.1145/986533.986536 fatcat:dxzpgbsbxjazbddavyftouka2u

Design and evaluation of compiler algorithms for pre-execution

Dongkeun Kim, Donald Yeung
2002 ACM SIGOPS Operating Systems Review  
Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls.  ...  Finally, threading scheme selection chooses the best scheme for initiating pre-execution threads, speculatively parallelizing loops to generate thread-level parallelism when necessary for latency tolerance  ...  We also thank Chau-Wen Tseng and the anonymous reviewers for their constructive comments on previous drafts of this paper.  ... 
doi:10.1145/635508.605415 fatcat:xcvkbkhblvgotc3uwrlfpsynym

Design and evaluation of compiler algorithms for pre-execution

Dongkeun Kim, Donald Yeung
2002 Tenth international conference on architectural support for programming languages and operating systems on Proceedings of the 10th international conference on architectural support for programming languages and operating systems (ASPLOS-X) - ASPLOS '02  
Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls.  ...  Finally, threading scheme selection chooses the best scheme for initiating pre-execution threads, speculatively parallelizing loops to generate thread-level parallelism when necessary for latency tolerance  ...  We also thank Chau-Wen Tseng and the anonymous reviewers for their constructive comments on previous drafts of this paper.  ... 
doi:10.1145/605397.605415 dblp:conf/asplos/KimY02 fatcat:6vr72xdgc5el7o6t2ktsydmuyq

Design and evaluation of compiler algorithms for pre-execution

Dongkeun Kim, Donald Yeung
2002 Tenth international conference on architectural support for programming languages and operating systems on Proceedings of the 10th international conference on architectural support for programming languages and operating systems (ASPLOS-X) - ASPLOS '02  
Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls.  ...  Finally, threading scheme selection chooses the best scheme for initiating pre-execution threads, speculatively parallelizing loops to generate thread-level parallelism when necessary for latency tolerance  ...  We also thank Chau-Wen Tseng and the anonymous reviewers for their constructive comments on previous drafts of this paper.  ... 
doi:10.1145/605414.605415 fatcat:ch2pdqj4urfp7feqoxgwqnpraq

Design and evaluation of compiler algorithms for pre-execution

Dongkeun Kim, Donald Yeung
2002 SIGPLAN notices  
Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls.  ...  Finally, threading scheme selection chooses the best scheme for initiating pre-execution threads, speculatively parallelizing loops to generate thread-level parallelism when necessary for latency tolerance  ...  We also thank Chau-Wen Tseng and the anonymous reviewers for their constructive comments on previous drafts of this paper.  ... 
doi:10.1145/605432.605415 fatcat:gkfhpfkbgrd2np3lowuen7y4ga

Design and evaluation of compiler algorithms for pre-execution

Dongkeun Kim, Donald Yeung
2002 SIGARCH Computer Architecture News  
Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls.  ...  Finally, threading scheme selection chooses the best scheme for initiating pre-execution threads, speculatively parallelizing loops to generate thread-level parallelism when necessary for latency tolerance  ...  We also thank Chau-Wen Tseng and the anonymous reviewers for their constructive comments on previous drafts of this paper.  ... 
doi:10.1145/635506.605415 fatcat:gslkhwv4pzbyrbx4kpld2cighy

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
Dependence-based prefetching achieves speedups of up to 2.5% on a suite of pointer-intensive programs.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  The views and conclusions presented are those of the authors and do not necessarily represent the official policies or endorsements, either expressed or implied, of the U.S.  ... 
doi:10.1145/291069.291034 dblp:conf/asplos/RothMS98 fatcat:jul62swkkjbr5f24jf5qjciwoa

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 ACM SIGOPS Operating Systems Review  
Dependence-based prefetching achieves speedups of up to 2.5% on a suite of pointer-intensive programs.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  The views and conclusions presented are those of the authors and do not necessarily represent the official policies or endorsements, either expressed or implied, of the U.S.  ... 
doi:10.1145/384265.291034 fatcat:4lkgda5hpbck3hbt6npl65jsoi

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 SIGPLAN notices  
Dependence-based prefetching achieves speedups of up to 2.5% on a suite of pointer-intensive programs.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  The views and conclusions presented are those of the authors and do not necessarily represent the official policies or endorsements, either expressed or implied, of the U.S.  ... 
doi:10.1145/291006.291034 fatcat:fblqhoxlrvafpjqgjz6gcq3csi

Accelerating and Adapting Precomputation Threads for Efficient Prefetching

Weifeng Zhang, Dean M. Tullsen, Brad Calder
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Speculative precomputation enables effective cache prefetching for even irregular memory access behavior, by using an alternate thread on a multithreaded or multi-core architecture.  ...  Both construction and execution of the prefetching threads happen in another thread, imposing little overhead on the main thread.  ...  This is an aggressive dynamic inline prefetching system that takes full advantage of the Trident framework, including dynamic detection of delinquent loads, stride prediction of pointer loads, and dynamic  ... 
doi:10.1109/hpca.2007.346187 dblp:conf/hpca/ZhangTC07 fatcat:dufm3kem2nb5todku6huxpbxcm

A Hybrid Hardware/Software Generated Prefetching Thread Mechanism on Chip Multiprocessors [chapter]

Hou Rui, Longbing Zhang, Weiwu Hu
2006 Lecture Notes in Computer Science  
This paper proposes a hybrid hardware/software generated prefetching thread mechanism on Chip Multiprocessors (CMP). Two kinds of prefetching threads appear in our hybrid mechanism.  ...  For a set of memory limited benchmarks with complicated access patterns, an average speedup of 3.1% is achieved on dual-core CMP when constructing basic hardware-generated prefetching thread, and this  ...  When the original thread accesses one pointer chain, Static Prefetching Threads simultaneously perform their speculative traversal of other possible future chains on idle cores.  ... 
doi:10.1007/11823285_52 fatcat:5cg6rfs2k5hfpamvwt4sum6o3y

Value-Profile Guided Stride Prefetching for Irregular Code [chapter]

Youfeng Wu, Mauricio Serrano, Rakesh Krishnaiyer, Wei Li, Jesse Fang
2002 Lecture Notes in Computer Science  
Memory operations in irregular code are difficult to prefetch, as the future address of a memory location is hard to anticipate by a compiler.  ...  This paper presents a novel compiler technique to profile and prefetch for those loads.  ...  We appreciate the comments from the anonymous reviewers that helped improve the quality of the paper.  ... 
doi:10.1007/3-540-45937-5_22 fatcat:i3pxfqwva5gnnbb2tbbln4jmcq

Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread

Hou Rui, Longbing Zhang, Weiwu Hu
2007 Microprocessors and microsystems  
For a set of memory limited benchmarks selected from Olden benchmark, SPEC CPU2000 as well as Stream benchmark, an average speedup of 3.8% is achieved on dual-core CMP when constructing basic Dynamic Prefetching  ...  "Self-Loop" policy can greatly enlarge the prefetching range and issue more timely prefetches.  ...  This work is supported by National Basic Research Program of China (2005CB321600), and National Natural Science Foundation of China (NSFC) Grant No. 60325205.  ... 
doi:10.1016/j.micpro.2006.09.002 fatcat:t6ns4nfdojfkvdxsiao3qt677e