Multi-chain prefetching: effective exploitation of inter-chain memory parallelism for pointer-chasing codes

N. Kohout, Seungryul Choi, Dongkeun Kim, D. Yeung
Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques  
memory parallelism for the purpose of memory latency tolerance.  ...  While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism.  ...  The authors would like to thank Chau-Wen Tseng for helpful discussions on the LDS descriptor framework, and Bruce Jacob for helpful comments on previous drafts of this paper.  ... 
doi:10.1109/pact.2001.953307 dblp:conf/IEEEpact/KohoutCKY01 fatcat:6vros3zhxvfdznfpvol4mnaqnq
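
The observation driving this paper is easy to see in code. The sketch below is illustrative only (the `node_t`, `walk_serial`, and `walk_interleaved` names are made up, and this is not the paper's LDS-descriptor mechanism): a single chain serializes its loads, while interleaving several independent chains lets their misses overlap.

```c
#include <stddef.h>

/* Illustrative node type; any singly linked list with a payload will do. */
typedef struct node {
    struct node *next;
    long payload;
} node_t;

/* Serial traversal: each load of n->next depends on the previous one,
 * so the cache misses form a dependent chain and cannot overlap.        */
long walk_serial(node_t *head) {
    long sum = 0;
    for (node_t *n = head; n != NULL; n = n->next)
        sum += n->payload;
    return sum;
}

/* Interleaved traversal of k independent chains: each round touches one
 * node per chain, so up to k cache misses can be outstanding at once.   */
long walk_interleaved(node_t *heads[], int k) {
    long sum = 0;
    int live;
    do {
        live = 0;
        for (int i = 0; i < k; i++) {
            if (heads[i] != NULL) {
                sum += heads[i]->payload;
                heads[i] = heads[i]->next;   /* independent of other chains */
                live++;
            }
        }
    } while (live > 0);
    return sum;
}
```

On an out-of-order core the per-round loads in `walk_interleaved` are mutually independent, so up to k misses can be in flight at once; adding an explicit software prefetch of each chain's next node pushes the overlap further, which is the direction multi-chain prefetching takes.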

A general framework for prefetch scheduling in linked data structures and its application to multi-chain prefetching

Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, Donald Yeung
2004 ACM Transactions on Computer Systems  
This article investigates exploiting such inter-chain memory parallelism for the purpose of memory latency tolerance, using a technique called multi-chain prefetching.  ...  While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism.  ...  This article investigates exploiting the natural memory parallelism that exists between independent serialized pointer-chasing traversals, or inter-chain memory parallelism.  ... 
doi:10.1145/986533.986536 fatcat:dxzpgbsbxjazbddavyftouka2u
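
A side note on the scheduling half of the title: the standard back-of-the-envelope rule for how far ahead to prefetch (a common heuristic stated here as background, not a formula quoted from the article) covers the expected miss latency l with the per-node work w:

```latex
% d: prefetch distance in list nodes, l: expected miss latency,
% w: computation time spent per node (both in cycles).
d = \left\lceil \frac{l}{w} \right\rceil
```

The catch for linked data structures is that the address d nodes ahead on the same chain is not known until the intervening loads complete; scheduling prefetches across independent chains, whose start addresses are available early, sidesteps exactly that serialization.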

Semantics-Aware, Timely Prefetching of Linked Data Structure

Gang Liu, Zhuo Huang, Jih-kwon Peir, Xudong Shi
2010 IEEE 16th International Conference on Parallel and Distributed Systems
Due to tight load-load dependences in LDS traversal, the chance of overlapping the cache misses in exploiting the memory-level parallelism is slim.  ...  Furthermore, the irregularity of missing block addresses makes it difficult for accurate data prefetching without recording a huge miss history.  ...  The multi-chain prefetcher [16] explores inter-LDS parallelism based on compiler analysis.  ... 
doi:10.1109/icpads.2010.70 dblp:conf/icpads/LiuHPS10 fatcat:zpmg6ixj2vguzn76c5jleouec4

Prefetched Address Translation

Artemiy Margaritov, Dmitrii Ustiugov, Edouard Bugnion, Boris Grot
2019 Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture - MICRO '52  
Such huge datasets pressure the TLB, resulting in frequent misses that must be resolved through a page walk, a long-latency pointer chase through multiple levels of the in-memory radix tree-based page table  ...  We introduce Address Translation with Prefetching (ASAP), a new approach for reducing the latency of address translation to a single access to the memory hierarchy.  ...  This work was supported by the Google Faculty Research Award, the EPSRC Centre for Doctoral Training in Pervasive Parallelism at the University of Edinburgh, and the industrial CASE studentship from Arm  ... 
doi:10.1145/3352460.3358294 dblp:conf/micro/MargaritovUBG19 fatcat:pvdivtulezb2pkebma2zpd6dfu
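
To make the "page walk as pointer chase" point concrete, here is a toy model of a four-level radix page walk (a generic illustration using an x86-64-style 9-bit index split; it is not ASAP itself and ignores huge pages, permissions, and the real page-table-entry encoding):

```c
#include <stdint.h>

/* Toy model of a 4-level radix page table: each level is an array of 512
 * entries, and an entry holds a pointer to the next level (or, at the
 * leaf, stands in for the physical frame).                               */
typedef struct level {
    struct level *entries[512];
} level_t;

/* Translate a virtual address by walking the tree.  Each load depends on
 * the value produced by the previous one, so the four potential cache
 * misses serialize: a textbook dependent pointer chase.                  */
void *translate(level_t *root, uint64_t vaddr) {
    level_t *l4 = root;
    level_t *l3 = l4->entries[(vaddr >> 39) & 0x1ff];   /* access 1 */
    level_t *l2 = l3->entries[(vaddr >> 30) & 0x1ff];   /* access 2 */
    level_t *l1 = l2->entries[(vaddr >> 21) & 0x1ff];   /* access 3 */
    return       l1->entries[(vaddr >> 12) & 0x1ff];    /* access 4 */
}
```

Because each level's pointer comes from the previous level's load, the four accesses cannot overlap; per the abstract, ASAP's goal is to reduce the translation to a single access to the memory hierarchy.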

Improving hash join performance through prefetching

Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, Todd C. Mowry
2007 ACM Transactions on Database Systems  
Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing.  ...  GRACE) spends over 80% of its user time stalled on CPU cache misses, and explores the use of CPU cache prefetching to improve its cache performance.  ...  DeWitt for insightful comments. The third author thanks P. Bohannon, S. Ganguly, H. F. Korth, and P. P. S. Narayan for helpful discussions.  ... 
doi:10.1145/1272743.1272747 fatcat:v6sfoxzlhfhktpwzdafompzh4i
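
A minimal sketch of the group-prefetching idea from this line of work, stripped down to a probe against a flat bucket array (no overflow chains or multiple code paths, which are exactly the complications the article addresses; `bucket_t`, `hash`, and `GROUP` are illustrative):

```c
#include <stddef.h>

#define GROUP    16          /* probes handled per group (tuning knob)    */
#define NBUCKETS (1u << 20)  /* hash table size; power of two for masking */

typedef struct bucket {
    long key;
    long payload;
} bucket_t;

static bucket_t table[NBUCKETS];          /* build side, populated elsewhere */

static size_t hash(long key) {            /* toy multiplicative hash */
    return ((unsigned long)key * 2654435761u) & (NBUCKETS - 1);
}

/* Group prefetching: stage 1 computes hashes and prefetches the buckets
 * for a whole group of probe keys; stage 2 revisits them, by which time
 * the buckets should be cached, so the misses of a group overlap.
 * __builtin_prefetch is the GCC/Clang prefetch hint.                     */
static long probe_group_prefetch(const long *keys, size_t n, long *out) {
    long matches = 0;
    size_t slot[GROUP];

    for (size_t base = 0; base < n; base += GROUP) {
        size_t g = (n - base < GROUP) ? n - base : GROUP;

        for (size_t i = 0; i < g; i++) {              /* stage 1 */
            slot[i] = hash(keys[base + i]);
            __builtin_prefetch(&table[slot[i]], 0, 1);
        }
        for (size_t i = 0; i < g; i++) {              /* stage 2 */
            if (table[slot[i]].key == keys[base + i])
                out[matches++] = table[slot[i]].payload;
        }
    }
    return matches;
}
```

(The article also develops a software-pipelined variant that spreads a tuple's stages across loop iterations instead of processing groups in lockstep.)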

Helper thread prefetching for loosely-coupled multiprocessor systems

Changhee Jung, Daeseob Lim, Jaejin Lee, Y. Solihin
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
To demonstrate that prefetching in a loosely-coupled system can be done effectively, we evaluate our prefetching in a standard, unmodified CMP system, and in an intelligent memory system where a simple  ...  Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads.  ...  If an item appears to be an address, the engine prefetches it, allowing automatic pointer chasing. Alexander and Kedem [1] propose a hardware controller that monitors requests at the main memory.  ... 
doi:10.1109/ipdps.2006.1639375 dblp:conf/ipps/JungLLS06 fatcat:yu6tyngi6zdtllclk2slh7fnmy
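
A bare-bones user-level illustration of the helper-thread pattern (a sketch under simplifying assumptions, not the paper's mechanism: the paper targets loosely-coupled CMP and intelligent-memory systems and contributes its own synchronization; `RUNAHEAD`, `helper`, and the busy-wait throttle here are made up):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *next; long payload; } node_t;

static node_t *list_head;            /* shared input list                   */
static atomic_long main_pos;         /* nodes consumed by the main thread   */

#define RUNAHEAD 32                  /* target lead of the helper, in nodes */

/* Helper thread: runs the same pointer chase but does no real work, so it
 * naturally gets ahead of the main thread; its loads pull each node into
 * the cache hierarchy before the main thread touches it.  The spin loop
 * keeps it from running so far ahead that nodes are evicted again.        */
static void *helper(void *arg) {
    (void)arg;
    long pos = 0;
    for (node_t *n = list_head; n != NULL; n = n->next, pos++) {
        while (pos - atomic_load(&main_pos) > RUNAHEAD)
            ;  /* throttle: stay roughly RUNAHEAD nodes ahead */
    }
    return NULL;
}

/* Main thread: does the real per-node work and publishes its progress. */
long traverse_with_helper(void) {
    pthread_t t;
    long sum = 0;
    pthread_create(&t, NULL, helper, NULL);
    for (node_t *n = list_head; n != NULL; n = n->next) {
        sum += n->payload;            /* stand-in for the real computation */
        atomic_fetch_add(&main_pos, 1);
    }
    pthread_join(t, NULL);
    return sum;
}
```

The busy-wait throttle is the crudest possible synchronization; per the abstract, the paper's contribution is precisely a new, cheaper synchronization mechanism between the application and helper threads.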

Efficient emulation of hardware prefetchers via event-driven helper threading

Ilya Ganusov, Martin Burtscher
2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06  
The advance of multi-core architectures provides significant benefits for parallel and throughput-oriented computing, but the performance of individual computation threads does not improve and may even  ...  Furthermore, we demonstrate that running event-driven prefetching threads on top of a baseline with a hardware stride prefetcher yields significant speedups for many programs.  ...  The rest of this section explains the hardware support for EDHT and its operation. Figure 3a shows a typical pointer chasing loop.  ... 
doi:10.1145/1152154.1152178 dblp:conf/IEEEpact/GanusovB06 fatcat:xbd5prrckjf3vlymoeipmmdcv4

Asynchronous memory access chaining

Onur Kocberber, Babak Falsafi, Boris Grot
2015 Proceedings of the VLDB Endowment  
Hiding the memory latency by launching additional memory accesses for other lookups is an effective way of improving performance of pointer-chasing codes (e.g., hash table probes, tree traversals).  ...  This work introduces Asynchronous Memory Access Chaining (AMAC), a new approach for exploiting inter-lookup parallelism to hide the memory access latency.  ...  The state-of-the-art pointer-chasing prefetching techniques, namely Group Prefetching  ... 
doi:10.14778/2856318.2856321 fatcat:z2inbytdlbfzjibftcc4qrm5fu
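
A condensed sketch of the asynchronous-chaining pattern, specialized to binary-search-tree lookups (the paper formulates it as a general per-lookup finite-state machine and evaluates other structures as well; `INFLIGHT`, `tnode_t`, and `amac_lookup` are illustrative names):

```c
#include <stddef.h>

#define INFLIGHT 8   /* number of lookups kept in flight (tuning knob) */

typedef struct tnode {
    long key;
    struct tnode *left, *right;
} tnode_t;

typedef struct {
    tnode_t *cur;    /* next node this lookup will visit (NULL = idle) */
    long     key;    /* probe key                                      */
    size_t   idx;    /* index of the probe key in the input array      */
} lookup_t;

/* Probe n keys against a binary search tree; out[i] = 1 iff keys[i] found.
 * Rather than finishing one lookup before starting the next, INFLIGHT
 * lookups are kept live; each visit advances one lookup by a single node
 * and prefetches its next node before control rotates to another lookup,
 * so up to INFLIGHT misses are outstanding at once.                      */
void amac_lookup(tnode_t *root, const long *keys, size_t n, char *out) {
    lookup_t st[INFLIGHT];
    size_t next = 0, live = 0;

    if (root == NULL) {                            /* empty tree */
        for (size_t i = 0; i < n; i++) out[i] = 0;
        return;
    }

    for (size_t i = 0; i < INFLIGHT; i++) st[i].cur = NULL;
    while (live < INFLIGHT && next < n) {          /* prime the slots */
        st[live].cur = root;
        st[live].key = keys[next];
        st[live].idx = next++;
        __builtin_prefetch(root, 0, 1);
        live++;
    }

    for (size_t s = 0; live > 0; s = (s + 1) % INFLIGHT) {
        if (st[s].cur == NULL) continue;           /* drained slot      */
        tnode_t *c = st[s].cur;                    /* hopefully cached  */

        if (c->key == st[s].key) {                 /* found             */
            out[st[s].idx] = 1;
            st[s].cur = NULL;
        } else {                                   /* descend one level */
            st[s].cur = (st[s].key < c->key) ? c->left : c->right;
            if (st[s].cur == NULL) out[st[s].idx] = 0;   /* not found   */
        }

        if (st[s].cur != NULL) {
            __builtin_prefetch(st[s].cur, 0, 1);   /* overlap this miss */
        } else if (next < n) {                     /* refill the slot   */
            st[s].cur = root;
            st[s].key = keys[next];
            st[s].idx = next++;
            __builtin_prefetch(root, 0, 1);
        } else {
            live--;                                /* drain             */
        }
    }
}
```

Compared with group prefetching, each lookup advances independently, so a lookup that finishes early immediately frees its slot for a new key instead of waiting for the rest of the group.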

Fast Key-Value Lookups with Node Tracker

Mustafa Cavus, Mohammed Shatnawi, Resit Sendag, Augustus K. Uht
2021 ACM Transactions on Architecture and Code Optimization (TACO)  
Lookup operations for in-memory databases are heavily memory bound, because they often rely on pointer-chasing linked data structure traversals.  ...  We propose the Node Tracker (NT), a novel programmable prefetcher/pre-execution unit that is highly effective in exploiting inter key-lookup parallelism to improve single-thread performance.  ...  Executing multiple lookups is an effective way of hiding memory latencies by exploiting memory-level parallelism (MLP). One way to achieve inter-lookup parallelism is multithreading.  ... 
doi:10.1145/3452099 fatcat:facltmiss5anfcrmfmabj4jh3u

Accelerating database operators using a network processor

Brian Gold, Anastassia Ailamaki, Larry Huston, Babak Falsafi
2005 Proceedings of the 1st international workshop on Data management on new hardware - DAMON '05  
Rather than expend chip area and power on out-of-order execution, as in current SMT processors, we demonstrate the effectiveness of using many simple processor cores, each with hardware support for multiple  ...  This paper shows an existing hardware architecture, the network processor, already fits the model for multi-threaded, multi-core execution.  ...  Acknowledgements: The authors would like to thank Jared Smolens for help with early versions of this work, Minglong Shao for help with Shore, and members of the Carnegie Mellon Impetus group  ... 
doi:10.1145/1114252.1114260 fatcat:dm7ub4pxhrhkrpdcrsb6jzo6ea

vectorizing algorithm [chapter]

2014 Dictionary Geotechnical Engineering/Wörterbuch GeoTechnik  
IMV can make full use of the data parallelism in SIMD and the memory level parallelism through prefetching.  ...  It interleaves multiple execution instances of vectorized code to hide memory access latency with more computation.  ...  AMAC is the state-of-the-art technique to exploit the software prefetching for immediate memory accesses, especially in irregular pointer-chasing applications.  ... 
doi:10.1007/978-3-642-41714-6_220210 fatcat:rjbjvj5xzzg2zn4u662f5liuje

Speculative precomputation

Jamison D. Collins, Hong Wang, Dean M. Tullsen, Christopher Hughes, Yong-Fong Lee, Dan Lavery, John P. Shen
2001 Proceedings of the 28th annual international symposium on Computer architecture - ISCA '01  
It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts, and prefetching these data.  ...  Even with realistic costs for spawning threads, speedups as high as 169% are achieved, with an average speedup of 76%.  ...  Additionally, we would like to thank the many referees of the previous versions of this paper for their extremely useful suggestions.  ... 
doi:10.1145/379240.379248 dblp:conf/isca/CollinsWTHLLS01 fatcat:6ickmrby6nbkbpauxi4lrqpqqy

Spatio-temporal memory streaming

Stephen Somogyi, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
Temporal memory streaming replays previously observed miss sequences to eliminate long chains of dependent misses.  ...  Because each technique targets a different subset of misses, their effectiveness varies across workloads and each leaves a significant fraction of misses unpredicted.  ...  Acknowledgements: The authors would like to thank the anonymous reviewers for their feedback on this paper and members of the SimFlex team at Carnegie Mellon for contributions to our simulation infrastructure  ... 
doi:10.1145/1555754.1555766 dblp:conf/isca/SomogyiWAF09 fatcat:tcclgwppjnf2fga3lxnoumji2a
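
As a rough illustration of "replaying previously observed miss sequences" (the actual mechanism is a hardware prefetcher with its history kept in main memory; this is only a toy software model with arbitrary table sizes and names):

```c
#include <stdint.h>

/* Toy software model of the temporal half of the idea: record the global
 * sequence of miss addresses; when a miss matches an address seen before,
 * replay (prefetch) the few addresses that followed it last time.        */

#define LOG_SIZE   (1u << 16)   /* circular log of recent miss addresses */
#define INDEX_SIZE (1u << 14)   /* maps an address to its last log slot  */
#define DEPTH      4            /* how many successors to replay         */

static uintptr_t log_buf[LOG_SIZE];
static uint32_t  log_head;                 /* next write position         */
static uintptr_t idx_addr[INDEX_SIZE];     /* tag                         */
static uint32_t  idx_pos[INDEX_SIZE];      /* position of tag in the log  */

static uint32_t index_slot(uintptr_t a) {
    return (uint32_t)(a >> 6) & (INDEX_SIZE - 1);   /* per cache block */
}

/* Call on every (simulated) cache miss. */
void on_miss(uintptr_t addr) {
    uint32_t slot = index_slot(addr);

    /* Replay: if this address was seen before, prefetch what followed it. */
    if (idx_addr[slot] == addr) {
        uint32_t p = idx_pos[slot];
        for (int i = 1; i <= DEPTH; i++)
            __builtin_prefetch((void *)log_buf[(p + i) & (LOG_SIZE - 1)], 0, 1);
    }

    /* Record: append the miss to the log and update the index. */
    log_buf[log_head] = addr;
    idx_addr[slot] = addr;
    idx_pos[slot]  = log_head;
    log_head = (log_head + 1) & (LOG_SIZE - 1);
}
```

Such replay only helps for miss sequences that recur, which is why the paper couples temporal replay with spatial streaming, as the snippet's coverage remark suggests.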
