Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D.A. Wood
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
In this paper, we study PDQ's impact on software protocol performance in the context of fine-grain distributed shared memory (DSM) on an SMP cluster.  ...  Parallel systems often use fine-grain software handlers to integrate a network message into computation. Executing such handlers in parallel requires access synchronization around resources.  ...  In fine-grain DSM, for instance, the majority of executed protocol handlers implement coherence on fine-grain shared-memory blocks.  ... 
doi:10.1109/hpca.1999.744362 dblp:conf/hpca/FalsafiW99 fatcat:hrkgiidfmzc43m4jz2qttnrdpm

Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization

Yao Guo, Vladimir Vlassov, Raksit Ashok, Richard Weiss, Csaba Andras Moritz
2008 Journal of Parallel and Distributed Computing  
Our approach merges fine-grained synchronization mechanisms with traditional cache coherence protocols.  ...  The performance improvement increases up to 38% when simulating with an ideal L2 cache system.  ...  We also want to thank Zhenghua Qi for his involvement in the development of the simulator and all the anonymous reviewers for their useful comments and suggestions.  ... 
doi:10.1016/j.jpdc.2007.08.003 fatcat:vemedys7y5ekxnk4uu6cvdyqsi

The MIT Alewife machine

Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Kenneth Mackenzie, Donald Yeung
1995 Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95  
Finally, language constructs that allow programmers to express fine-grain synchronization can improve performance by over a factor of two.  ...  Microbenchmarks, together with over a dozen complete applications running on the 32-node prototype, help to analyze the behavior of the system.  ...  The following members of the Alewife team contributed significantly to the success of the project: Jonathan Babb, Rajeev Barua, Fred Chong, David Hoki, Ed Hurley, Gino Maa, Anne McCarthy, Sramana Mitra  ... 
doi:10.1145/223982.223985 dblp:conf/isca/AgarwalBCJKKLMY95 fatcat:6bhv57cqzvdw5pvjswfrf2k6mu

The MIT Alewife Machine

A. Agarwal, R. Bianchini, D. Chaiken, F.T. Chong, K.L. Johnson, D. Kranz, J.D. Kubiatowicz, Beng-Hong Lim, K. Mackenzie, D. Yeung
1999 Proceedings of the IEEE  
Alewife, an early prototype of such DSM architectures, uses a hybrid of software and hardware mechanisms to support coherent shared memory, efficient user-level messaging, fine-grain synchronization, and  ...  compiler and operating system designers to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency-tolerance  ...  ACKNOWLEDGMENT The following members of the Alewife team contributed significantly to the success of the project: J. Babb, R. Barua, D. Hoki, E. Hurley, G. Maa, A. McCarthy, S. Mitra, D.  ... 
doi:10.1109/5.747864 fatcat:6ebg346wnzcqxa22ayhdrmpdni

A Case for Fine-grain Coherence Specialization in Heterogeneous Systems [article]

Johnathan Alsop, Weon Taek Na, Matthew D. Sinclair, Samuel Grayson, Sarita V. Adve
2021 arXiv pre-print
This paper demonstrates the benefits of fine-grained coherence specialization for heterogeneous systems.  ...  More recently, industry has proposed unified coherent memory which enables implicit data movement and more data reuse, but often these interfaces limit the coherence flexibility available to heterogeneous  ...  ACKNOWLEDGEMENTS This work is supported in part by the National Science Foundation under grant CCF  ... 
arXiv:2104.11678v1 fatcat:urn2a4zn75d3jevj2wmpf4dzpi

Characterization of TCC on chip-multiprocessors

A. McDonald, JaeWoong Chung, H. Chafi, Chi Cao Minh, B.D. Carlstrom, L. Hammond, C. Kozyrakis, K. Olukotun
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
Transactional Coherence and Consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of parallel work, synchronization  ...  Furthermore, we characterize the performance of TCC in comparison to conventional snoopy cache coherence (SCC) using parallel applications optimized for each scheme.  ...  An update protocol is possible for this CMP system as the committing processor is already placing the data on the commit bus for the L2 cache.  ... 
doi:10.1109/pact.2005.11 dblp:conf/IEEEpact/McDonaldCCMCHKO05 fatcat:roc4b7suzvctrebk4chzdwjpti

Update-based cache coherence protocols for scalable shared-memory multiprocessors

D.B. Glasco, B.A. Delagi, M.J. Flynn
1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences HICSS-94  
The paper discusses the two major disadvantages of the update protocols: inefficiency of updates and the mismatch between the granularity of synchronization and the data transfer.  ...  In this paper, two hardware-controlled update-based cache coherence protocols are presented.  ...  take full advantage of the fine-grain data updates.  ... 
doi:10.1109/hicss.1994.323135 fatcat:eacn4ubhefbu3ncko4dputmmv4

Architectural Support and Mechanisms for Object Caching in Dynamic Multithreaded Computations

Vijay Karamcheti, Andrew A. Chien
1999 Journal of Parallel and Distributed Computing  
A detailed performance analysis of four irregular applications, using the Illinois Concert System on the Cray T3D and the SGI Origin 2000, finds that existing software distributed shared memory (DSM) systems  ...  Recognizing that this situation stems from the synchronous request-reply nature of DSM protocols, we present a composable object caching framework, called view caching, which exploits knowledge of application  ...  Acknowledgements The authors would like to acknowledge John Plevyak, Julian Dolby, Xingbin Zhang, Scott Pakin, and other members of the Concurrent Systems Architecture Group for their work on various parts  ... 
doi:10.1006/jpdc.1999.1555 fatcat:hz4xyy3kdfc2jc3lg5tyfmytbq

Exploiting Spatial Store Locality Through Permission Caching in Software DSMs [chapter]

Håkan Zeffer, Zoran Radović, Oskar Grenholm, Erik Hagersten
2004 Lecture Notes in Computer Science  
This paper (1) shows that most of the instrumentation overhead in the fine-grained SW-DSM system DSZOOM is store-related, (2) introduces a new write permission cache (WPC) technique that exploits spatial  ...  Fine-grained software-based distributed shared memory (SW-DSM) systems typically maintain coherence with in-line checking code at load and store operations to shared memory.  ...  We demonstrate that the instrumentation overhead of the fine-grained software DSM system, DSZOOM [4], can be reduced with both 1- and 2-entry WPC implementations.  ... 
doi:10.1007/978-3-540-27866-5_72 fatcat:46dfdq2gezgktkhg2uzauu7dpe

Removing the overhead from software-based shared memory

Zoran Radović, Erik Hagersten
2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '01  
This not only removes the asynchronous overhead, but also makes use of a processor that otherwise would stall.  ...  The technique is applicable to both page-based and fine-grain software-based shared memory.  ...  for comments on earlier drafts of the paper.  ... 
doi:10.1145/582034.582090 dblp:conf/sc/RadovicH01 fatcat:tggugyxjt5f47kojsm7tuijc4q

The Wisconsin Wind Tunnel project

Mark D. Hill, James R. Larus, David A. Wood
1994 SIGARCH Computer Architecture News  
on-line.  ...  This document lists contributors to the Wisconsin Wind Tunnel Project, gives a brief description of the project, and presents references and abstracts to its principal papers, including how to obtain them  ...  Fine-grain messaging provides the short, low-latency messages required to implement cache coherence protocols and support fine-grain parallelism.  ... 
doi:10.1145/192537.192543 fatcat:rvtgkgeonnba3cdbociaiglrdq

High Performance Software Coherence for Current and Future Architectures

L.I. Kontothanassis, M.L. Scott
1995 Journal of Parallel and Distributed Computing  
do not use stale data in their computation.  ...  To support this claim we present a software coherence protocol that runs on this class of machines, and use simulation to conduct a performance study.  ...  Our thanks also to the anonymous referees for their detailed and insightful comments.  ... 
doi:10.1006/jpdc.1995.1116 fatcat:vdkmgnxz2zbarikf7mmcsi7mge

The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence

Mahdad Davari, Alberto Ros, Erik Hagersten, Stefanos Kaxiras
2015 ACM Transactions on Architecture and Code Optimization (TACO)  
(ii) Reducing the amount of shared data has no perceptible impact on coherence misses caused by self-invalidation of shared data, hence no impact on performance.  ...  In this paper we ask the question: how granularity (page-level vs. cache-line level) and adaptivity (going from shared to private) affect the outcome of classification and what is its final impact on coherence  ...  This illustrates the significant impact of fine-grained adaptive data classification on the amount of data classified as private.  ... 
doi:10.1145/2790301 fatcat:wk2bsaxxaje57b7sq2q4yqcxyq

Tempest and typhoon

Steven K. Reinhardt, James R. Larus, David A. Wood
1998 25 years of the international symposia on Computer architecture (selected papers) - ISCA '98  
First, the Stache protocol uses Tempest's fine-grain access control mechanisms to manage part of a processor's local memory as a large, fully-associative cache for remote data.  ...  We simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably (±30%) to an all-hardware Dir N NB cache-coherence protocol for five shared-memory programs  ...  Acknowledgments This work is part of the Wisconsin Wind Tunnel project, which is co-led by Mark Hill, James Larus, and David Wood and funded by the National Science Foundation.  ... 
doi:10.1145/285930.286008 dblp:conf/isca/ReinhardtLW98a fatcat:2ggsngclordifld3ksmlfcri3m

Tempest and typhoon

S. K. Reinhardt, J. R. Larus, D. A. Wood
1994 SIGARCH Computer Architecture News  
First, the Stache protocol uses Tempest's fine-grain access control mechanisms to manage part of a processor's local memory as a large, fully-associative cache for remote data.  ...  We simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably (±30%) to an all-hardware Dir N NB cache-coherence protocol for five shared-memory programs  ...  Acknowledgments This work is part of the Wisconsin Wind Tunnel project, which is co-led by Mark Hill, James Larus, and David Wood and funded by the National Science Foundation.  ... 
doi:10.1145/192007.192062 fatcat:6yffl7imkzhaparrgqihwlzbfu