5,683 Hits in 5.6 sec

Atomic SC for simple in-order processors

Dibakar Gope, Mikko H. Lipasti
2014 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)  
Sequential consistency is arguably the most intuitive memory consistency model for shared-memory multithreaded programming, yet it appears to be a poor fit for simple, in-order processors that are most  ...  On an in-order processor running multithreaded PARSEC workloads, Atomic SC delivers performance that is equal to or better than prior SC-compatible schemes, which require much greater energy and design  ...  This work was supported in part by NSF grants CCF-1116450 and CCF-1318298 and donations from Qualcomm and Oracle research.  ... 
doi:10.1109/hpca.2014.6835950 dblp:conf/hpca/GopeL14 fatcat:amcud74bsfel3fyvlea32kvl3a

The Bulk Multicore architecture for improved programmability

Josep Torrellas, Luis Ceze, James Tuck, Calin Cascaval, Pablo Montesinos, Wonsun Ahn, Milos Prvulovic
2009 Communications of the ACM  
Since chunks execute atomically and in isolation, commit in program order in each processor, and there is a global commit order of chunks, the Bulk Multicore supports sequential consistency (SC) [9] at the  ...  In a conventional processor that issues memory accesses out of order, supporting SC requires intrusive processor modifications.  ... 
doi:10.1145/1610252.1610271 fatcat:o45c7hfevbgxnnfepaayq7sks4

Programming for different memory consistency models

Kourosh Gharachorloo, Sarita V. Adve, Anoop Gupta, John L. Hennessy, Mark D. Hill
1992 Journal of Parallel and Distributed Computing  
While SC provides a simple model for the programmer, it imposes rigid constraints on the ordering of memory accesses and restricts the use of common hardware and compiler optimizations.  ...  These include processor consistency (PC), weak ordering (WO), release consistency (RCsc/RCpc), total store ordering (TSO), and partial store ordering (PSO).  ...  Acknowledgements We thank Phillip Gibbons, Michael Merritt, and the anonymous referees for their comments.  ... 
doi:10.1016/0743-7315(92)90052-o fatcat:uw3f5hji5rcdjn4rfyb6gmxxxa
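To make the difference between SC and a relaxed model such as TSO concrete, here is a minimal sketch, assuming a simplified TSO operational model: each processor has a FIFO store buffer with store-to-load forwarding, and buffered stores drain to memory at arbitrary times. This is not the formalism used in the paper. The store-buffering litmus test and the register names `r0`/`r1` are hypothetical illustrations. The sketch enumerates every reachable final register state and shows that `r0 = r1 = 0` is possible, an outcome SC forbids.

```python
from typing import Dict, List, Set, Tuple

# An instruction is ("st", addr, value) or ("ld", addr, register).
Instr = Tuple[str, str, object]

# Hypothetical store-buffering litmus test:
#   P0: x = 1; r0 = y        P1: y = 1; r1 = x
SB: List[List[Instr]] = [
    [("st", "x", 1), ("ld", "y", "r0")],
    [("st", "y", 1), ("ld", "x", "r1")],
]

def tso_outcomes(programs: List[List[Instr]]) -> Set[Tuple[Tuple[str, int], ...]]:
    """Enumerate final register states under a simplified TSO model."""
    results: Set[Tuple[Tuple[str, int], ...]] = set()
    n = len(programs)

    def step(pcs: Tuple[int, ...], bufs: Tuple[tuple, ...],
             mem: Dict[str, int], regs: Dict[str, int]) -> None:
        moved = False
        for p in range(n):
            # Option 1: processor p issues its next instruction.
            if pcs[p] < len(programs[p]):
                moved = True
                op, addr, arg = programs[p][pcs[p]]
                new_bufs = list(bufs)
                new_regs = dict(regs)
                if op == "st":
                    # Stores enter p's FIFO buffer, invisible to other processors.
                    new_bufs[p] = bufs[p] + ((addr, arg),)
                else:
                    # Store-to-load forwarding: the youngest buffered store to addr wins.
                    fwd = [v for a, v in bufs[p] if a == addr]
                    new_regs[arg] = fwd[-1] if fwd else mem.get(addr, 0)
                new_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                step(new_pcs, tuple(new_bufs), mem, new_regs)
            # Option 2: drain p's oldest buffered store to shared memory.
            if bufs[p]:
                moved = True
                (addr, val), rest = bufs[p][0], bufs[p][1:]
                new_mem = dict(mem)
                new_mem[addr] = val
                step(pcs, bufs[:p] + (rest,) + bufs[p + 1:], new_mem, regs)
        if not moved:  # all instructions issued and all buffers drained
            results.add(tuple(sorted(regs.items())))

    step((0,) * n, ((),) * n, {}, {})
    return results

outcomes = tso_outcomes(SB)
```

All four combinations of `r0`/`r1` are reachable here, because each load can execute while the other processor's store is still sitting in its buffer.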

Constructing a Weak Memory Model

Sizhuo Zhang, Muralidaran Vijayaraghavan, Andrew Wright, Mehdi Alipour, Arvind
2018 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)  
We will disallow some optimizations which break a programmer's intuition in highly unexpected ways.  ...  The constructed model, which we call General Atomic Memory Model (GAM), allows all four load/store reorderings.  ...  ACKNOWLEDGMENT We thank all the anonymous reviewers and especially our shepherd Thomas Wenisch for their helpful feedbacks on improving this paper.  ... 
doi:10.1109/isca.2018.00021 dblp:conf/isca/ZhangVWAA18 fatcat:5n5xvjz5dfhyhkt2u4evd3jssm

BulkSC

Luis Ceze, James Tuck, Pablo Montesinos, Josep Torrellas
2007 Proceedings of the 34th annual international symposium on Computer architecture - ISCA '07  
In this paper, we propose Bulk Enforcement of SC (BulkSC), a novel way of providing SC that is simple to implement and offers performance comparable to Release Consistency (RC).  ...  BulkSC keeps the implementation simple by largely decoupling memory consistency enforcement from processor structures.  ...  We thank Karin Strauss for the initial suggestion of using bulk operations for consistency enforcement.  ... 
doi:10.1145/1250662.1250697 dblp:conf/isca/CezeTMT07 fatcat:ibsvmgklmjhlpphzg3viwfbwpm

BulkSC

Luis Ceze, James Tuck, Pablo Montesinos, Josep Torrellas
2007 SIGARCH Computer Architecture News  
In this paper, we propose Bulk Enforcement of SC (BulkSC), a novel way of providing SC that is simple to implement and offers performance comparable to Release Consistency (RC).  ...  BulkSC keeps the implementation simple by largely decoupling memory consistency enforcement from processor structures.  ...  We thank Karin Strauss for the initial suggestion of using bulk operations for consistency enforcement.  ... 
doi:10.1145/1273440.1250697 fatcat:tmsvc4yrjvhjvnnlhsmvj6pvf4

Modular Deductive Verification of Multiprocessor Hardware Designs [chapter]

Muralidaran Vijayaraghavan, Adam Chlipala, Arvind, Nirav Dave
2015 Lecture Notes in Computer Science  
We present a new framework for modular verification of hardware designs in the style of the Bluespec language.  ...  in the Coq proof assistant.  ...  This LTS encodes Lamport's notion of SC, where processors take turns executing nondeterministically in a simple interleaving.  ... 
doi:10.1007/978-3-319-21668-3_7 fatcat:ihhvsfhwgbhe7chf46hlw2jlpq

BulkCompiler

W. Ahn, S. Qi, M. Nicolaides, J. Torrellas, J.-W. Lee, X. Fang, S. Midkiff, David Wong
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
However, for a platform to support SC, it is insufficient that the hardware does; the compiler has to support SC as well.  ...  This paper presents the hardware-compiler interface, and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance  ...  Since chunks execute atomically and in isolation, commit in program order in each processor, and the arbiter globally orders their commit, BulkSC supports SC at the chunk level and, as a consequence,  ... 
doi:10.1145/1669112.1669131 dblp:conf/micro/AhnQNTLFMW09 fatcat:6bys2qqmzba75acw2ac27g44ke

Fast synchronization on shared-memory multiprocessors: An architectural approach

Zhen Fang, Lixin Zhang, John B. Carter, Liqun Cheng, Michael Parker
2005 Journal of Parallel and Distributed Computing  
Second, we present an architectural innovation called active memory that enables very fast atomic operations in a shared-memory multiprocessor.  ...  Synchronization is a crucial operation in many parallel applications.  ...  Acknowledgments The authors would like to thank Silicon Graphics Inc. for the technical documentations provided for the simulation, and in particular, the valuable input from Marty Deneroff, Steve Miller  ... 
doi:10.1016/j.jpdc.2005.04.013 fatcat:3dj627j3r5ekzamur5epdzgdo4

The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors

Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou
2001 International journal of parallel programming  
From the operating system's perspective, the paper evaluates in a unified framework, user-level, kernel-level and hybrid algorithms for implementing scalable synchronization in multiprogrammed execution  ...  From the architectural perspective, the paper identifies the implications of directory-based cache coherence on the latency and scalability of synchronization instructions and examines if and how can simple  ...  The results show that the simple counter barrier is not scalable when the atomic increment of the counter is implemented with LL-SC, but is the fastest when the atomic increment is implemented at-memory  ... 
doi:10.1023/a:1011168003859 dblp:journals/ijpp/NikolopoulosP01 fatcat:kggvvrj4c5cphh4ft2b4pkazpu
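The snippet contrasts implementing a counter barrier's atomic increment with LL-SC versus at-memory operations. The sketch below shows only the barrier's logic: a centralized sense-reversing counter barrier. Python cannot express either LL-SC or at-memory atomics, so a `threading.Lock` stands in for the hardware fetch-and-increment. The class `CounterBarrier` and the `demo` helper are hypothetical names chosen for illustration, not code from the paper.

```python
import threading
import time

class CounterBarrier:
    """Centralized sense-reversing counter barrier (illustrative sketch).

    The lock emulates an atomic fetch-and-increment; on real hardware this is
    the operation the paper implements with LL-SC or at memory."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0
        self.sense = False
        self._lock = threading.Lock()

    def _fetch_and_increment(self) -> int:
        with self._lock:
            old = self.count
            self.count += 1
            return old

    def wait(self, local_sense: bool) -> bool:
        """Block until all n threads arrive; return the caller's new local sense."""
        local_sense = not local_sense
        if self._fetch_and_increment() == self.n - 1:
            # Last arrival: reset the counter, then release everyone by flipping the shared sense.
            self.count = 0
            self.sense = local_sense
        else:
            while self.sense != local_sense:
                time.sleep(0)  # spin on the shared flag, yielding to other threads
        return local_sense

def demo(n_threads: int = 4, rounds: int = 3) -> list:
    """Each thread logs its round number, then waits at the barrier."""
    barrier = CounterBarrier(n_threads)
    log: list = []
    log_lock = threading.Lock()

    def worker() -> None:
        sense = False
        for r in range(rounds):
            with log_lock:
                log.append(r)
            sense = barrier.wait(sense)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

No thread can log round `r + 1` before every thread has logged round `r`, so the log returned by `demo()` is non-decreasing. Sense reversal lets the same counter and flag be reused across rounds without a second reset phase.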

Weak Memory Models: Balancing Definitional Simplicity and Implementation Flexibility

Sizhuo Zhang, Muralidaran Vijayaraghavan, Arvind
2017 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)  
WMM is simple (it is similar to the Alpha memory model), but it disallows behaviors arising due to shared store buffers and shared write-through caches (which are seen in POWER processors).  ...  We give the operational definitions of both models using Instantaneous Instruction Execution (I2E), which has been used in the definitions of SC and TSO.  ...  The operations of the SC abstract machine are the simplest: in one step we can select any processor to execute the next instruction on that processor atomically.  ... 
doi:10.1109/pact.2017.29 dblp:conf/IEEEpact/ZhangVA17 fatcat:bzcjxwobkzfzjcnzvg6qyhyb7i
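The last snippet describes the SC abstract machine: in each step, any processor may execute its next instruction atomically against a single shared memory. A minimal sketch of that machine follows, assuming a toy instruction set of loads and stores. It enumerates every interleaving of a hypothetical store-buffering litmus test (`P0: x = 1; r0 = y` and `P1: y = 1; r1 = x`) and collects the reachable final register states.

```python
from typing import Dict, List, Set, Tuple

# An instruction is ("st", addr, value) or ("ld", addr, register).
Instr = Tuple[str, str, object]

PROGRAMS: List[List[Instr]] = [
    [("st", "x", 1), ("ld", "y", "r0")],
    [("st", "y", 1), ("ld", "x", "r1")],
]

def sc_outcomes(programs: List[List[Instr]]) -> Set[Tuple[Tuple[str, int], ...]]:
    """Enumerate final register states reachable on the SC abstract machine."""
    results: Set[Tuple[Tuple[str, int], ...]] = set()

    def step(pcs: Tuple[int, ...], mem: Dict[str, int], regs: Dict[str, int]) -> None:
        done = True
        for p, prog in enumerate(programs):
            if pcs[p] == len(prog):
                continue
            done = False
            # Nondeterministically pick processor p and execute one instruction atomically.
            op, addr, arg = prog[pcs[p]]
            new_mem, new_regs = dict(mem), dict(regs)
            if op == "st":
                new_mem[addr] = arg
            else:
                new_regs[arg] = mem.get(addr, 0)  # memory is initially all zeros
            new_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            step(new_pcs, new_mem, new_regs)
        if done:
            results.add(tuple(sorted(regs.items())))

    step(tuple(0 for _ in programs), {}, {})
    return results

outcomes = sc_outcomes(PROGRAMS)
```

Only three outcomes are reachable: `(r0, r1)` = `(0, 1)`, `(1, 0)`, or `(1, 1)`. Whichever load executes last must observe the other processor's store, so `r0 = r1 = 0` never appears under SC.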

A Primer on Memory Consistency and Cache Coherence

Daniel J. Sorin, Mark D. Hill, David A. Wood
2011 Synthesis Lectures on Computer Architecture  
With "SC for DRF," programmers can get both the (relatively) simple correctness model of SC with the (relative) higher performance of XC.  ...  Beyond this intuition, the chapter formalizes SC and explores implementing SC with coherence in both simple and aggressive ways, culminating with a MIPS R10000 case study.  ... 
doi:10.2200/s00346ed1v01y201104cac016 fatcat:4hqyxplumrg2plqo77dm6jse2i

OmniOrder: Directory-based conflict serialization of transactions

Xuehai Qian, Benjamin Sahelices, Josep Torrellas
2014 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)  
In an environment with SC enforcement with speculation, we run 11 programs that implement concurrent algorithms.  ...  Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically generated by the hardware as in aggressive implementations of strict memory consistency.  ...  Q can discard the SVB updates of the two earlier processors in any order; unlike for transaction commit, there is no need to do it in transaction order.  ... 
doi:10.1109/isca.2014.6853223 dblp:conf/isca/QianST14 fatcat:4thq2r6rcbhvxamkomyys73zwq

OmniOrder

Xuehai Qian, Benjamin Sahelices, Josep Torrellas
2014 SIGARCH Computer Architecture News  
In an environment with SC enforcement with speculation, we run 11 programs that implement concurrent algorithms.  ...  Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically generated by the hardware as in aggressive implementations of strict memory consistency.  ...  Q can discard the SVB updates of the two earlier processors in any order; unlike for transaction commit, there is no need to do it in transaction order.  ... 
doi:10.1145/2678373.2665734 fatcat:khqohcx6dnbavmqpov7zo2im7e

Mechanisms for store-wait-free multiprocessors

Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2007 Proceedings of the 34th annual international symposium on Computer architecture - ISCA '07  
Store misses cause significant delays in shared-memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization.  ...  To eliminate ordering-related stalls, we propose atomic sequence ordering, which enforces ordering constraints over coarse-grain access sequences while relaxing order among individual accesses.  ...  Acknowledgements The authors would like to thank Milo Martin, members of the Impetus research group at Carnegie Mellon University, and the anonymous reviewers for their feedback on drafts of this paper  ... 
doi:10.1145/1250662.1250696 dblp:conf/isca/WenischAFM07 fatcat:rail7xodnjerrpgq4o4wpyntgi
Showing results 1–15 out of 5,683 results