Exploiting reference idempotency to reduce speculative storage overflow

Seon Wook Kim, Chong-Liang Ooi, Rudolf Eigenmann, Babak Falsafi, T. N. Vijaykumar
2006 ACM Transactions on Programming Languages and Systems  
Thus, we reduce the demand for speculative storage space in large threads. We define a formal framework for reference idempotency and present a novel compiler-assisted speculative execution model.  ...  The limited capacity of the speculative storage causes considerable performance loss due to speculative storage overflow whenever a thread's speculative state exceeds the speculative storage capacity.  ...  Thus, idempotent references help reduce speculative storage overflow, as motivated in Section 1.  ... 
doi:10.1145/1152649.1152653 fatcat:scnsl7qk4zbc3hxorpnahcveie

Idempotent processor architecture

Marc de Kruijf, Karthikeyan Sankaralingam
2011 Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11  
The paradigm of executing idempotent regions, which we call idempotent processing, can be used to support various types of speculation, including branch prediction, dependence prediction, or execution  ...  This paper presents a new processor architecture, the idempotent processor architecture, that advances both of these directions by presenting a new execution paradigm that allows speculative execution  ...  The concept of idempotence has been leveraged by Kim et al. in the context of thread-level speculation to allow idempotent references to see speculative state [30].  ... 
doi:10.1145/2155620.2155637 dblp:conf/micro/KruijfS11 fatcat:cktbx6nww5gavorpghak76g72e

Encore

Shuguang Feng, Shantanu Gupta, Amin Ansari, Scott A. Mahlke, David I. August
2011 Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11  
Encore combines program analysis, profile data, and simple code transformations to create statistically idempotent code regions that can recover from faults at very little cost.  ...  To meet an insatiable consumer demand for greater performance at less power, silicon technology has scaled to unprecedented dimensions.  ...  [9] leveraged idempotent properties of inner loops in Fortran applications to minimize the instances of storage overflows in a speculative execution system.  ... 
doi:10.1145/2155620.2155667 dblp:conf/micro/FengGAMA11 fatcat:uoanodtx4zhslgdz6ygwbykjje

Compiler Directed Speculative Intermittent Computation [article]

Jongouk Choi, Qingrui Liu, Changhee Jung
2020 arXiv   pre-print
When the program control reaches the end of each region, the speculation turns out to be successful, thus releasing all the buffered stores of the region to NVM.  ...  To achieve crash consistency without requiring unconventional architectural support, CoSpec leverages speculation assuming that power failure is not going to occur and thus holds all committed stores in  ...  To achieve lightweight crash consistency, CoSpec proposes to exploit such a store buffer (SB) for a different type of speculation.  ... 
arXiv:2006.11479v1 fatcat:iksj4lk5qbdbtcplf7f33wwfnm

A lightweight in-place implementation for software thread-level speculation

Cosmin E. Oancea, Alan Mycroft, Tim Harris
2009 Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures - SPAA '09  
TLS implementations can impose large storage overheads caused by buffering speculative work.  ...  We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations.  ...  However they involve non-trivial and expensive changes to the basic cache-coherence infrastructure and avoiding overflow of the limited speculative storage can restrict gains [11].  ... 
doi:10.1145/1583991.1584050 dblp:conf/spaa/OanceaMH09 fatcat:j337tnmw2neyxpd5tyzlxypc2m

Hardware transactional memory support for lightweight dynamic language evolution

Nicholas Riley, Craig Zilles
2006 Companion to the 21st ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications - OOPSLA '06  
We exploit the semantics of Python execution to evaluate individual bytecodes atomically by default, using nested transactions to emulate programmer-specified locking constructs where possible in existing  ...  With small changes, a runtime can be made HTM-aware to enable parallel execution of Python code and extension modules.  ...  We would like to thank Carl Friedrich Bolz, Michael Hudson, Samuele Pedroni and Armin Rigo for their assistance with PyPy.  ... 
doi:10.1145/1176617.1176758 dblp:conf/oopsla/RileyZ06 fatcat:ghn7hmicevdh7plv7yosemiddi

Thread-safe dynamic binary translation using transactional memory

JaeWoong Chung, Michael Dalton, Hari Kannan, Christos Kozyrakis
2008 High-Performance Computer Architecture  
We also show that software optimizations in the DBT and hardware support for transactions can reduce the runtime overhead to 6%.  ...  However, DBT frameworks may incorrectly handle multithreaded programs due to races involving updates to the application data and the corresponding metadata maintained by the DBT.  ...  For reference, we also include the original results with software-only transactions. STM+ reduces the average overhead from 41% to 28%.  ... 
doi:10.1109/hpca.2008.4658646 dblp:conf/hpca/ChungDKK08 fatcat:tgpdkrlqzfc5vl74d3r5bfghry

Transactional Mutex Locks [chapter]

Luke Dalessandro, Dave Dice, Michael Scott, Nir Shavit, Michael Spear
2010 Lecture Notes in Computer Science  
In this paper we propose transactional mutex locks (TML), which attempt to achieve the best of both worlds for read-dominated workloads.  ...  TML has much lower latency than STM, enabling it to perform competitively with mutexes. It also scales as well as STM when critical sections rarely perform writes.  ...  We are also grateful to the UR Center for Research Computing for maintaining and providing access to a cluster of 8-core x86 machines.  ... 
doi:10.1007/978-3-642-15291-7_2 fatcat:3qjh55q33rgw3hflygkqqeqyd4

P-Ray: A Software Suite for Multi-core Architecture Characterization [chapter]

Alexandre X. Duchateau, Albert Sidelnik, María Jesús Garzarán, David Padua
2008 Lecture Notes in Computer Science  
The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers.  ...  Currently, the task of determining the appropriate memory to use and the coding of data transfer between memories is still left to the programmer.  ...  We illustrated the methodology by implementing a matrix-matrix multiplication algorithm that can exploit the availability of PBLAS and BLAS when the proper conditions are met.  ... 
doi:10.1007/978-3-540-89740-8_13 fatcat:hv2aoouhcve4xc2vlff77k7q4i

Hints and Principles for Computer System Design [article]

Butler Lampson
2021 arXiv   pre-print
It also gives some principles for system design that are more than just hints, and many examples of how to apply the ideas.  ...  Exploit batching to reduce the per item cost.  ...  The reasons are to do less total work (a form of speculation) or to reduce latency.  ... 
arXiv:2011.02455v3 fatcat:jolyz5lknjdbpjpxjcrx5rh6fa

Secure System Virtualization: End-to-End Verification of Memory Isolation [article]

Hamed Nemati
2020 arXiv   pre-print
They reduce the software portion of the system's trusted computing base to a thin layer, which enforces isolation between low- and high-criticality components.  ...  The reduced trusted computing base minimizes the system attack surface and facilitates the use of formal methods to ensure functional correctness and security of the kernel.  ...  overflows [220].  ... 
arXiv:2005.02605v1 fatcat:h7sdyjoxyrexhaswjns5mcfdey

The Next 7000 Programming Languages [chapter]

Robert Chatley, Alastair Donaldson, Alan Mycroft
2019 Lecture Notes in Computer Science  
Landin's seminal paper "The next 700 programming languages" considered programming languages prior to 1966 and speculated on the next 700.  ...  We conclude by speculating on future language evolution.  ...  We are grateful to Sophia Drossopoulou, Stephen Kell, Tom Stuart, Joost-Pieter Katoen, Flemming Nielson and Bernhard Steffen for their useful feedback on an earlier draft of this work.  ... 
doi:10.1007/978-3-319-91908-9_15 fatcat:kympenwph5ajjg2ilydix423he

Big Graph Analytics Platforms

Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande
2017 Foundations and Trends in Databases  
"messages") to local disk(s) if the in-memory buffer overflows.  ...  As a tradeoff, the files have to be unzipped for processing, but the storage savings and the reduced amount of data transfer outweigh the decompression overhead.  ...  ., to perform v_out ← A^T · v_in. In this case, only those columns of A^T that correspond to the adjacency lists of vertices in S need to be accessed.  ... 
doi:10.1561/1900000056 fatcat:ucqrtzo4q5g2lpj6dmp7jv3e5m

Crafty: Efficient, HTM-Compatible Persistent Transactions [article]

Kaan Genç Ohio State University
2020 arXiv   pre-print
Byte-addressable persistent memory, such as Intel/Micron 3D XPoint, is an emerging technology that bridges the gap between volatile memory and persistent storage.  ...  Existing approaches incur significant performance costs to ensure crash consistency.  ...  Thanks to Steve Blackburn, Jake Roemer, and Tomoharu Ugawa for helpful discussions and feedback.  ... 
arXiv:2004.00262v1 fatcat:kwgly52fmnerdgfcghgzgnrc24

Is Parallel Programming Hard, And, If So, What Can You Do About It? (Release v2021.12.22a) [article]

Paul E. McKenney
2021 arXiv   pre-print
Your mission, if you choose to accept, is to help make further progress in the exciting field of parallel programming: progress that will in time render this book obsolete.  ...  The purpose of this book is to help you program shared-memory parallel systems without risking your sanity.  ...  The following section shows one way to greatly increase the time required for overflow to occur, while greatly reducing read-side overhead.  ... 
arXiv:1701.00854v4 fatcat:pxiajyczebd5pm76htwnrczhm4