In-Place Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory [article]

William Kuszmaul, Alek Westover
2020 arXiv   pre-print
The algorithm uses only exclusive read/write shared variables, and can be implemented using parallel-for-loops without any additional concurrency considerations (i.e., the algorithm is EREW).  ...  We present an in-place algorithm for the partition problem that has linear work and polylogarithmic span.  ...  Finding fast algorithms that use only exclusive-read-write memory (or concurrent-read-exclusive-write memory) is an important direction of future work.  ... 
arXiv:2004.12532v2 fatcat:vklij2qznza3ti6x57zghdfslu

Engineering In-place (Shared-memory) Sorting Algorithms [article]

Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders
2021 arXiv   pre-print
Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient.  ...  By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best in-place parallel comparison-based  ...  Timo Bingmann and Lorenz Hübschle-Schneider [39] kindly provided an initial implementation of the branchless decision tree that was used as a starting point in our implementation.  ... 
arXiv:2009.13569v2 fatcat:epl3sxbj3bdkxcpzwyj4wghup4

Transactional Memory, 2nd edition

Tim Harris, James Larus, Ravi Rajwar
2010 Synthesis Lectures on Computer Architecture  
The Greedy CM provides provably good performance when compared with an optimal schedule.  ...  In contrast, for the exclusive state (E), an address is exclusively present in the data cache and is not present in any other data cache.  ...  WWT was a DARPA- and NSF-funded project that investigated new approaches to simulating, building, and programming parallel shared-memory computers.  ... 
doi:10.2200/s00272ed1v01y201006cac011 fatcat:25d3gvp5zrfqlgpzdzknqouofi

External memory algorithms and data structures: dealing with massive data

Jeffrey Scott Vitter
2001 ACM Computing Surveys  
The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however,  ...  In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs  ...  Figures 6, 7, 9, 10, 12, and 13 are modified versions of figures in Arge et al. [1998a], Enbody and Du [1998], Kanellakis et al. [1996], Arge et al. [1999a,b], and Ferragina and Grossi [1999], respectively  ... 
doi:10.1145/384192.384193 fatcat:tunz4fa3rrgv7hwbk7qsvahd5i

Algorithms and Data Structures for External Memory

Jeffrey Scott Vitter
2006 Foundations and Trends® in Theoretical Computer Science  
Algorithms and Data Structures for External Memory is an invaluable reference for anybody interested in, or conducting research in the design, analysis, and implementation of algorithms and data structures  ...  Algorithms and Data Structures for External Memory describes several useful paradigms for the design and implementation of efficient EM algorithms and data structures.  ...  In order to apply duality, which deals with read and write sequences, we need to predetermine the read order Σ for the merge.  ... 
doi:10.1561/0400000014 fatcat:bkfchugd4fbjvcu5zipnh23k6e

Reader-Writer Synchronization for Shared-Memory Multiprocessor Real-Time Systems

Björn B. Brandenburg, James H. Anderson
2009 2009 21st Euromicro Conference on Real-Time Systems  
A new phase-fair reader-writer lock is proposed as an alternative that significantly reduces worst-case blocking for readers and an efficient local-spin implementation is provided.  ...  Both task- and phase-fair locks are evaluated and contrasted to mutex locks in terms of hard and soft real-time schedulability under consideration of runtime overheads on a multicore computer.  ...  With locks, an update can be computed based on the current value.  ...  algorithm, wherein the earliest-deadline-first (EDF) algorithm is used on each processor, and in the global case, the global EDF (G-EDF  ... 
doi:10.1109/ecrts.2009.14 dblp:conf/ecrts/BrandenburgA09 fatcat:3hxc73dm6fb4pkvnvmi4x24fki

Distributed computing column 54 transactional memory: models and algorithms

Jennifer L. Welch
2014 ACM SIGACT News  
This issue's column consists of a review article by Gokarna Sharma and Costas Busch on models and algorithms for transactional memory (TM), with particular emphasis on scheduling.  ...  Third, related results for non-uniform memory access systems are surveyed, with emphasis on how to provide consistency in a load-balanced way.  ...  We compared the performance of MultiBend with existing algorithms Arrow [25] and Ballistic [46].  ... 
doi:10.1145/2636805.2636823 fatcat:xnjbxycmfbfkdop6eyzrbbj4nq

HeTM: Transactional Memory for Heterogeneous Systems [article]

Daniel Castro, Paolo Romano, Aleksandar Ilic, Amin M. Khan
2019 arXiv   pre-print
We demonstrate the efficiency of the SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a porting of a popular object caching system.  ...  HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions.  ...  equipped with their own local memory and communicate over an interconnection bus like PCIe.  ... 
arXiv:1905.00661v2 fatcat:nxiihazahrc3xnyptkqo35ke3e

The MT Stack: Paging Algorithm and Performance in a Distributed Virtual Memory System

Marco T. Morazan, Douglas R. Troeger, Myles Nash
2018 CLEI Electronic Journal  
In order to design and implement parallel functional languages efficiently, we propose the development of an all-software based distributed virtual memory system designed specifically for the memory  ...  Based on this proof the MT stack page replacement policy was developed and implemented. We outline the paging algorithm and present an argument of partial correctness.  ...  Thus, the results pertain exclusively to stack behavior. Benchmarks Used: A brief description of the benchmarks used is given below.  ... 
doi:10.19153/cleiej.5.1.2 fatcat:ecq2q5kmjzgz3nl6tqdvzbia4u

On the Cost of Concurrency in Transactional Memory [article]

Srivatsan Ravi
2015 arXiv   pre-print
The Transactional Memory (TM) abstraction is proposed as such a mechanism: it intends to combine an easy-to-use programming interface with an efficient utilization of the concurrent-computing abilities  ...  Traditional techniques for synchronization are based on locking that provides threads with exclusive access to shared data.  ...  We match this lower bound with a HyTM algorithm that, additionally, allows for uninstrumented writes and invisible reads and is provably opaque [64].  ... 
arXiv:1511.01779v1 fatcat:ahlaq4z7dffklp7fiytspfmbti

Secure System Virtualization: End-to-End Verification of Memory Isolation [article]

Hamed Nemati
2020 arXiv   pre-print
In this thesis, we explore various aspects of building a provably secure separation kernel using virtualization technology.  ...  In particular, we examine techniques related to the appropriate management of the memory subsystem.  ...  Here, touch r|w t denotes a read or write access to a line tagged with t, lfill t occurs when a line for tag t is loaded from memory and placed in the cache.  ... 
arXiv:2005.02605v1 fatcat:h7sdyjoxyrexhaswjns5mcfdey

Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?

P. B. Gibbons, Y. Matias
1999 Theory of Computing Systems  
The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model.  ...  Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention  ...  the well-studied exclusive-read exclusive-write (EREW) or concurrent-read concurrent-write (CRCW) rules.  ... 
doi:10.1007/s002240000121 fatcat:6q5ler4h5nb2jjlzxdsz25hylm

Can shared-memory model serve as a bridging model for parallel computation?

Phillip B. Gibbons, Yossi Matias, Vijaya Ramachandran
1997 Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures - SPAA '97  
The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model.  ...  Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention  ...  the well-studied exclusive-read exclusive-write (EREW) or concurrent-read concurrent-write (CRCW) rules.  ... 
doi:10.1145/258492.258500 dblp:conf/spaa/GibbonsMR97 fatcat:gkvh5eocujhdtcqlrjqouyudt4

On the Cost of Concurrency in Transactional Memory [chapter]

Petr Kuznetsov, Srivatsan Ravi
2011 Lecture Notes in Computer Science  
Statutory Declaration: I declare in lieu of an oath that I wrote this dissertation independently and used only the cited sources and aids.  ...  Finally, we prove that optimistic synchronization techniques based on speculative execution are, in a precise sense, better suited to exploiting concurrency than pessimistic  ...  this observation, our HyTM implementation described in Algorithm 7.2 overcomes the linear per-read instrumentation cost by allowing hardware readers to abort due to a concurrent software writer, but maintains  ... 
doi:10.1007/978-3-642-25873-2_9 fatcat:7ehtxssf45abzc2bnigccm6evq

P-Ray: A Software Suite for Multi-core Architecture Characterization [chapter]

Alexandre X. Duchateau, Albert Sidelnik, María Jesús Garzarán, David Padua
2008 Lecture Notes in Computer Science  
However, there are still many burdens placed upon the programmer to maximize performance when using CUDA. One such burden is dealing with the complex memory hierarchy.  ...  Efficient and correct usage of the various memories is essential, making a difference of 2-17x in performance.  ...  Chen Ding suggested the lock switching scheme in the memory controller component; Brian Meeker collected some preliminary data in the early stage of this research.  ... 
doi:10.1007/978-3-540-89740-8_13 fatcat:hv2aoouhcve4xc2vlff77k7q4i