13,918 Hits in 4.2 sec

Extending the Wait-free Hierarchy to Multi-Threaded Systems

Matthieu Perrin, Achour Mostéfaoui, Grégoire Bonin
2020 Proceedings of the 39th Symposium on Principles of Distributed Computing  
This paper explores the synchronization power of shared objects in multi-threaded systems by extending the famous wait-free hierarchy to take these constraints into consideration.  ...  This makes it challenging to adapt some algorithms to multi-threaded systems, especially those that assign one shared register per process.  ...  ACKNOWLEDGMENTS This work was partially supported by the French ANR project 16-CE25-0005 O'Browser.  ... 
doi:10.1145/3382734.3405723 dblp:conf/podc/PerrinMB20 fatcat:v7dbsqlhdnh7tnzshikysvgczi

On the Computational Power of Shared Objects [chapter]

Gadi Taubenfeld
2009 Lecture Notes in Computer Science  
Then, we define the power hierarchy, which is an infinite hierarchy of objects such that the objects at level i of the hierarchy are exactly those objects with power number i.  ...  Our equivalence and extended universality results provide a deeper understanding of the nature of the relative computational power of shared objects.  ...  We prove that the wait-free hierarchy and the power hierarchy are equivalent. That is, the consensus number of an object equals its power number.  ... 
doi:10.1007/978-3-642-10877-8_22 fatcat:7athjoqe5ngcndui473koqkhse

Queue-Based and Adaptive Lock Algorithms for Scalable Resource Allocation on Shared-Memory Multiprocessors

Deli Zhang, Brendan Lynch, Damian Dechev
2014 International journal of parallel programming  
The challenge is for each thread to acquire exclusive access to desired resources while preventing deadlock or starvation.  ...  This work describes the first multi-resource lock algorithm that guarantees the strongest first-in, first-out fairness.  ...  The authors would also like to thank Dimitry Vyukov for providing insightful implementation tips on the non-blocking queue.  ... 
doi:10.1007/s10766-014-0317-6 fatcat:5waj5ideqfharlijwbbvpjid2q

Extensible control architectures

Greg Hoover, Forrest Brewer, Timothy Sherwood
2006 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems - CASES '06  
Traditional solutions to this problem require elaborate specifications that are difficult to maintain and extend.  ...  We present an overview of our methodology, including background on the pyPBS synthesis model, an architectural overview of our multi-threaded microcontroller, and implementation details for the control  ...  An in-depth look at the processor architecture and the impact of multi-threading on embedded systems is available in [10].  ... 
doi:10.1145/1176760.1176800 dblp:conf/cases/HooverBS06 fatcat:wosg2wj6uvbhxd6zt6s4np6pii

High performance locks for multi-level NUMA systems

Milind Chabbi, Michael Fagan, John Mellor-Crummey
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
On highly-threaded systems with a deep memory hierarchy, the throughput of traditional queueing locks, e.g., MCS locks, falls off due to NUMA effects.  ...  Two-level cohort locks perform better on NUMA systems, but fail to deliver top performance for deep NUMA hierarchies.  ...  Specifically, it used the Blacklight system at the Pittsburgh Supercomputing Center (PSC).  ... 
doi:10.1145/2688500.2688503 dblp:conf/ppopp/ChabbiFM15 fatcat:v7ub27suzjdc7e5twsl4ai2x3u

Efficient ray tracing of subdivision surfaces using tessellation caching

Carsten Benthin, Sven Woop, Matthias Nießner, Kai Selgard, Ingo Wald
2015 Proceedings of the 7th Conference on High-Performance Graphics - HPG '15  
A common way to ray trace subdivision surfaces is by constructing and traversing spatial hierarchies on top of tessellated input primitives.  ...  shading) on a high-end Intel® Xeon® processor system using our efficient lazy-build caching scheme.  ...  Acknowledgments We would like to thank James Jeffers and Eric Tabellion for their valuable feedback and guidance.  ... 
doi:10.1145/2790060.2790061 dblp:conf/egh/BenthinWNSW15 fatcat:fdagy6hryjhptfe5x6fcp5njbe

Efficient System-Enforced Deterministic Parallelism [article]

Amittai Aviram, Shu-Chun Weng, Sen Hu, Bryan Ford
2010 arXiv pre-print
The system runs parallel applications deterministically both on multicore PCs and across nodes in a cluster.  ...  Determinator is a novel operating system that enforces determinism on both multithreaded and multi-process computations.  ...  When each thread in a group arrives at a barrier, it calls Ret to stop and wait for the parent thread managing the group.  ... 
arXiv:1005.3450v1 fatcat:2zzzgu5q3vcltklzvmlc6b4ccq

A memory heterogeneity-aware runtime system for bandwidth-sensitive HPC applications

Kavitha Chandrasekar, Xiang Ni, Laxmikant V. Kale
2017 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
We implement a data movement mechanism managed by the runtime system which allows applications to run efficiently on architectures with heterogeneous memory hierarchy, with trivial code changes.  ...  does not fit within the high bandwidth memory and data needs to be moved among different memory types.  ...  We plan to extend this implementation to other heterogeneous memory architectures. We will also perform comparisons with cache mode in KNL in the future and in multi-node cluster settings.  ... 
doi:10.1109/ipdpsw.2017.168 dblp:conf/ipps/ChandrasekarNK17 fatcat:tm2vcyo7qbe2rdtzimhfv2o47q

Towards a verified compiler prototype for the synchronous language SIGNAL

Zhibin Yang, Jean-Paul Bodeveix, Mamoun Filali, Kai Hu, Yongwang Zhao, Dianfu Ma
2015 Frontiers of Computer Science  
With the rising importance of multi-core processors in safety-critical embedded systems or cyber-physical systems (CPS), there is a growing need for model-driven generation of multi-threaded code and thus  ...  SIGNAL belongs to the family of synchronous languages, which are widely used in the design of safety-critical real-time systems such as avionics, space systems, and nuclear power plants.  ...  /* Thread 2 */ void step() { wait(Thread1); if (C1) { s1 = f(y1); s2 = s1 + 1; } notify(Thread4); } Mapping Multi-threaded Code to Multi-core: To allow for static prediction of the system  ... 
doi:10.1007/s11704-015-4364-y fatcat:7x6wchqztrfmpassc3ip7eres4

Performance limitations of block-multithreaded distributed-memory systems

W.M. Zuberek
2009 Proceedings of the 2009 Winter Simulation Conference (WSC)  
Instruction-level multithreading is an architectural approach to tolerating such long latencies by switching instruction threads rather than waiting for the completion of memory operations.  ...  The performance of modern computer systems is increasingly often limited by long latencies of accesses to the memory subsystems.  ...  Memory hierarchies, and in particular multi-level cache memories, have been introduced to reduce the effective latency of memory accesses.  ... 
doi:10.1109/wsc.2009.5429718 dblp:conf/wsc/Zuberek09 fatcat:n4eccqcqkvcijimqez2s7uwfd4

Eliminating race conditions in system-level models by using parallel simulation infrastructure

Weiwei Chen, Che-Wei Chang, Xu Han, Rainer Domer
2012 2012 IEEE International High Level Design Validation and Test Workshop (HLDVT)  
In particular, the model must be free of race conditions in all accesses to shared variables, so that a safe parallel implementation is possible.  ...  Our experiments have revealed a number of dangerous race conditions in existing embedded multi-media application models and enabled us to efficiently and safely eliminate these hazards.  ...  ACKNOWLEDGMENT This work has been supported in part by funding from the National Science Foundation (NSF) under research grant NSF Award #0747523. The authors thank the NSF for the valuable support.  ... 
doi:10.1109/hldvt.2012.6418253 dblp:conf/hldvt/ChenCHD12 fatcat:jlmsyaxkcbcuvi32a54w4bbk5a

CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications

Guoqing Lei, Yong Dou, Wen Wan, Fei Xia, Rongchun Li, Meng Ma, Dan Zou
2012 BMC Genomics  
The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications.  ...  The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction.  ...  Acknowledgements We would like to thank the researchers who provided access, documentation and installation assistance for the ViennaRNA-1.8  ... 
doi:10.1186/1471-2164-13-s1-s14 pmid:22369626 pmcid:PMC3303730 fatcat:7q352y2nvvfexaoyehdasvrwsq

Pushing the Limits of Parallel Discrete Event Simulation for SystemC [chapter]

Rainer Dömer, Zhongqi Cheng, Daniel Mendoza, Emad Arasteh
2020 A Journey of Embedded and Cyber-Physical Systems  
By localizing the simulation time to individual threads and carefully handling events at different times, the simulator engine can issue threads in parallel and ahead of time, following a partial ordering  ...  The Accellera Systems Initiative maintains not only the official SystemC language definition, but also provides an open source proof-of-concept library that can be used to simulate SystemC design models  ...  The authors thank Intel Corporation for the valuable support.  ... 
doi:10.1007/978-3-030-47487-4_7 fatcat:ud3zjmdwobcxtf3yayd3u27ulm

Cache Line Aware Optimizations for ccNUMA Systems

Sabela Ramos, Torsten Hoefler
2015 Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '15  
Current shared memory systems utilize complex memory hierarchies to maintain scalability when increasing the number of processing units.  ...  We propose to expose the block-based design of caches in parallel computers to middleware designers to allow semi-automatic performance tuning with the systematic translation from algorithms to an analytic  ...  This work was supported by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the EU (Project TIN2013-42148-P).  ... 
doi:10.1145/2749246.2749256 dblp:conf/hpdc/RamosH15 fatcat:ip43wafnwvg6rn2nj32puoauii

Supporting High Level Language Semantics within Hardware Resident Threads

Erik Anderson, Wesley Peck, Jim Stevens, Jason Agron, Fabrice Baijot, Seth Warn, David Andrews
2007 2007 International Conference on Field Programmable Logic and Applications  
The HWTI provides a hardware thread with the same hthread system calls available to software threads, a fast global distributed memory, support for pointers, a generalized function call model including  ...  The paper presents the new Hardware Thread Interface (HWTI), a meaningful and semantically rich target for a high-level language to hardware description language translator.  ...  Acknowledgment The work in this article is partially sponsored by National Science Foundation EHS contract CCR-0311599.  ... 
doi:10.1109/fpl.2007.4380632 dblp:conf/fpl/AndersonPSABWA07 fatcat:q3srlzb4rvhbpjnu4xzmrrebvm
Showing results 1–15 of 13,918