Supporting highly-decoupled thread-level redundancy for parallel programs

M. Wasiur Rashid, Michael C. Huang
2008 High-Performance Computer Architecture  
In this paper, we propose a microarchitecture to efficiently support TLR for parallel codes.  ...  While TLR has been extensively studied in the context of single-threaded applications, much less attention is paid to the design issues and tradeoffs of supporting parallel codes.  ...  Acknowledgments We would like to sincerely thank the anonymous reviewers for their insightful comments and suggestions.  ... 
doi:10.1109/hpca.2008.4658655 dblp:conf/hpca/RashidH08 fatcat:5yz3jejqezfq5njrgf6a6a6gha

Soft-error mitigation by means of decoupled transactional memory threads

Daniel Sánchez, Juan M. Cebrián, José M. García, Juan L. Aragón
2014 Distributed computing  
Several studies have already proposed fault tolerance for parallel codes.  ...  Our initial version of LBRA executes these redundant threads in SMT cores.  ...  Acknowledgments Thanks to the anonymous reviewers for their comments and suggestions which definitely improved this work.  ... 
doi:10.1007/s00446-014-0215-6 fatcat:vdvalpdcwrbt3ofopss4nncl4q

Exploiting coarse-grain verification parallelism for power-efficient fault tolerance

M.W. Rashid, E.J. Tan, M.C. Huang, D.H. Albonesi
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
Our approach exploits the fact that with appropriate hardware support, the verification operation can be parallelized and run on a chip multiprocessor with support for frequency scaling together with supply  ...  However, the need for redundancy is directly opposed to the growing need for more power efficient operation.  ...  Providing Flexible and Efficient Thread-Level Redundancy In thread-level redundancy (TLR), the entire program thread is replicated and run under time or space redundancy.  ... 
doi:10.1109/pact.2005.20 dblp:conf/IEEEpact/RashidTHA05 fatcat:ao6f6potzbfkfhqxsx35kqwqdu

A log-based redundant architecture for reliable parallel computation

Daniel Sanchez, Juan L. Aragon, Jose M. Garcia
2010 2010 International Conference on High Performance Computing  
Several studies have been already proposed to provide fault tolerance for parallel codes.  ...  To this end, we propose LBRA based on a Hardware Transactional Memory (HTM) architecture in which two redundant threads successfully detects and recovers from transient faults, assuring a consistent view  ...  We would also like to thank Rubén Titos for his technical support and Antonio González who provided good suggestions of an earlier version of the manuscript.  ... 
doi:10.1109/hipc.2010.5713183 dblp:conf/hipc/SanchezAG10 fatcat:kpnvrt4mubghdnjz236szv36vu

Hardware support for software controlled multithreading

Aqeel Mahesri, Nicholas J. Wang, Sanjay J. Patel
2007 SIGARCH Computer Architecture News  
It divides a single thread of execution into multiple using the master-worker paradigm where some set of master threads execute code that spawns tasks for other, worker threads.  ...  Our proposal, NXA, is less speculative than previous proposals, relying heavily on software to guarantee thread correctness, though still allowing parallelism in the presence of ambiguous dependences.  ...  Further details on high level NXA concepts are provided in Section 2. We present an implementation of NXA to support master-worker threading all the way down to a fine granularity.  ... 
doi:10.1145/1241601.1241606 fatcat:aabcvs7w5vgw3o3bqtmgx3hdpa

Extending SRT for parallel applications in tiled-CMP architectures

D. Sanchez, J.L. Aragon, J.M. Garcia
2009 2009 IEEE International Symposium on Parallel & Distributed Processing  
We show how atomic operations induce a serialization point between master and slave threads. This bottleneck has an impact of 34% in execution speed for several parallel scientific benchmarks.  ...  Simultaneous and Redundantly Threaded (SRT) [13] is a fault tolerant architecture in which pairs of threads in a SMT core redundantly execute the same program instructions.  ...  Acknowledgements This work has been jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grant 05831/PI/07, also by the Spanish MEC and European  ... 
doi:10.1109/ipdps.2009.5160902 dblp:conf/ipps/SanchezAG09 fatcat:p5rknuu5ardtjb5h4lcsdmk2ne

REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs [chapter]

Daniel Sánchez, Juan L. Aragón, José M. García
2009 Lecture Notes in Computer Science  
We show how atomic operations induce to serialization points between master and slave threads. This bottleneck has an impact of 34% in execution time for several parallel scientific benchmarks.  ...  RMT (Redundant Multi-Threading) is a family of techniques based on SMT processors in which two independent threads (master and slave), fed with the same inputs, redundantly execute the same instructions  ...  This work has been jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grant 05831/PI/07, also by the Spanish MEC and European Commission FEDER  ... 
doi:10.1007/978-3-642-03869-3_32 fatcat:ijhk2vxqx5dubdyp4engtqkloq

SoC-C

Alastair D. Reid, Krisztian Flautner, Edmund Grimley-Evans, Yuan Lin
2008 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems - CASES '08  
The conventional approach to programming such hardware is very low-level but this yields software which is intimately and inseparably tied to the details of the platform it was originally designed for,  ...  We tackle this complexity with a set of language extensions which allows the programmer to introduce pipeline parallelism into sequential programs, manage distributed memories, and express the desired  ...  We identify the critical optimizations required to support the high level programming model.  ... 
doi:10.1145/1450095.1450112 dblp:conf/cases/ReidFGL08 fatcat:tb6x4kxeebfxvfbmcp72vawyqe

Architectural Support for Fault Tolerance in a Teradevice Dataflow System

Sebastian Weis, Arne Garbade, Bernhard Fechner, Avi Mendelson, Roberto Giorgi, Theo Ungerer
2014 International journal of parallel programming  
Furthermore, we exploit the dataflow execution model for a thread-level recovery scheme.  ...  and efficient thread-level synchronization mechanisms.  ...  Popovic for their initial studies on the DTA-C architecture and P. Faraboschi of HP for his precious suggestions and support on the COTSon simulator.  ... 
doi:10.1007/s10766-014-0312-y fatcat:kygdzmqyvrbonia2cu7n4glnsu

Speculative Decoupled Software Pipelining

Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme Ottoni, David I. August
2007 Parallel Architecture and Compilation Techniques (PACT), Proceedings of the International Conference on  
By speculatively breaking these recurrences, instructions that were formerly restricted to a single thread to ensure decoupling are now free to span multiple threads.  ...  To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread extraction.  ...  Acknowledgments We thank the entire Liberty Research Group for their support and feedback during this work. Additionally, we thank the anonymous reviewers for their insightful comments.  ... 
doi:10.1109/pact.2007.4336199 fatcat:zdfi2dh3ujcsvlshzwhwcnejbi

StraightTaint: decoupled offline symbolic taint analysis

Jiang Ming, Dinghao Wu, Jun Wang, Gaoyao Xiao, Peng Liu
2016 Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering - ASE 2016  
To ameliorate this performance bottleneck, recent research efforts aim to decouple data flow tracking logic from program execution.  ...  TaintPipe performs very lightweight runtime logging to produce compact control flow profiles, and spawns multiple threads as different stages of a pipeline to carry out symbolic taint analysis in parallel  ...  Acknowledgments We thank the Usenix Security anonymous reviewers and Niels Provos for their valuable feedback.  ... 
doi:10.1145/2970276.2970299 dblp:conf/kbse/MingWWXL16 fatcat:ex6z374qjrezhpzhxb4hif6fia

DAFT: Decoupled Acyclic Fault Tolerance

Yun Zhang, Jae W. Lee, Nick P. Johnson, David I. August
2011 International journal of parallel programming  
Where possible, values are speculated to be correct and only communicated to the redundant thread at essential program points.  ...  Redundant hardware modules can detect such faults, but software techniques are more appealing for their low cost and flexibility.  ...  Mukherjee et al. improved AR-SMT with Chip-level Redundant Threading (CRT), which uses a multicore chip for redundant execution and value checking [19].  ... 
doi:10.1007/s10766-011-0183-4 fatcat:5sfdnspbtbbsbcizffokwtvcv4

HELIX

Simone Campanoni, Timothy Jones, Glenn Holloway, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks
2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12  
We describe and evaluate HELIX, a new technique for automatic loop parallelization that assigns successive iterations of a loop to separate threads.  ...  We show that the inter-thread communication costs forced by loop-carried data dependences can be mitigated by code optimization, by using an effective heuristic for selecting loops to parallelize, and  ...  Acknowledgements Authors thank the anonymous reviewers for their hard work that allowed us to improve the paper significantly.  ... 
doi:10.1145/2259016.2259028 dblp:conf/cgo/CampanoniJHRWB12 fatcat:saxndpn5rvhodc7nsnfl7tjmxq

Scalable Speculative Parallelization on Commodity Clusters

Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, David I. August
2010 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture  
Clever use of pipeline parallelism (DSWP), thread-level speculation (TLS), and speculative pipeline parallelism (Spec-DSWP) can mitigate the costs of inter-thread communication on shared memory multicore  ...  For 11 sequential C programs parallelized for a 4-core 32-node (128 total core) cluster without shared memory, DSMTX achieves a geomean speedup of 49×.  ...  Acknowledgment We thank the Liberty Research Group for their support and feedback during this work. We also thank the anonymous reviewers for their insightful comments and suggestions.  ... 
doi:10.1109/micro.2010.19 dblp:conf/micro/KimRLLA10 fatcat:jj6chf5ucree7crvhqmscy7xae

Decoupled iteration mapping: improving dependency-loop performance on SIMD processors

Hui Yang, Shuming Chen, Jianghua Wan, Huanyao Dai
2013 IEICE Electronics Express  
difficult to parallelize and vectorize.  ...  Wide Single Instruction Multiple Data (SIMD) architectures are very important in the compute-intensive applications, but less efficient for applications with cross-iteration dependency loops which are  ...  Acknowledgments This work is partially supported by the National Natural Science Foundation of China (No.61070036 and 61301236).  ... 
doi:10.1587/elex.10.20130798 fatcat:qbqclhomnzbc7nv2hprkgbjerm