Verification of producer-consumer synchronization in GPU programs

Rahul Sharma, Michael Bauer, Alex Aiken
2015 Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2015  
Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model.  ...  No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware.  ...  Department of Energy under Contract No. DE-AC52-06NA25396, and NSF grant CCF-1160904.  ... 
doi:10.1145/2737924.2737962 dblp:conf/pldi/SharmaBA15 fatcat:r53doygmrrexdketgm6juge44y

Verification of producer-consumer synchronization in GPU programs

Rahul Sharma, Michael Bauer, Alex Aiken
2015 SIGPLAN notices  
Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model.  ...  No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware.  ...  Department of Energy under Contract No. DE-AC52-06NA25396, and NSF grant CCF-1160904.  ... 
doi:10.1145/2813885.2737962 fatcat:pqgc4ciworchfd7a3njooiienu

Accelerating SystemC simulations using GPUs

Mahesh Nanjundappa, Anirudh Kaushik, Hiren D. Patel, Sandeep K. Shukla
2012 2012 IEEE International High Level Design Validation and Test Workshop (HLDVT)  
Recent developments in graphics processing unit (GPU) technology has invigorated an interest in using GPUs for accelerating the simulation of SystemC models.  ...  In this paper, we present a summary of these recent research efforts that propose using GPUs for accelerating SystemC simulation.  ...  It has been reported that 70% of the time is spent in the validation and verification phase of the design cycle [1] where verification by simulation is used.  ... 
doi:10.1109/hldvt.2012.6418255 dblp:conf/hldvt/NanjundappaKPS12 fatcat:ui7jiamsvrhwvjxy4o4ghijn3a

Lazy Parallel Kronecker Algebra-Operations on Heterogeneous Multicores [chapter]

Wasuwee Sodsong, Robert Mittermayr, Yoojin Park, Bernd Burgstaller, Johann Blieberger
2017 Lecture Notes in Computer Science  
It has been observed in prior, unpublished work [10] that it is not necessary to compute adjacency matrices in their entirety: the use of synchronization constructs in a multi-threaded program induces  ...  Kronecker algebra operations have been devised that are able to capture the constraints on possible thread interleavings resulting from semaphore-based producer-consumer synchronization and from mutual  ...  The relevant background on Kronecker algebra is discussed in Sect. 2. We provide an overview of our execution scheme in Sect. 3. Our multicore CPU and GPU implementations are discussed in Sect. 4.  ... 
doi:10.1007/978-3-319-64203-1_39 fatcat:arifkipj6veshocqqlosfj2siq

Correct and Efficient Accelerator Programming (Dagstuhl Seminar 13142)

Albert Cohen, Alastair F. Donaldson, Marieke Huisman, Joost-Pieter Katoen, Marc Herbstritt
2013 Dagstuhl Reports  
Performance is gained in energy efficiency and execution speed, allowing intensive media processing software to run in low-power consumer devices.  ...  In recent years, massively parallel accelerator processors, primarily GPUs, have become widely available to end-users.  ...  a larger role in GPU kernel verification.  ... 
doi:10.4230/dagrep.3.4.17 dblp:journals/dagstuhl-reports/CohenDHK13 fatcat:4qiimr6nwfdibcj6nmhw4ml2pu

Value Prediction and Speculative Execution on GPU

Shaoshan Liu, Christine Eisenbeis, Jean-Luc Gaudiot
2010 International journal of parallel programming  
In this paper, we explore the possibility of using GPUs for value prediction and speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism  ...  The results indicate that the hardware extensions result in almost tenfold reduction of the control divergent sequential operations with only moderate hardware (5-8%) and power consumption (1-5%) overheads  ...  Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided  ... 
doi:10.1007/s10766-010-0155-0 fatcat:bgmqj7ojxveihbv3cqzpunlihq

PaRV: Parallelizing Runtime Detection and Prevention of Concurrency Errors [chapter]

Ismail Kuru, Hassan Salehe Matar, Adrián Cristal, Gokcen Kestor, Osman Unsal
2013 Lecture Notes in Computer Science  
We present the PaRV tool for runtime detection of and recovery from data races in multi-threaded C and C++ programs.  ...  PaRV uses transactional memory technology for parallelizing runtime verification and for buffering write accesses during race checking.  ...  The communication channel is implemented with a single-producer/single-consumer, circular, lock-free queue where the application thread (producer) posts read and write messages that the auxiliary thread  ... 
doi:10.1007/978-3-642-35632-2_6 fatcat:abceyf5mwbe2tf4sgp22n7f5b4

SystemC simulation on GP-GPUs

Nicola Bombieri, Sara Vinco, Valeria Bertacco, Debapriya Chatterjee
2012 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '12  
Unfortunately, most SystemC simulators are based on a strictly sequential scheduler that heavily limits their performance, impacting verification schedules and time-to-market of new designs.  ...  Our solution leverages static scheduling to reduce synchronization overheads.  ...  In that work, independent SystemC processes are mapped into parallel threads that synchronize at each iteration of a delta cycle, through a barrier synchronization, to maintain the correct producer-consumer  ... 
doi:10.1145/2380445.2380500 dblp:conf/codes/BombieriVBC12 fatcat:2wjf7g2hj5da3cqt7kbv5m32my

Teaching Parallel and Distributed Computing to Undergraduate Computer Science Students

Marcelo Arroyo
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
Parallel and distributed systems programming skills has become a common requirement in the development of modern applications.  ...  Finally, we describe the use of existing tools and the development of new high level tools, as parallel patterns, useful for teaching parallel programming which can be used in different courses.  ...  Where is is an input stream (producer) os is an output stream (consumer) se is a stream expression (using streams input values) an expression on each element on its GPU.  ... 
doi:10.1109/ipdpsw.2013.276 dblp:conf/ipps/Arroyo13 fatcat:dqtxmbw4o5b3bldjsrrunjwinm

Automated Verification of Functional Correctness of Race-Free GPU Programs [chapter]

Kensuke Kojima, Akifumi Imanishi, Atsushi Igarashi
2016 Lecture Notes in Computer Science  
We study an automated verification method for functional correctness of parallel programs running on GPUs. Our method is based on Kojima and Igarashi's Hoare logic for GPU programs.  ...  It is often impossible, however, to solve naively generated VCs in reasonable time. A main difficulty stems from quantifiers over threads due to the parallel nature of GPU programs.  ...  In this paper we study an automated verification technique for functional correctness of GPU programs.  ... 
doi:10.1007/978-3-319-48869-1_7 fatcat:6pdczayppveixeasdal56iqusu

Automated Verification of Functional Correctness of Race-Free GPU Programs

Kensuke Kojima, Akifumi Imanishi, Atsushi Igarashi
2017 Journal of automated reasoning  
We study an automated verification method for functional correctness of parallel programs running on GPUs. Our method is based on Kojima and Igarashi's Hoare logic for GPU programs.  ...  It is often impossible, however, to solve naively generated VCs in reasonable time. A main difficulty stems from quantifiers over threads due to the parallel nature of GPU programs.  ...  In this paper we study an automated verification technique for functional correctness of GPU programs.  ... 
doi:10.1007/s10817-017-9428-2 fatcat:pf2ixforx5c7jlutaumgjozyte

Real-world design and evaluation of compiler-managed GPU redundant multithreading

Jack Wadden, Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan, Kevin Skadron
2014 SIGARCH Computer Architecture News  
We then perform detailed power and performance evaluations of three RMT algorithms, each of which provides fault coverage to a set of structures in the GPU.  ...  Reliability for general purpose processing on the GPU (GPGPU) is becoming a weak link in the construction of reliable supercomputer systems.  ...  verification capabilities.  ... 
doi:10.1145/2678373.2665686 fatcat:sp5qrkfdzzfhrg5pho3oo6wjne

Real-world design and evaluation of compiler-managed GPU redundant multithreading

Jack Wadden, Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan, Kevin Skadron
2014 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)  
We then perform detailed power and performance evaluations of three RMT algorithms, each of which provides fault coverage to a set of structures in the GPU.  ...  Reliability for general purpose processing on the GPU (GPGPU) is becoming a weak link in the construction of reliable supercomputer systems.  ...  verification capabilities.  ... 
doi:10.1109/isca.2014.6853227 dblp:conf/isca/WaddenLGSS14 fatcat:ccshcrfm5rembof45ujamghcoy

Analysis of GPGPU Programs for Data-race and Barrier Divergence

Santonu Sarkar, Prateek Kandelwal, Soumyadip Bandyopadhyay, Holger Giese
2018 Proceedings of the 13th International Conference on Software Technologies  
We present a technique to identify the existence of these properties in a CUDA program using a static property verification method.  ...  In this paper, we focus on the two important properties of the programs written for GPGPUs, namely i) the data-race conditions and ii) the barrier divergence.  ...  ACKNOWLEDGEMENTS This work is financially supported by Science and Engineering Research Board, Govt. of India funding (SB/S3/EECE/0170/2014).  ... 
doi:10.5220/0006834904940505 dblp:conf/icsoft/SarkarKBG18 fatcat:5hr4xlpyrvg4tpx7eksp7liqsy

High performance gate-level simulation with GP-GPU computing

Valeria Bertacco, Debapriya Chatterjee
2011 Proceedings of 2011 International Symposium on VLSI Design, Automation and Test  
Functional verification of modern digital designs is a mission critical and time-consuming task.  ...  Recently, advances in graphic processing unit (GPU) technology has made GPUs emerge as a cost effective parallel processing solution.  ...  A compilation step produces macro-gates, which are optimized and offloaded to the GPU for simulation.  ... 
doi:10.1109/vdat.2011.5783577 fatcat:myt3bsnwqnfhjim4q4po3idxui