Verification of producer-consumer synchronization in GPU programs
2015
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2015
Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. ...
No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware. ...
Department of Energy under Contract No. DE-AC52-06NA25396, and NSF grant CCF-1160904. ...
doi:10.1145/2737924.2737962
dblp:conf/pldi/SharmaBA15
fatcat:r53doygmrrexdketgm6juge44y
Verification of producer-consumer synchronization in GPU programs
2015
SIGPLAN notices
Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. ...
No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware. ...
Department of Energy under Contract No. DE-AC52-06NA25396, and NSF grant CCF-1160904. ...
doi:10.1145/2813885.2737962
fatcat:pqgc4ciworchfd7a3njooiienu
Accelerating SystemC simulations using GPUs
2012
2012 IEEE International High Level Design Validation and Test Workshop (HLDVT)
Recent developments in graphics processing unit (GPU) technology have invigorated an interest in using GPUs for accelerating the simulation of SystemC models. ...
In this paper, we present a summary of these recent research efforts that propose using GPUs for accelerating SystemC simulation. ...
It has been reported that 70% of the time is spent in the validation and verification phase of the design cycle [1] where verification by simulation is used. ...
doi:10.1109/hldvt.2012.6418255
dblp:conf/hldvt/NanjundappaKPS12
fatcat:ui7jiamsvrhwvjxy4o4ghijn3a
Lazy Parallel Kronecker Algebra-Operations on Heterogeneous Multicores
[chapter]
2017
Lecture Notes in Computer Science
It has been observed in prior, unpublished work [10] that it is not necessary to compute adjacency matrices in their entirety: the use of synchronization constructs in a multi-threaded program induces ...
Kronecker algebra operations have been devised that are able to capture the constraints on possible thread interleavings resulting from semaphore-based producer-consumer synchronization and from mutual ...
The relevant background on Kronecker algebra is discussed in Sect. 2. We provide an overview of our execution scheme in Sect. 3. Our multicore CPU and GPU implementations are discussed in Sect. 4. ...
doi:10.1007/978-3-319-64203-1_39
fatcat:arifkipj6veshocqqlosfj2siq
Correct and Efficient Accelerator Programming (Dagstuhl Seminar 13142)
2013
Dagstuhl Reports
Performance is gained in energy efficiency and execution speed, allowing intensive media processing software to run in low-power consumer devices. ...
In recent years, massively parallel accelerator processors, primarily GPUs, have become widely available to end-users. ...
a larger role in GPU kernel verification. ...
doi:10.4230/dagrep.3.4.17
dblp:journals/dagstuhl-reports/CohenDHK13
fatcat:4qiimr6nwfdibcj6nmhw4ml2pu
Value Prediction and Speculative Execution on GPU
2010
International journal of parallel programming
In this paper, we explore the possibility of using GPUs for value prediction and speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism ...
The results indicate that the hardware extensions result in almost tenfold reduction of the control divergent sequential operations with only moderate hardware (5-8%) and power consumption (1-5%) overheads ...
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided ...
doi:10.1007/s10766-010-0155-0
fatcat:bgmqj7ojxveihbv3cqzpunlihq
PaRV: Parallelizing Runtime Detection and Prevention of Concurrency Errors
[chapter]
2013
Lecture Notes in Computer Science
We present the PaRV tool for runtime detection of and recovery from data races in multi-threaded C and C++ programs. ...
PaRV uses transactional memory technology for parallelizing runtime verification and for buffering write accesses during race checking. ...
The communication channel is implemented with a single-producer/single-consumer, circular, lock-free queue where the application thread (producer) posts read and write messages that the auxiliary thread ...
doi:10.1007/978-3-642-35632-2_6
fatcat:abceyf5mwbe2tf4sgp22n7f5b4
SystemC simulation on GP-GPUs
2012
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '12
Unfortunately, most SystemC simulators are based on a strictly sequential scheduler that heavily limits their performance, impacting verification schedules and time-to-market of new designs. ...
Our solution leverages static scheduling to reduce synchronization overheads. ...
In that work, independent SystemC processes are mapped into parallel threads that synchronize at each iteration of a delta cycle, through a barrier synchronization, to maintain the correct producer-consumer ...
doi:10.1145/2380445.2380500
dblp:conf/codes/BombieriVBC12
fatcat:2wjf7g2hj5da3cqt7kbv5m32my
Teaching Parallel and Distributed Computing to Undergraduate Computer Science Students
2013
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum
Parallel and distributed systems programming skills have become a common requirement in the development of modern applications. ...
Finally, we describe the use of existing tools and the development of new high level tools, as parallel patterns, useful for teaching parallel programming which can be used in different courses. ...
Where is is an input stream (producer), os is an output stream (consumer), se is a stream expression (using streams input values) an expression on each element on its GPU. ...
doi:10.1109/ipdpsw.2013.276
dblp:conf/ipps/Arroyo13
fatcat:dqtxmbw4o5b3bldjsrrunjwinm
Automated Verification of Functional Correctness of Race-Free GPU Programs
[chapter]
2016
Lecture Notes in Computer Science
We study an automated verification method for functional correctness of parallel programs running on GPUs. Our method is based on Kojima and Igarashi's Hoare logic for GPU programs. ...
It is often impossible, however, to solve naively generated VCs in reasonable time. A main difficulty stems from quantifiers over threads due to the parallel nature of GPU programs. ...
In this paper we study an automated verification technique for functional correctness of GPU programs. ...
doi:10.1007/978-3-319-48869-1_7
fatcat:6pdczayppveixeasdal56iqusu
Automated Verification of Functional Correctness of Race-Free GPU Programs
2017
Journal of automated reasoning
We study an automated verification method for functional correctness of parallel programs running on GPUs. Our method is based on Kojima and Igarashi's Hoare logic for GPU programs. ...
It is often impossible, however, to solve naively generated VCs in reasonable time. A main difficulty stems from quantifiers over threads due to the parallel nature of GPU programs. ...
In this paper we study an automated verification technique for functional correctness of GPU programs. ...
doi:10.1007/s10817-017-9428-2
fatcat:pf2ixforx5c7jlutaumgjozyte
Real-world design and evaluation of compiler-managed GPU redundant multithreading
2014
SIGARCH Computer Architecture News
We then perform detailed power and performance evaluations of three RMT algorithms, each of which provides fault coverage to a set of structures in the GPU. ...
Reliability for general purpose processing on the GPU (GPGPU) is becoming a weak link in the construction of reliable supercomputer systems. ...
verification capabilities. ...
doi:10.1145/2678373.2665686
fatcat:sp5qrkfdzzfhrg5pho3oo6wjne
Real-world design and evaluation of compiler-managed GPU redundant multithreading
2014
2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
We then perform detailed power and performance evaluations of three RMT algorithms, each of which provides fault coverage to a set of structures in the GPU. ...
Reliability for general purpose processing on the GPU (GPGPU) is becoming a weak link in the construction of reliable supercomputer systems. ...
verification capabilities. ...
doi:10.1109/isca.2014.6853227
dblp:conf/isca/WaddenLGSS14
fatcat:ccshcrfm5rembof45ujamghcoy
Analysis of GPGPU Programs for Data-race and Barrier Divergence
2018
Proceedings of the 13th International Conference on Software Technologies
We present a technique to identify the existence of these properties in a CUDA program using a static property verification method. ...
In this paper, we focus on the two important properties of the programs written for GPGPUs, namely i) the data-race conditions and ii) the barrier divergence. ...
ACKNOWLEDGEMENTS This work is financially supported by Science and Engineering Research Board, Govt. of India funding (SB/S3/EECE/0170/2014). ...
doi:10.5220/0006834904940505
dblp:conf/icsoft/SarkarKBG18
fatcat:5hr4xlpyrvg4tpx7eksp7liqsy
High performance gate-level simulation with GP-GPU computing
2011
Proceedings of 2011 International Symposium on VLSI Design, Automation and Test
Functional verification of modern digital designs is a mission critical and time-consuming task. ...
Recently, advances in graphics processing unit (GPU) technology have made GPUs emerge as a cost-effective parallel processing solution. ...
A compilation step produces macro-gates, which are optimized and offloaded to the GPU for simulation. ...
doi:10.1109/vdat.2011.5783577
fatcat:myt3bsnwqnfhjim4q4po3idxui