Top-Down Design of Bulk-Synchronous Parallel Programs
2003
Parallel Processing Letters
This paper studies top-down program development techniques for Bulk-Synchronous Parallelism. ...
In that context a specification formalism Logs, for 'the Logic of Global Synchrony', has been proposed for the specification and high-level development of BSP designs. ...
Top-down design of a program starts from an abstract formal specification. ...
doi:10.1142/s0129626403001367
fatcat:obnl5sbywrhytmyjuzyhyfwfcq
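As an aside to this entry: the BSP pattern that a formalism like Logs targets alternates local computation with global synchronization. A minimal illustrative sketch in Java follows; the barrier-based structure and all names are assumptions for illustration, not the paper's notation.

```java
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

// Minimal BSP superstep loop: each worker computes locally, publishes a
// result, and all workers meet at a barrier before the next superstep.
public class BspSketch {
    public static void main(String[] args) throws InterruptedException {
        final int workers = 4, supersteps = 3;
        final double[] shared = new double[workers];
        final CyclicBarrier barrier = new CyclicBarrier(workers);

        Thread[] pool = new Thread[workers];
        for (int p = 0; p < workers; p++) {
            final int pid = p;
            pool[p] = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        // Local phase: read a neighbour's value published
                        // in the previous superstep.
                        double left = shared[(pid + workers - 1) % workers];
                        barrier.await();           // end of read phase
                        shared[pid] = left + pid;  // publish for the next step
                        barrier.await();           // bulk synchronization
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            pool[p].start();
        }
        for (Thread t : pool) t.join();
        System.out.println(Arrays.toString(shared));
    }
}
```

The two barriers per superstep separate the read and write phases, so every read observes only values published in the previous superstep; CyclicBarrier.await() also provides the necessary happens-before visibility between workers.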
Implementing directed acyclic graphs with the heterogeneous system architecture
2016
Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit - GPGPU '16
The use of DAGs to extract parallelism also enables runtimes to perform dynamic load-balancing, thereby achieving higher throughput when compared to the traditional bulk-synchronous execution. ...
... 1.5x, respectively, compared to bulk-synchronous implementations. ...
doi:10.1145/2884045.2884052
dblp:conf/ppopp/PuthoorACDWBR16
fatcat:mkekkvzr2zbhdfftehp6ztdcci
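For contrast with bulk-synchronous execution: in a DAG runtime a task may start as soon as its own inputs are ready, with no barrier between "levels". A hypothetical Java sketch using CompletableFuture, not the paper's HSA-based runtime:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy DAG: B and C depend only on A; D depends on B and C. The executor
// may start B the moment A finishes, even if C has not run yet -- there
// is no global barrier between the DAG's levels.
public class DagSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        CompletableFuture<Integer> a =
            CompletableFuture.supplyAsync(() -> 1, pool);
        CompletableFuture<Integer> b = a.thenApplyAsync(x -> x + 10, pool);
        CompletableFuture<Integer> c = a.thenApplyAsync(x -> x + 20, pool);
        CompletableFuture<Integer> d = b.thenCombineAsync(c, Integer::sum, pool);

        System.out.println(d.get()); // 32
        pool.shutdown();
    }
}
```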
Apple-CORE: Harnessing general-purpose many-cores with hardware concurrency management
2013
Microprocessors and microsystems
Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. ...
The key aspects of the design are asynchrony, i.e. the ability to tolerate irregular long latencies on chip, a scale-invariant programming model, a distributed chip resource model, and the transparent ...
The intent is to enable capturing these concurrency semantics in various programming models, e.g. the bulk-synchronous parallelism (BSP [31]) and task parallelism constructs of OpenMP [7] and OpenCL ...
doi:10.1016/j.micpro.2013.05.004
fatcat:2eobsqpoabhubksy7asfmbdqmu
An approach to scalability study of shared memory parallel systems
1994
Performance Evaluation Review
A top-down approach to scalability study of shared memory parallel systems is proposed in this research. ...
The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines. ...
We have illustrated the usefulness as well as the feasibility of our top-down approach for understanding the scalability of parallel systems. ...
doi:10.1145/183019.183038
fatcat:xy3jojm43vghblh3znxktrbaku
An approach to scalability study of shared memory parallel systems
1994
Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems - SIGMETRICS '94
A top-down approach to scalability study of shared memory parallel systems is proposed in this research. ...
The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines. ...
We have illustrated the usefulness as well as the feasibility of our top-down approach for understanding the scalability of parallel systems. ...
doi:10.1145/183018.183038
dblp:conf/sigmetrics/SivasubramaniamSRV94
fatcat:mniz5dyw4baszipyhi5muczp2i
Bulk file I/O extensions to Java
2000
Proceedings of the ACM 2000 conference on Java Grande - JAVA '00
The extensions were implemented in Titanium, a high-performance, parallel dialect of Java. ...
The first adds bulk (array) I/O operations to the existing libraries, removing much of the overhead currently associated with array I/O. ...
... and debugging parallel programs. ...
doi:10.1145/337449.337459
dblp:conf/java/Bonachea00
fatcat:3a2uqcn7pjgmhpc7belbfabusm
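The bulk-I/O idea here generalizes what java.io already offers for one-dimensional byte arrays. A minimal sketch of the element-at-a-time versus bulk contrast, in plain Java rather than Titanium's extended API:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Contrast element-at-a-time I/O with a single bulk write. Standard
// java.io supports bulk byte[] transfers; extensions like Titanium's
// generalize the idea to (multidimensional) array I/O.
public class BulkIoSketch {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of payload

        // Element-at-a-time: one call, with per-call dispatch overhead,
        // for every single byte.
        try (OutputStream slow = new FileOutputStream("slow.bin")) {
            for (byte v : data) slow.write(v);
        }

        // Bulk: the whole array crosses the library boundary in one call.
        try (OutputStream fast = new FileOutputStream("fast.bin")) {
            fast.write(data, 0, data.length);
        }
    }
}
```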
TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization
[article]
2016
arXiv
pre-print
We have developed a task-parallel runtime system, called TREES, that is designed for high performance on CPU/GPU platforms. ...
We build upon work-first to create the "work-together" principle that addresses the specific strengths and weaknesses of GPUs. ...
In TREES, computation is divided into massively parallel epochs that are synchronized in bulk. The sequence of epochs is the critical path of a task-parallel program in TREES. ...
arXiv:1608.00571v1
fatcat:du263cjnw5b5beq7rravgrpi54
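A sketch of epoch-synchronized tasking under stated assumptions: tasks spawned during one epoch are deferred and run together as the next epoch, so the number of epochs, not the number of tasks, forms the critical path. Names and structure are illustrative, not TREES' actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;
import java.util.stream.IntStream;

// Epoch-synchronized tasking: children spawned in epoch k run in epoch k+1.
public class EpochSketch {
    interface Task extends Supplier<List<Task>> {} // a task may spawn children

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Task> epoch = List.of(fanOut(0, 3)); // seed epoch 0
        int step = 0;
        while (!epoch.isEmpty()) {
            // Run the whole epoch in parallel, then synchronize in bulk.
            List<Future<List<Task>>> done = pool.invokeAll(
                epoch.stream().map(t -> (Callable<List<Task>>) t::get).toList());
            List<Task> next = new ArrayList<>();
            for (Future<List<Task>> f : done) next.addAll(f.get());
            System.out.println("epoch " + step++ + " ran " + epoch.size() + " tasks");
            epoch = next;
        }
        pool.shutdown();
    }

    // Each task spawns `width` children per level, two levels deep.
    static Task fanOut(int depth, int width) {
        return () -> depth >= 2
            ? List.<Task>of()
            : IntStream.range(0, width)
                       .mapToObj(i -> fanOut(depth + 1, width)).toList();
    }
}
```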
Investigating Graph Algorithms in the BSP Model on the Cray XMT
2013
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum
Alternative programming models, such as the bulk synchronous parallel programming model used in Google's Pregel, have been proposed for large graph analytics. ...
These algorithms perform within a factor of 10 of hand-tuned C code. ...
We will consider the bulk synchronous parallel (BSP) style of programming, rather than the BSP computation model or BSPlib [17]. ...
doi:10.1109/ipdpsw.2013.107
dblp:conf/ipps/EdigerB13
fatcat:hrsoeslsezbzfm7j6vsudcguku
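The Pregel-style BSP programming this entry refers to can be summarized as vertex programs that exchange messages between supersteps, with delivery deferred to the next superstep. A self-contained toy in Java (max-value propagation); the structure is illustrative, not Pregel's or the paper's code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Pregel-style BSP sketch: each superstep, a vertex reads the messages
// sent to it in the previous superstep, updates its value, and messages
// its neighbours; delivery is bulk-synchronous. Propagates the maximum.
public class PregelSketch {
    public static void main(String[] args) {
        int[][] edges = {{1}, {0, 2}, {1, 3}, {2}}; // undirected chain 0-1-2-3
        int[] value = {3, 6, 2, 1};
        int n = value.length;

        // Superstep 0: every vertex sends its value to its neighbours.
        List<List<Integer>> inbox = new ArrayList<>();
        for (int v = 0; v < n; v++) inbox.add(new ArrayList<>());
        for (int v = 0; v < n; v++)
            for (int u : edges[v]) inbox.get(u).add(value[v]);

        boolean anyMessages = true;
        while (anyMessages) {
            anyMessages = false;
            List<List<Integer>> next = new ArrayList<>();
            for (int v = 0; v < n; v++) next.add(new ArrayList<>());
            for (int v = 0; v < n; v++) {
                int max = value[v];
                for (int m : inbox.get(v)) max = Math.max(max, m);
                if (max > value[v]) {            // improved: tell the neighbours
                    value[v] = max;
                    for (int u : edges[v]) {
                        next.get(u).add(max);
                        anyMessages = true;
                    }
                }
            }
            inbox = next; // messages become visible only in the next superstep
        }
        System.out.println(Arrays.toString(value)); // [6, 6, 6, 6]
    }
}
```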
A Simulation-Based Scalability Study of Parallel Systems
1994
Journal of Parallel and Distributed Computing
We propose a top-down approach to scalability study that alleviates some of these problems. ...
We illustrate the top-down approach by considering a case study in implementing three NAS parallel kernels on two simulated message-passing platforms. ...
doi:10.1006/jpdc.1994.1101
fatcat:ucxbfc3ugbcgfgwlhm46cxgbfy
A Task-Centric Memory Model for Scalable Accelerator Architectures
2010
IEEE Micro
The programming styles adopted by many developers for accelerator applications share a common structure, similar to bulk synchronous processing [3]. These large-scale parallel applications ...
doi:10.1109/mm.2010.6
fatcat:wzxeuc4yrvajbg7jv4jatdagya
An asymmetric distributed shared memory model for heterogeneous parallel systems
2010
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems - ASPLOS '10
We present a software implementation of ADSM, called GMAC, on top of CUDA in a GNU/Linux environment. ...
Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. ...
doi:10.1145/1736020.1736059
dblp:conf/asplos/GeladoCNSPH10
fatcat:bu7vb6gcenhzhgo23appm64gie
An asymmetric distributed shared memory model for heterogeneous parallel systems
2010
SIGPLAN notices
We present a software implementation of ADSM, called GMAC, on top of CUDA in a GNU/Linux environment. ...
Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. ...
doi:10.1145/1735971.1736059
fatcat:pq3l4cbqfzehxhuv6gyztqb5ui
An asymmetric distributed shared memory model for heterogeneous parallel systems
2010
SIGARCH Computer Architecture News
We present a software implementation of ADSM, called GMAC, on top of CUDA in a GNU/Linux environment. ...
Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. ...
doi:10.1145/1735970.1736059
fatcat:l5nbecrkrvabjbvbipumlct6m4
Evaluating the performance limitations of MPMD communication
1997
Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97
This problem has thus far limited the appeal of high-level programming languages based on MPMD models in the parallel computing community. ...
This paper investigates the fundamental limitations of MPMD communication using a case study of two parallel programming languages, Compositional C++ (CC++) and Split-C, that provide support for a global ...
CC++ is a parallel extension of C++ designed for the development of task-parallel object-oriented programs. ...
doi:10.1145/509593.509604
dblp:conf/sc/ChangCEK97
fatcat:ay2mqgjdhfhtfjmyi5yimifqna
A Scheduling and Runtime Framework for a Cluster of Heterogeneous Machines with Multiple Accelerators
2015
2015 IEEE International Parallel and Distributed Processing Symposium
Our programming model is based on a shared global address space, made efficient by transaction-style bulk-synchronous semantics. ...
We present a runtime system for simple and efficient programming of CPU+GPU clusters. ...
This bulk-synchronous nature of GPUs is a fundamental design parameter in our framework. ...
doi:10.1109/ipdps.2015.12
dblp:conf/ipps/BeriBK15
fatcat:qmevn5ey45dwbhxa6td5whl3v4
Showing results 1 — 15 out of 17,911 results