17,911 Hits in 5.1 sec

Top-down design of bulk-synchronous parallel programs

Yifeng Chen, J. W. Sanders
2003 Parallel Processing Letters  
This paper studies top-down program development techniques for Bulk-Synchronous Parallelism.  ...  In that context a specification formalism Logs, for 'the Logic of Global Synchrony', has been proposed for the specification and high-level development of BSP designs.  ...  Introduction Top-down design of a program starts from an abstract formal specification.  ... 
doi:10.1142/s0129626403001367 fatcat:obnl5sbywrhytmyjuzyhyfwfcq

Implementing directed acyclic graphs with the heterogeneous system architecture

Sooraj Puthoor, Ashwin M. Aji, Shuai Che, Mayank Daga, Wei Wu, Bradford M. Beckmann, Gregory Rodgers
2016 Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit - GPGPU '16  
The use of DAGs to extract parallelism also enables runtimes to perform dynamic load-balancing, thereby achieving higher throughput when compared to the traditional bulk-synchronous execution.  ...  1.5x, respectively, compared to bulk-synchronous implementations.  ...  OpenCL is a trademark of Apple, Inc. used by permission by Khronos.  ... 
doi:10.1145/2884045.2884052 dblp:conf/ppopp/PuthoorACDWBR16 fatcat:mkekkvzr2zbhdfftehp6ztdcci

Apple-CORE: Harnessing general-purpose many-cores with hardware concurrency management

R. Poss, M. Lankamp, Q. Yang, J. Fu, M.W. van Tol, I. Uddin, C. Jesshope
2013 Microprocessors and microsystems  
Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads.  ...  The key aspects of the design are asynchrony, i.e. the ability to tolerate irregular long latencies on chip, a scale-invariant programming model, a distributed chip resource model, and the transparent  ...  The intent is to enable capturing these concurrency semantics in various programming models, e.g. the bulk-synchronous parallelism (BSP [31] ) and task parallelism constructs of OpenMP [7] and OpenCL  ... 
doi:10.1016/j.micpro.2013.05.004 fatcat:2eobsqpoabhubksy7asfmbdqmu

An approach to scalability study of shared memory parallel systems

Anand Sivasubramaniam, Aman Singla, Umakishore Ramachandran, H. Venkateswaran
1994 Performance Evaluation Review  
A top-down approach to scalability study of shared memory parallel systems is proposed in this research.  ...  The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines.  ...  We have illustrated the usefulness as well as the feasibility of our top-down approach for understanding the scalability of parallel systems.  ... 
doi:10.1145/183019.183038 fatcat:xy3jojm43vghblh3znxktrbaku

An approach to scalability study of shared memory parallel systems

Anand Sivasubramaniam, Aman Singla, Umakishore Ramachandran, H. Venkateswaran
1994 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems - SIGMETRICS '94  
A top-down approach to scalability study of shared memory parallel systems is proposed in this research.  ...  The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines.  ...  We have illustrated the usefulness as well as the feasibility of our top-down approach for understanding the scalability of parallel systems.  ... 
doi:10.1145/183018.183038 dblp:conf/sigmetrics/SivasubramaniamSRV94 fatcat:mniz5dyw4baszipyhi5muczp2i

Bulk file I/O extensions to Java

Dan Bonachea
2000 Proceedings of the ACM 2000 conference on Java Grande - JAVA '00  
The extensions were implemented in Titanium, a high-performance, parallel dialect of Java.  ...  The first adds bulk (array) I/O operations to the existing libraries, removing much of the overhead currently associated with array I/O.  ...  and debugging parallel programs.  ... 
doi:10.1145/337449.337459 dblp:conf/java/Bonachea00 fatcat:3a2uqcn7pjgmhpc7belbfabusm

TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization [article]

Blake A. Hechtman, Andrew D. Hilton, Daniel J. Sorin
2016 arXiv   pre-print
We have developed a task-parallel runtime system, called TREES, that is designed for high performance on CPU/GPU platforms.  ...  We build upon work-first to create the "work-together" principle that addresses the specific strengths and weaknesses of GPUs.  ...  In TREES, computation is divided into massively parallel epochs that are synchronized in bulk. The sequence of epochs is the critical path of a task-parallel program in TREES.  ... 
arXiv:1608.00571v1 fatcat:du263cjnw5b5beq7rravgrpi54

Investigating Graph Algorithms in the BSP Model on the Cray XMT

David Ediger, David A. Bader
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
Alternative programming models, such as the bulk synchronous parallel programming model used in Google's Pregel, have been proposed for large graph analytics.  ...  These algorithms perform within a factor of 10 of hand-tuned C code.  ...  We will consider the bulk synchronous parallel (BSP) style of programming, rather than the BSP computation model or BSPlib [17] .  ... 
doi:10.1109/ipdpsw.2013.107 dblp:conf/ipps/EdigerB13 fatcat:hrsoeslsezbzfm7j6vsudcguku
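Several of the results above (Pregel-style graph analytics, BSPlib, TREES' bulk-synchronized epochs) refer to the bulk-synchronous parallel style of programming. As a minimal, purely illustrative sketch of that style, the following sequential simulation shows its three-phase superstep structure: local computation, message exchange, and a global barrier. All names here are hypothetical and are not taken from Pregel, BSPlib, or any system cited above.

```python
from typing import Callable, List, Tuple

def bsp_run(num_workers: int,
            supersteps: int,
            compute: Callable[[int, int, list], List[Tuple[int, object]]]) -> List[list]:
    """Simulate BSP execution sequentially.

    Each superstep: (1) every worker runs `compute` on its own inbox only,
    emitting (destination, message) pairs; (2) all messages are exchanged;
    (3) an implicit barrier separates supersteps, so messages sent in
    superstep s are visible only in superstep s+1.
    Returns the final per-worker inboxes.
    """
    inboxes: List[list] = [[] for _ in range(num_workers)]
    for step in range(supersteps):
        outboxes: List[list] = [[] for _ in range(num_workers)]
        for w in range(num_workers):
            # Local computation phase: worker w sees only its own inbox.
            for dest, msg in compute(w, step, inboxes[w]):
                outboxes[dest].append(msg)
        # Barrier + communication phase: deliver all messages at once.
        inboxes = outboxes
    return inboxes

# Example: in the single superstep, every worker sends its id to worker 0.
def send_ids(w: int, step: int, inbox: list) -> List[Tuple[int, object]]:
    return [(0, w)] if step == 0 else []

final = bsp_run(4, 1, send_ids)
# final[0] now holds [0, 1, 2, 3]; all other inboxes are empty.
```

The point of the sketch is the barrier discipline: because inboxes are swapped only between supersteps, no worker can observe a message sent in the current superstep, which is what distinguishes the BSP style from asynchronous task-parallel execution.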

A Simulation-Based Scalability Study of Parallel Systems

A. Sivasubramaniam, A. Singla, U. Ramachandran, H. Venkateswaran
1994 Journal of Parallel and Distributed Computing  
We propose a top-down approach to scalability study that alleviates some of these problems.  ...  We illustrate the top-down approach by considering a case study in implementing three NAS parallel kernels on two simulated message-passing platforms.  ...  Acknowledgements The authors would like to thank the anonymous referees for their comments, which helped us put the results of this work in the proper perspective in addition to improving the quality of  ... 
doi:10.1006/jpdc.1994.1101 fatcat:ucxbfc3ugbcgfgwlhm46cxgbfy

A Task-Centric Memory Model for Scalable Accelerator Architectures

John H. Kelm, Daniel R. Johnson, Steven S. Lumetta, Sanjay J. Patel, Matthew I. Frank
2010 IEEE Micro  
Focus Center Research Program, a Semiconductor Research Corporation Program.  ...  We thank the Trusted ILLIAC Center at the Information Trust Institute for their generous contribution of use of their computing cluster to help us complete our research.  ...  Parallelism structure The programming styles adopted by many developers for accelerator applications share a common structure, similar to bulk synchronous processing [3]. These large-scale parallel applications  ... 
doi:10.1109/mm.2010.6 fatcat:wzxeuc4yrvajbg7jv4jatdagya

An asymmetric distributed shared memory model for heterogeneous parallel systems

Isaac Gelado, Javier Cabezas, Nacho Navarro, John E. Stone, Sanjay Patel, Wen-mei W. Hwu
2010 Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems - ASPLOS '10  
We present a software implementation of ADSM, called GMAC, on top of CUDA in a GNU/Linux environment.  ...  Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications.  ...  The authors acknowledge the support of the Gigascale Systems Research Center, one of the five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program  ... 
doi:10.1145/1736020.1736059 dblp:conf/asplos/GeladoCNSPH10 fatcat:bu7vb6gcenhzhgo23appm64gie

An asymmetric distributed shared memory model for heterogeneous parallel systems

Isaac Gelado, Javier Cabezas, Nacho Navarro, John E. Stone, Sanjay Patel, Wen-mei W. Hwu
2010 SIGPLAN notices  
We present a software implementation of ADSM, called GMAC, on top of CUDA in a GNU/Linux environment.  ...  Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications.  ...  The authors acknowledge the support of the Gigascale Systems Research Center, one of the five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program  ... 
doi:10.1145/1735971.1736059 fatcat:pq3l4cbqfzehxhuv6gyztqb5ui

An asymmetric distributed shared memory model for heterogeneous parallel systems

Isaac Gelado, Javier Cabezas, Nacho Navarro, John E. Stone, Sanjay Patel, Wen-mei W. Hwu
2010 SIGARCH Computer Architecture News  
We present a software implementation of ADSM, called GMAC, on top of CUDA in a GNU/Linux environment.  ...  Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications.  ...  The authors acknowledge the support of the Gigascale Systems Research Center, one of the five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program  ... 
doi:10.1145/1735970.1736059 fatcat:l5nbecrkrvabjbvbipumlct6m4

Evaluating the performance limitations of MPMD communication

Chi-Chao Chang, Grzegorz Czajkowski, Thorsten von Eicken, Carl Kesselman
1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97  
This problem has thus far limited the appeal of high-level programming languages based on MPMD models in the parallel computing community.  ...  This paper investigates the fundamental limitations of MPMD communication using a case study of two parallel programming languages, Compositional C++ (CC++) and Split-C, that provide support for a global  ...  CC++ is a parallel extension of C++ designed for the development of task-parallel object-oriented programs.  ... 
doi:10.1145/509593.509604 dblp:conf/sc/ChangCEK97 fatcat:ay2mqgjdhfhtfjmyi5yimifqna

A Scheduling and Runtime Framework for a Cluster of Heterogeneous Machines with Multiple Accelerators

Tarun Beri, Sorav Bansal, Subodh Kumar
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
Our programming model is based on a shared global address space, made efficient by transaction style bulk-synchronous semantics.  ...  We present a runtime system for simple and efficient programming of CPU+GPU clusters.  ...  This bulk-synchronous nature of GPUs is a fundamental design parameter in our framework.  ... 
doi:10.1109/ipdps.2015.12 dblp:conf/ipps/BeriBK15 fatcat:qmevn5ey45dwbhxa6td5whl3v4
Showing results 1 — 15 out of 17,911 results