A study of memory-aware scheduling in message driven parallel programs

Isaac Dooley, Chao Mei, Jonathan Lifflander, Laxmikant V. Kale
2010 2010 International Conference on High Performance Computing  
This paper presents a simple, but powerful memory-aware scheduling mechanism that adaptively schedules tasks in a message driven parallel program.  ...  In the LU program, only a single additional line of code is required to make use of the new general-purpose memory-aware scheduling mechanism.  ...  A message driven style of programming such as Charm++ [8] allows this pattern of computation to be expressed naturally.  ... 
doi:10.1109/hipc.2010.5713177 dblp:conf/hipc/DooleyMLK10 fatcat:dudq3y5bjfbbbfmhwk7hgc4mra

Accelerating Communication for Parallel Programming Models on GPU Systems [article]

Jaemin Choi, Zane Fink, Sam White, Nitin Bhat, David F. Richards, Laxmikant V. Kale
2022 arXiv   pre-print
In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem  ...  For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little  ...  This work was performed under the auspices of the U.S. Department of Energy (DOE) by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-JRNL-826064).  ... 
arXiv:2102.12416v4 fatcat:p7f5qhkjdfbhdk64ylhv65s2ve

A Comparative Study of Asynchronous Many-Tasking Runtimes: Cilk, Charm++, ParalleX and AM++ [article]

Abhishek Kulkarni, Andrew Lumsdaine
2019 arXiv   pre-print
Rather than picking a winner out of these four models under consideration, we end with a discussion on lessons learned, and how such a study is instructive in the evolution of parallel programming frameworks  ...  The comparison study includes a survey of each runtime system's programming models, their corresponding execution models, their stated features, and performance and productivity goals.  ...  In a Charm++ program, a PE is a unit of mapping and scheduling: each PE has a scheduler with an associated pool of messages.  ... 
arXiv:1904.00518v1 fatcat:euvfhakryzcbdmhpbrrxrfu6he

Multiparadigm, multilingual interoperability: Experience with converse [chapter]

L. V. Kalé, Milind Bhandarkar, Robert Brunner, Joshua Yelon
1998 Lecture Notes in Computer Science  
The Converse run-time framework was designed with dual objectives: that of supporting quick development of portable run-time systems for new parallel programming paradigms, and that of permitting interoperability  ...  between multi-paradigm modules in a single application.  ...  Message Driven Perl mdPerl is a package for Perl 5 programs that allows writing message-driven parallel programs in Perl.  ... 
doi:10.1007/3-540-64359-1_682 fatcat:abvvt5pvqnfw7i4kgggroxxmru

Design and Implementation of a Contention-Aware Coscheduling Strategy on Multi-Programmed Heterogeneous Clusters

Jung-Lok YU, Hee-Jung BYUN
2011 IEICE transactions on information and systems  
Coscheduling has gained a resurgence of interest as an effective technique to enhance the performance of parallel applications in multi-programmed clusters.  ...  To address this problem, in our previous study, we devised a novel algorithm that reorders the scheduling sequence of conflicting processes based on the rescheduling latency of their correspondents in  ...  Workload Characterizations We use NAS Parallel Benchmarks (NPB, version 2.4) [21] to evaluate the performance of all scheduling schemes in this study.  ... 
doi:10.1587/transinf.e94.d.2309 fatcat:oamhmabwefdcpa7nttwxnk52v4

A parallel-object programming model for petaflops machines and blue gene/cyclops

Gengbin Zheng, A. Kumar, S. Joshua, M. Unger, L.V. Kale
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
Such a machine might consist of a million processors, and is characterized by a low memory-to-processor ratio.  ...  Here we present the implementation of a parallel object model based on Charm++ as a candidate programming model.  ...  We also thank our group members of the Parallel Programming Laboratory at University of Illinois Urbana-Champaign, especially Orion Lawlor. We are grateful to the Blue Gene team at IBM, including Dr.  ... 
doi:10.1109/ipdps.2002.1016577 dblp:conf/ipps/ZhengSUK02 fatcat:dpr6b37uq5hsjdgotoouotkhqq

A Taxonomy Of Task-Based Parallel Programming Technologies For High-Performance Computing

Peter Thoman, Kiril Dichev, Khalid Hasanov, Roman Iakymchuk, Xavier Aguilar, Thomas Heller, Philipp Gschwandtner, Pierre Lemarinier, Stefano Markidis, Herbert Jordan, Thomas Fahringer, Kostas Katrinis (+2 others)
2017 Zenodo  
However, with the increase in parallel, many-core and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime  ...  Task-based programming models for shared memory -- such as Cilk Plus and OpenMP 3 -- are well established and documented.  ...  of a single task-parallel program.  ... 
doi:10.5281/zenodo.1119094 fatcat:kbuhio5hu5bs7kqkuj5s4jijdi

A Survey on Hardware and Software Support for Thread Level Parallelism [article]

Somnath Mazumdar, Roberto Giorgi
2016 arXiv   pre-print
The nature of current applications is diverse. To increase the system performance, all programming models may not be suitable to harness the built-in massive parallelism of multicore processors.  ...  We also review the programming models with respect to their support to shared-memory, distributed-memory and heterogeneity.  ...  In practice, contention-aware scheduling [ZBF10; KBH+08] takes care of memory page and their migration.  ... 
arXiv:1603.09274v3 fatcat:75isdvgp5zbhplocook6273sq4

The Cost and Benefits of Coordination Programming: Two Case Studies in Concurrent Collections and S-NET

Pavel Zaichenkov, Olga Tveretina, Alex Shafarenko, Bert Gijsbers, Clemens Grelck
2016 Parallel Processing Letters  
This is an evaluation study of the expressiveness provided and the performance delivered by the coordination language S-Net in comparison to Intel's Concurrent Collections (CnC).  ...  Our case study is based on two applications: a face detection algorithm implemented as a pipeline of feature classifiers and a numerical algorithm from the linear algebra domain, namely Cholesky decomposition  ...  Tuning is a feature of CnC that is clearly separated from application design.  ... 
doi:10.1142/s0129626416500110 fatcat:spubspagovhttm57xaufztivp4

Improving Scalability with GPU-Aware Asynchronous Tasks [article]

Jaemin Choi, David F. Richards, Laxmikant V. Kale
2022 arXiv   pre-print
In this work, we integrate GPU-aware communication into asynchronous tasks in addition to computation-communication overlap, with the goal of reducing time spent in communication and further increasing  ...  We demonstrate the performance impact of our approach using a proxy application that performs the Jacobi iterative method, Jacobi3D.  ...  Incoming messages are accumulated in a message queue that is continuously checked by a scheduler that runs on each PE.  ... 
arXiv:2202.11819v2 fatcat:rxpoa2mpyvcqzgpkql25fpgtp4

Parallel Programming with Migratable Objects: Charm++ in Practice

Bilge Acun, Abhishek Gupta, Nikhil Jain, Akhil Langer, Harshitha Menon, Eric Mikida, Xiang Ni, Michael Robson, Yanhua Sun, Ehsan Totoni, Lukasz Wesolowski, Laxmikant Kale
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
The advent of petascale computing has introduced new challenges (e.g. heterogeneity, system failure) for programming scalable parallel applications.  ...  Using the CHARM++ parallel programming framework, we present details on how these concepts can lead to development of applications that scale irrespective of the rough landscape of supercomputing technology  ...  Also, some experiments were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations  ... 
doi:10.1109/sc.2014.58 dblp:conf/sc/AcunGJLMMNRSTWK14 fatcat:tmksxaixivdmbhk2h7lo6tivdu

Debugging Large Scale Applications in a Virtualized Environment [chapter]

Filippo Gioachin, Gengbin Zheng, Laxmikant V. Kalé
2011 Lecture Notes in Computer Science  
We describe the obstacles we overcame to achieve this goal within two message passing programming models: CHARM++ and MPI.  ...  With the advent of petascale machines with hundreds of thousands of processors, debugging parallel applications is becoming an increasing challenge.  ...  ACKNOWLEDGMENTS This work was supported in part by the NSF Grant OCI-0725070 for Blue Waters, and by the Institute for Advanced Computing Applications and Technologies (IACAT).  ... 
doi:10.1007/978-3-642-19595-2_14 fatcat:p6gwfbomkrhtpksop6pyy65w7i

Making pull-based graph processing performant

Samuel Grossman, Heiner Litz, Christos Kozyrakis
2018 Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '18  
Our first contribution is a scheduler-aware interface for parallel loops that allows us to optimize for the common case in which each thread executes several consecutive iterations.  ...  This work focuses on inner loop parallelization for pull engines, which when performed naively leads to a significant increase in conflicting memory writes that must be synchronized.  ...  Acknowledgements We thank our anonymous reviewers and our shepherd, Michelle Goodstein, for their feedback and assistance in improving our paper.  ... 
doi:10.1145/3178487.3178506 dblp:conf/ppopp/GrossmanLK18 fatcat:up3zsecpynhzjmum53l4kt3ft4

Programming MPSoC platforms: Road works ahead!

R. Leupers, A. Vajda, M. Bekooij, Soonhoi Ha, R. Domer, A. Nohl
2009 2009 Design, Automation & Test in Europe Conference & Exhibition  
Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel  ...  The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture.  ...  Two major models for general purpose parallel programming are MPI and OpenMP: MPI is designed for distributed memory systems with explicit message passing paradigm of programming while OpenMP is designed  ... 
doi:10.1109/date.2009.5090917 fatcat:dz4ubgggofc3dnfqlnyknucgsa

Real-Time Scheduling Strategy for Wireless Sensor Networks O.S

Kayvan Atefi, Mohammad Sadeghi, Arash Atefi
2011 International Journal of Distributed and Parallel systems  
Most of the tasks in wireless sensor networks (WSN) are requested to run in a real-time way. Neither EDF nor FIFO can ensure real-time scheduling in WSN.  ...  In this paper the researchers discuss message scheduling and scheduling strategy for sensor network O.S, for this reason the researchers analysed system architecture of TinyOS, FIFO scheduling  ...  Resource Management Resources available in a typical sensor node are processor, program memory, battery, etc. Efficient use of processor involves using a scheduler with optimal scheduling policy.  ... 
doi:10.5121/ijdps.2011.2606 fatcat:gd4fb6ujfbhy7evviw7vqmrkcu