
Language and Compiler Support for Hybrid-Parallel Programming on SMP Clusters [chapter]

Siegfried Benkner, Viera Sipkova
2002 Lecture Notes in Computer Science  
based on OpenMP within nodes and distributed-memory parallelism utilizing MPI across nodes.  ...  Based on the proposed language extensions, the VFC compiler adopts a hybrid parallelization strategy which closely reflects the hierarchical structure of SMP clusters by exploiting shared-memory parallelism  ...  HPF Extensions for SMP Clusters HPF has been primarily designed for distributed-memory machines, but can also be employed on shared-memory machines and on clusters.  ... 
doi:10.1007/3-540-47847-7_4 fatcat:fmg2qm2xcrdexgmrrxwhitswla
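The hybrid strategy this entry describes, shared-memory parallelism within each SMP node and message passing across nodes, can be caricatured in a small runnable sketch. Python threads stand in for both levels here so the example runs anywhere; a real hybrid code would use MPI ranks across nodes and OpenMP threads within them, and none of these names come from VFC.

```python
from concurrent.futures import ThreadPoolExecutor

NODES = 2             # message-passing level (one MPI rank per SMP node)
THREADS_PER_NODE = 4  # shared-memory level (OpenMP threads inside a node)

def split(seq, parts):
    """Block-distribute a sequence into `parts` contiguous chunks."""
    step = -(-len(seq) // parts)  # ceiling division
    return [seq[i:i + step] for i in range(0, len(seq), step)]

def node_partial_sum(chunk):
    """Shared-memory level: threads inside one node reduce a chunk."""
    with ThreadPoolExecutor(max_workers=THREADS_PER_NODE) as pool:
        return sum(pool.map(sum, split(chunk, THREADS_PER_NODE)))

def hybrid_sum(data):
    """Distributed-memory level: one worker per node, then a global
    reduction (threads emulate the MPI ranks in this sketch)."""
    with ThreadPoolExecutor(max_workers=NODES) as pool:
        return sum(pool.map(node_partial_sum, split(data, NODES)))
```

The two-level decomposition mirrors the hierarchy of an SMP cluster: data is first block-distributed across nodes, then each node's chunk is reduced cooperatively by its threads.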

Parallel Java: A Unified API for Shared Memory and Cluster Parallel Programming in 100% Java

Alan Kaminsky
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
Parallel Java is a parallel programming API whose goals are (1) to support both shared memory (thread-based) parallel programming and cluster (message-based) parallel programming in a single unified API, allowing one to write parallel programs combining both paradigms; (2) to provide the same capabilities as OpenMP and MPI in an object oriented, 100% Java API; and (3) to be easily deployed and run in  ...  Using the same PJ API one can write parallel programs in Java for SMP machines, clusters, and hybrid SMP clusters.  ... 
doi:10.1109/ipdps.2007.370421 dblp:conf/ipps/Kaminsky07 fatcat:zimhtmvjirc6bmibmoo3nwxuci
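The "one API, two backends" goal can be illustrated by giving a parallel-for the same signature over a thread backend and a message-passing backend. These function names are invented for illustration and are not Parallel Java's actual classes.

```python
# Same call shape, two execution models: shared-memory threads versus
# workers that exchange work and results only through message queues.
from concurrent.futures import ThreadPoolExecutor
import queue
import threading

def parallel_for_threads(func, items, workers=4):
    """Shared-memory backend: workers see `items` directly."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))

def parallel_for_messages(func, items, workers=4):
    """Message-based backend: work and results travel through queues,
    as they would between cluster nodes (threads emulate the nodes)."""
    tasks, results = queue.Queue(), queue.Queue()
    for i, x in enumerate(items):
        tasks.put((i, x))

    def worker():
        while True:
            try:
                i, x = tasks.get_nowait()
            except queue.Empty:
                return
            results.put((i, func(x)))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    out = [None] * len(items)
    while not results.empty():
        i, y = results.get()
        out[i] = y
    return out
```

Because both backends take and return the same types, a program written against this interface can combine the paradigms, which is the unification the abstract claims for PJ.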

Distributed Shared Arrays: An Integration of Message Passing and Multithreading on SMP Clusters

Ramzi Basharahil, Brian Wims, Cheng-Zhong Xu, Song Fu
2005 Journal of Supercomputing  
We demonstrate the programmability of the model in a number of parallel applications and evaluate its performance on a cluster of SMP servers, in particular, the impact of the coherence granularity.  ...  This paper presents a Distributed Shared Array runtime system to support Java-compliant multithreaded programming on clusters of symmetric multiprocessors (SMPs).  ...  A preliminary version of this paper appeared in the Proceedings of the 11th International Conference on Parallel and Distributed Computing and Systems [36] .  ... 
doi:10.1007/s11227-005-0041-5 fatcat:cqojcp4ua5fg5eicdyi5ol7nb4
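A distributed shared array of the kind this entry describes can be sketched as a global index space block-partitioned across nodes, with every access routed to the owning partition. The class below is a toy model, not the DSA system's API; the per-block ownership loosely mirrors the coherence-granularity question the abstract raises.

```python
class DistributedSharedArray:
    """Toy model: one flat index space, block-partitioned across nodes."""

    def __init__(self, length, nodes):
        self.length = length
        self.block = -(-length // nodes)  # ceiling division: block per node
        self.partitions = [
            [0] * min(self.block, length - n * self.block)
            for n in range(nodes)
        ]

    def owner(self, i):
        """Node owning global index i (ownership unit = whole block)."""
        return i // self.block

    def read(self, i):
        # In a real system this would be a local hit or a remote fetch.
        return self.partitions[self.owner(i)][i % self.block]

    def write(self, i, value):
        self.partitions[self.owner(i)][i % self.block] = value
```

Coarser blocks mean fewer owners to contact for contiguous ranges but more false sharing between threads on different nodes, which is exactly the granularity trade-off the paper evaluates.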

Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers

Jun Doi, Yasushi Negishi
2010 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis  
Although the parallel FFT algorithm is usually implemented using synchronous all-to-all communication, we integrated the parallel FFT algorithm into the pipelined all-to-all communication using SMP threads  ...  For example, a standard rack of Blue Gene®/P [3] machines is composed of a 3D torus whose dimensions are 8x8x16.  ...  We would like to thank Fred Mintzer for helping us obtain machine time, James Sexton for managing our research project, and Shannon Jacobs for proofreading.  ... 
doi:10.1109/sc.2010.38 dblp:conf/sc/DoiN10 fatcat:b45rgyb3sbbypd3bv4alyyfoem
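The pipelining idea, computing on one chunk while the next is still in flight, can be shown in miniature. The exchange here is a stand-in run on a background thread, not a real all-to-all, and the chunked structure is the only part carried over from the paper.

```python
import queue
import threading

def exchange(chunk):
    """Stand-in for one step of the all-to-all exchange; a real code
    would transpose this chunk of data across ranks on the network."""
    return list(chunk)

def pipelined(chunks, compute):
    """Overlap: chunk k is computed while chunk k+1 is being exchanged."""
    arrived = queue.Queue()

    def communicate():
        for c in chunks:
            arrived.put(exchange(c))
        arrived.put(None)  # sentinel: no more chunks

    threading.Thread(target=communicate, daemon=True).start()
    results = []
    while (c := arrived.get()) is not None:
        results.append(compute(c))  # overlaps with the next exchange
    return results
```

With synchronous all-to-all, communication and computation alternate; in the pipelined form their costs largely hide behind each other, which is the source of the speedups the paper reports.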

Parallel classification for data mining on shared-memory multiprocessors

M.J. Zaki, Ching-Tien Ho, R. Agrawal
1999 Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)  
This performance evaluation shows that the construction of a decision-tree classifier can be effectively parallelized on an SMP machine with good speedup.  ...  We evaluate the performance of these algorithms on two machine configurations: one in which data is too large to fit in memory and must be paged from a local disk as needed and the other in which memory  ...  SMP machines are the dominant types of parallel machines currently used in industry. Individual nodes of even parallel distributed-memory machines are increasingly being designed to be SMP nodes.  ... 
doi:10.1109/icde.1999.754925 dblp:conf/icde/ZakiHA99 fatcat:4ydp6hzlazhnhkainlxtfq22xe
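One natural shared-memory decomposition for decision-tree construction is attribute parallelism: each worker scores candidate splits for a different attribute, and the globally best split wins. The sketch below shows only that task decomposition with a plain Gini criterion; the paper's algorithms are considerably more elaborate.

```python
from concurrent.futures import ThreadPoolExecutor

def gini(labels):
    """Gini impurity of a multiset of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 * p1 - (1 - p1) ** 2

def best_split_for_attribute(args):
    """Score every threshold for one attribute; one task per worker."""
    attr, rows, labels = args
    best = (float("inf"), attr, None)
    for threshold in sorted({r[attr] for r in rows}):
        left = [l for r, l in zip(rows, labels) if r[attr] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[attr] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[0]:
            best = (score, attr, threshold)
    return best

def best_split(rows, labels):
    """Evaluate attributes in parallel, return (score, attr, threshold)."""
    tasks = [(a, rows, labels) for a in range(len(rows[0]))]
    with ThreadPoolExecutor() as pool:
        return min(pool.map(best_split_for_attribute, tasks))
```

On an SMP the rows stay in shared memory, so attribute-parallel workers need no data movement, only a final reduction over per-attribute results.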

On using ZENTURIO for performance and parameter studies on cluster and Grid architectures

R. Prodan, T. Fahringer, F. Franz
2003 Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003. Proceedings.  
Experiments have been conducted on an SMP cluster with Fast Ethernet and Myrinet communication networks, using PBS (Portable Batch System) and GRAM (Globus Resource Allocation Manager) as job managers.  ...  a financial modeling application.  ...  Intra-node parallelisation is achieved through OpenMP directives. Communication among SMP nodes is realised through MPI calls. We scheduled the experiments on the SMP cluster using GRAM.  ... 
doi:10.1109/empdp.2003.1183587 dblp:conf/pdp/ProdanFGMFM03 fatcat:ucnuzixncbgyblcamordgana7a
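A parameter study of the kind ZENTURIO automates expands a set of parameter ranges into the cross-product of experiments and submits each one to a job manager. The sketch below models the expansion; `submit` is a plain callback standing in for a PBS or GRAM submission, and both names are illustrative.

```python
from itertools import product

def generate_experiments(parameters):
    """parameters: dict of name -> list of values.
    Returns one dict per point in the cross-product."""
    names = sorted(parameters)
    return [dict(zip(names, combo))
            for combo in product(*(parameters[n] for n in names))]

def run_study(parameters, submit):
    """Hand every experiment to the job-manager stand-in `submit`."""
    return [submit(experiment) for experiment in generate_experiments(parameters)]
```

For example, sweeping node counts against thread counts yields one experiment per combination, which is how a single study specification turns into a batch of cluster jobs.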

Stack splitting: A technique for efficient exploitation of search parallelism on share-nothing platforms

Enrico Pontelli, Karen Villaverde, Hai-Feng Guo, Gopal Gupta
2006 Journal of Parallel and Distributed Computing  
In this paper we present a distributed implementation of or-parallelism based on stack splitting, including results.  ...  from AI systems on distributed-memory machines.  ...  multiprocessor machines, by naturally incorporating efficient mechanisms for scheduling work and reducing communication overhead [25] .  ... 
doi:10.1016/j.jpdc.2006.05.002 fatcat:w7csuj7jrzh5tnrn66egb3uzvi
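The core idea of stack splitting is that a worker sharing work gives away a large fraction of its unexplored alternatives (choice points) in one transfer, after which the two workers explore their halves with no further interaction, a good fit for share-nothing platforms. The toy below splits alternately so both workers get choice points from all stack depths; it is a sketch of the splitting policy only, not of the paper's Prolog engine.

```python
def split_stack(choice_points):
    """Return (kept, given): alternate entries go to the idle worker."""
    kept = choice_points[0::2]
    given = choice_points[1::2]
    return kept, given

def explore(choice_points, expand):
    """Depth-first search over one worker's own half of the stack.
    expand(node) -> (children, solutions_found_at_node)."""
    solutions = []
    stack = list(choice_points)
    while stack:
        node = stack.pop()
        children, found = expand(node)
        stack.extend(children)
        solutions.extend(found)
    return solutions
```

Because an entire half of the stack moves in one message, scheduling traffic stays low compared with handing out one choice point at a time.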

Design and analysis of the Alliance/University of New Mexico Roadrunner Linux SMP SuperCluster

D.A. Bader, A.B. Maccabe, J.R. Mastaler, J.K. McIver, P.A. Kovatch
1999 ICWC 99. IEEE Computer Society International Workshop on Cluster Computing  
For example, its operating system (Linux), job scheduler (PBS), compilers (GNU/EGCS), and parallel programming libraries (MPI).  ...  National Science Foundation (NSF) and the National Computational Science Alliance (NCSA), is based almost entirely on freely available, vendor-independent software.  ...  Parallel Job Scheduling: It is also very helpful in a clustered environment to deploy some type of parallel job scheduling software.  ... 
doi:10.1109/iwcc.1999.810804 dblp:conf/iwcc/BaderMMMK99 fatcat:dqb6wty4xvaafm4t3fixk676lq

Optimizing fine-grained communication in a biomolecular simulation application on Cray XK6

Yanhua Sun, Gengbin Zheng, Chao Mei, Eric J. Bohm, James C. Phillips, Laxmikant V. Kale, Terry R. Jones
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
Achieving good scaling for fine-grained communication intensive applications on modern supercomputers remains challenging.  ...  Based on the analysis, we optimize the runtime, built on the uGNI library for Gemini. We present several techniques to improve the fine-grained communication.  ...  In non-SMP mode, in contrast, each process embodies only one control flow, which handles both event scheduling and network communication.  ... 
doi:10.1109/sc.2012.87 dblp:conf/sc/SunZMBPKJ12 fatcat:qvaoddxxujglpnhnu7jyzunrmm
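The SMP-mode division of labor the snippet contrasts with non-SMP mode can be sketched directly: worker threads only compute and enqueue outgoing messages, while one dedicated communication thread makes all network progress. This is a conceptual sketch, not the Charm++/uGNI runtime; `send` stands in for the network layer.

```python
import queue
import threading

def run_smp_node(work_items, compute, send):
    """One node in SMP mode: workers never touch the network; a single
    comm thread drains the outbox (non-SMP mode would interleave both
    duties in one control flow)."""
    outbox = queue.Queue()

    def comm_thread():
        while (msg := outbox.get()) is not None:
            send(msg)  # network progress happens only here

    t = threading.Thread(target=comm_thread)
    t.start()
    for item in work_items:   # worker role: compute, then enqueue
        outbox.put(compute(item))
    outbox.put(None)          # sentinel shuts the comm thread down
    t.join()
```

Dedicating a thread to communication keeps fine-grained messages flowing even while every worker is busy computing, which is one of the levers for the scaling problems the abstract describes.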

Fast PGAS Implementation of Distributed Graph Algorithms

Guojing Cong, George Almasi, Vijay Saraswat
2010 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis  
By improving memory access locality, compared with the naive implementation, our implementation exhibits much better communication efficiency and cache performance on a cluster of SMPs.  ...  With additional algorithmic and PGAS-specific optimizations, our implementation achieves significant speedups over both the best sequential implementation and the best single-node SMP implementation for  ...  Moreover, deep recursion demands efficient scheduling of distributed, dynamic activities on a cluster of SMPs. CILK ( [9] ) provides dynamic activity scheduling on a single node.  ... 
doi:10.1109/sc.2010.26 dblp:conf/sc/CongAS10 fatcat:nlj3t2m7xnfipldhjcfvkotnle
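A standard locality optimization in PGAS graph codes, in the spirit of what this entry alludes to, is to bucket fine-grained remote reads by owning node and ship one batch per node instead of one message per index. The sketch below shows that pattern; `owner_of` and `fetch_batch` are illustrative stand-ins for a PGAS runtime's locality map and bulk transfer.

```python
from collections import defaultdict

def batched_remote_reads(indices, owner_of, fetch_batch):
    """owner_of(i) -> node id; fetch_batch(node, [i, ...]) -> [values].
    Issues one combined request per node, then restores request order."""
    buckets = defaultdict(list)
    for i in indices:
        buckets[owner_of(i)].append(i)
    values = {}
    for node, batch in buckets.items():  # one message per node
        for i, v in zip(batch, fetch_batch(node, batch)):
            values[i] = v
    return [values[i] for i in indices]
```

Graph traversals generate highly irregular access streams, so this kind of coalescing is often the difference between per-edge latency costs and near-bandwidth transfer rates.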

Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System

Terry Jones, Paul Tomlinson, Mark Roberts, Shawn Dawson, Rob Neely, William Tuel, Larry Brenner, Jeffrey Fier, Robert Blackmore, Patrick Caffrey, Brian Maskell
2003 Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03  
A parallel application benefits from scheduling policies that include a global perspective of the application's process working set.  ...  Our results indicate a speedup of over 300% on synchronizing collectives.  ...  a large parallel job running on a network of SMP nodes in a super-computing center.  ... 
doi:10.1145/1048935.1050161 dblp:conf/sc/JonesDNTBFBCMTR03 fatcat:utaonbdorveyrjgz4h2rvitcd4

Parallel Dispatch Queue: a queue-based programming abstraction to parallelize fine-grain communication protocols

B. Falsafi, D.A. Wood
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
On a cluster of 4 16-way SMPs, a PDQ-based parallel protocol running on idle SMP processors improves application performance by a factor of 2.6 over a system running a serial protocol on a single dedicated  ...  This paper proposes a novel queue-based programming abstraction, Parallel Dispatch Queue (PDQ), that enables efficient parallel execution of fine-grain software communication protocols.  ...  A Hurricane-1 device allows both dedicated and multiplexed protocol scheduling on SMP processors [6] .  ... 
doi:10.1109/hpca.1999.744362 dblp:conf/hpca/FalsafiW99 fatcat:hrkgiidfmzc43m4jz2qttnrdpm
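The dispatch-queue abstraction can be caricatured as idle processors pulling protocol events from a shared queue, with events that touch the same state serialized by a per-key lock. This is a loose approximation of PDQ's synchronization guarantees, not its actual mechanism, and the names are invented.

```python
import queue
import threading

def dispatch(events, handler, workers=4):
    """events: list of (key, payload). Handlers for independent keys run
    in parallel; handlers sharing a key never overlap."""
    q = queue.Queue()
    for e in events:
        q.put(e)
    locks = {key: threading.Lock() for key, _ in events}

    def worker():
        while True:
            try:
                key, payload = q.get_nowait()
            except queue.Empty:
                return
            with locks[key]:  # same-key events are serialized
                handler(key, payload)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Running handlers on otherwise-idle SMP processors instead of one dedicated protocol processor is the source of the 2.6x improvement the abstract reports.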

Memory and network bandwidth aware scheduling of multiprogrammed workloads on clusters of SMPs

E. Koukis, N. Koziris
2006 12th International Conference on Parallel and Distributed Systems - (ICPADS'06)  
Then, we present the design and implementation of an informed gang-like scheduling algorithm aimed at improving the throughput of multiprogrammed workloads on clusters of SMPs.  ...  Its input data are acquired dynamically using hardware monitoring counters and a modified Myrinet NIC firmware, without any modifications to existing application binaries.  ...  on a cluster of SMPs.  ... 
doi:10.1109/icpads.2006.59 dblp:conf/icpads/KoukisK06 fatcat:bgjhk7dsaraqjbaocycgyakheu
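The scheduling idea can be sketched as pairing jobs whose bandwidth demands are complementary, so co-scheduled jobs do not oversubscribe a node's memory or network bandwidth. The greedy heuristic below is an invented toy, not the paper's algorithm; real demand figures would come from the hardware counters the abstract mentions.

```python
def pair_complementary(jobs, capacity=1.0):
    """jobs: list of (name, bandwidth_demand) with demand in [0, 1].
    Greedily pair each heavy job with the lightest job that still fits
    under `capacity`; jobs that fit with nothing run alone."""
    pending = sorted(jobs, key=lambda j: j[1], reverse=True)
    gangs = []
    while pending:
        heavy = pending.pop(0)
        mate = next((j for j in reversed(pending)
                     if heavy[1] + j[1] <= capacity), None)
        if mate is not None:
            pending.remove(mate)
            gangs.append((heavy[0], mate[0]))
        else:
            gangs.append((heavy[0],))
    return gangs
```

Two memory-bound jobs gang-scheduled together thrash the bus, while a memory-bound job paired with a compute-bound one barely interferes, which is the effect such informed co-scheduling exploits.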

Parallel programming with message passing and directives

S.W. Bova, C.P. Breshears, H. Gabb, B. Kuhn, B. Magro, R. Eigenmann, G. Gaertner, S. Salvini, H. Scott
2001 Computing in science & engineering (Print)  
These applications have three primary goals: • high speedup, scalable performance, and efficient system use; • similar behavior on a wide range of platforms and easy portability between platforms; and  ...  Parallel application developers today face the problem of how to integrate the dominant parallel processing models into one source code.  ...  Introducing communication within the extent of dynamic load balancing would also permit overlapping communication and computation within each SMP node, reducing communication costs by up to a factor N  ... 
doi:10.1109/5992.947105 fatcat:of7laitsjnhz7exipyqhf7ehtq

Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime

Chao Mei, Yanhua Sun, Gengbin Zheng, Eric J. Bohm, Laxmikant V. Kale, James C. Phillips, Chris Harrison
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine.  ...  We exploit node-aware techniques to optimize both the application and the underlying SMP runtime.  ...  Acknowledgments This work was supported in part by a NIH Grant PHS 5 P41 RR05969-04 for Molecular Dynamics, by NSF grant OCI-0725070 for Blue Waters deployment, by the Institute for Advanced Computing  ... 
doi:10.1145/2063384.2063466 dblp:conf/sc/MeiSZBKPH11 fatcat:jgkzpijiljgl3hxfhl2vevglyu
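One representative node-aware technique in the spirit of this entry is message combining: messages bound for different cores on the same node are merged into a single inter-node message and fanned out on arrival. The function below is an illustrative sketch, not the NAMD/Charm++ runtime's code.

```python
from collections import defaultdict

def combine_by_node(messages, node_of):
    """messages: list of (dest_core, payload); node_of(core) -> node id.
    Returns one combined message (list of core-addressed payloads) per
    destination node, so the network sees one send per node."""
    combined = defaultdict(list)
    for core, payload in messages:
        combined[node_of(core)].append((core, payload))
    return dict(combined)
```

At 100-million-atom scale the per-message overheads dominate, so collapsing many small core-to-core messages into a few node-to-node ones is one of the multicore-aware optimizations that make such runs feasible.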