
A Study of SpMV Implementation Using MPI and OpenMP on Intel Many-Core Architecture [chapter]

Fan Ye, Christophe Calvin, Serge G. Petiton
2015 Lecture Notes in Computer Science  
We parallelized for Intel MIC architecture a vectorized SpMV kernel using respectively the pure and the hybrid MPI/OpenMP models.  ...  It can help to promote the data locality and thread scalability. To further assess the performance, two indicators characterizing the nonzeros are proposed to model the performance.  ...  It also mitigates the memory contention as each process keeps a copy of x and the rows are distributed deliberately to each process (see Alg. 2).  ... 
doi:10.1007/978-3-319-17353-5_4 fatcat:ivba4yzbubcspejxagqkowwvl4

MPI+Threads: runtime contention and remedies

Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Hybrid MPI+Threads programming has emerged as an alternative model to the "MPI everywhere" model to better handle the increasing core density in cluster nodes.  ...  We first analyze the MPI runtime when multithreaded concurrent communication takes place on hierarchical memory systems.  ...  Thus, we believe that combining those approaches will have a synergistic effect on reducing the runtime contention.  ... 
doi:10.1145/2688500.2688522 dblp:conf/ppopp/AmerLWBM15 fatcat:rnvdxwcx5vawdg7wdzrhv7zo4m

MPI+Threads: runtime contention and remedies

Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka
2015 SIGPLAN notices  
Hybrid MPI+Threads programming has emerged as an alternative model to the "MPI everywhere" model to better handle the increasing core density in cluster nodes.  ...  We first analyze the MPI runtime when multithreaded concurrent communication takes place on hierarchical memory systems.  ...  Thus, we believe that combining those approaches will have a synergistic effect on reducing the runtime contention.  ... 
doi:10.1145/2858788.2688522 fatcat:q2gw434oi5bv3i4e3lwowittsu


2007 Parallel Processing Letters  
As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements.  ...  The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.  ...  SMPs lack the word-level synchronization primitive provided by massively multithreaded machines, so scalability limits due to contention will be more substantial.  ... 
doi:10.1142/s0129626407002843 fatcat:samtlwojnjccfg7fhvzvjh6xmq

Multigrain parallel Delaunay Mesh generation

Christos D. Antonopoulos, Xiaoning Ding, Andrey Chernikov, Filip Blagojevic, Dimitrios S. Nikolopoulos, Nikos Chrisochoides
2005 Proceedings of the 19th annual international conference on Supercomputing - ICS '05  
The exploitation of fine-grain parallelism results in higher performance than a pure MPI implementation and closes the gap between the performance of PCDM and the state-of-the-art sequential mesher on  ...  Our findings extend to other adaptive and irregular multigrain, parallel algorithms.  ...  We would like to thank Chaman Verma for his initial implementation of the medium-grain PCDM algorithm and the anonymous referees for their valuable comments.  ... 
doi:10.1145/1088149.1088198 dblp:conf/ics/AntonopoulosDCBNC05 fatcat:2rfdnb2w75ewfmuv7n2cyvfbmm

Measuring Multithreaded Message Matching Misery [chapter]

Whit Schonbein, Matthew G. F. Dosanjh, Ryan E. Grant, Patrick G. Bridges
2018 Lecture Notes in Computer Science  
While there has been significant developer interest and work to provide an efficient MPI interface for multithreaded access, there has not been a study showing how these patterns affect messaging patterns  ...  MPI usage patterns are changing as applications move towards fully-multithreaded runtimes. However, the impact of these patterns on MPI message matching is not well-studied.  ...  While some studies address improvements to MPI's multithreaded codepaths, few assess how multithreaded communication affects the behavior of MPI message processing.  ... 
doi:10.1007/978-3-319-96983-1_34 fatcat:voyagmk2cff23lsmbbx5rghvai

A Survey on Hardware and Software Support for Thread Level Parallelism [article]

Somnath Mazumdar, Roberto Giorgi
2016 arXiv   pre-print
Due to the heterogeneity in hardware, hybrid programming model (which combines the features of shared and distributed model) currently has become very promising.  ...  We also further discuss on software support for threads, to mainly increase the deterministic behavior during runtime.  ...  To increase overall chip throughput and fairness, thread-level schedulers such as symbiotic OS (SOS)-level job scheduler for SMT chips [ST00] try to mitigate shared-resource contention mainly in last level  ... 
arXiv:1603.09274v3 fatcat:75isdvgp5zbhplocook6273sq4

Fiuncho: a program for any-order epistasis detection in CPU clusters [article]

Christian Ponte-Fernández
2022 arXiv   pre-print
This work presents Fiuncho, a program that exploits all levels of parallelism present in x86_64 CPU clusters in order to mitigate the complexity of this approach.  ...  The most successful approach to epistasis detection is the exhaustive method, although its exponential time complexity requires a highly parallel implementation in order to be used.  ...  We would like to thank Supercomputación Castilla y León (SCAYLE), for providing us access to their computing resources.  ... 
arXiv:2201.03331v3 fatcat:n56wwqfk6nebvd24qv2ee3gfzy

Comparison and Analysis of Parallel Computing Performance Using OpenMP and MPI

Shen Hua
2013 Open Automation and Control Systems Journal  
The developments of multi-core technology have induced big challenges to software structures.  ...  To take full advantages of the performance enhancements offered by new multi-core hardware, software programming models have made a great shift from sequential programming to parallel programming.  ...  User can create their own data types by giving options to address combination.  ... 
doi:10.2174/1874444301305010038 fatcat:5vh2ksivljdhzay2xxzj4fonni

Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs

Thananon Patinyasakdikul, David Eberius, George Bosilca, Nathan Hjelm
2019 2019 IEEE International Conference on Cluster Computing (CLUSTER)  
MPI+threads emerged as one of the favorite choices in HPC community, according to a survey of the HPC community.  ...  However, threading support in MPI comes with many compromises to the overall performance delivered, and, therefore, its adoption is compromised.  ...  [17] - [19] proposed several strategies to minimize locking for MPI internals to mitigate the effect of lock contention, which becomes one of the main performance bottlenecks for multi-threaded MPI  ... 
doi:10.1109/cluster.2019.8891015 dblp:conf/cluster/Patinyasakdikul19 fatcat:5nt3k7u5ujgatbxzk25ysa4biy

An Evaluation of Threaded Models for a Classical MD Proxy Application

Pietro Cicotti, Susan M. Mniszewski, Laura Carrington
2014 2014 Hardware-Software Co-Design for High Performance Computing  
We explore the design of a multithreaded MD code to evaluate several tradeoffs that arise when converting an MPI application into a hybrid multithreaded application, to address the aforementioned constraints  ...  extent with a hybrid MPI+threads approach, though an explicit NUMA API to control locality may be desirable; and finally that dynamically scheduling the work within a process can mitigate the impact of  ...  ACKNOWLEDGMENT The authors would like to thank Jim Belak and David Richards DISCLAIMER This report was prepared as an account of work sponsored by an agency of the United States Government.  ... 
doi:10.1109/co-hpc.2014.6 dblp:conf/sc/CicottiMC14 fatcat:riskgm35ujfxzjnv65svdsdvky

Doing Scientific Machine Learning with Julia's SciML Ecosystem [article]

Christopher Rackauckas
The goal is to get those in the workshop familiar with what these methods are, what kinds of problems they solve, and know how to use Julia packages to implement them. The workshop will jump right into  ...  pii/S0021999118307125)), and Sparse Identification of Nonlinear Dynamics (SInDy, [Discovering governing equations from data by sparse identification of nonlinear dynamical systems](  ...  differentiate through the solver) Requires reversing the ODE or differentiate the solver Requires reversing the ODE Parallelism GPU, MPI, multithreading GPU, MPI, multithreading GPU  ... 
doi:10.6084/m9.figshare.12751949.v1 fatcat:3nodxm7ghzftflwbmlrtnhf5tu

Performance characterization and optimization of mobile augmented reality on handheld platforms

Sadagopan Srinivasan, Zhen Fang, Ravi Iyer, Steven Zhang, Mike Espig, Don Newell, Daniel Cermak, Yi Wu, Igor Kozintsev, Horst Haussecker
2009 2009 IEEE International Symposium on Workload Characterization (IISWC)  
We also present a detailed architectural characterization of the hotspot functions in terms of CPI, MPI, etc.  ...  In the MAR usage model, the user is able to point the handheld camera to an object (like a wine bottle) or a set of objects (like an outdoor scene of buildings or monuments) and the device automatically  ...  It should not be considered as benchmarking for such applications and was not intended to be specific to any processor or platform configuration. allowing us to use his images in this paper.  ... 
doi:10.1109/iiswc.2009.5306788 dblp:conf/iiswc/SrinivasanFIZENCWKH09 fatcat:jzhluenqf5hfnlgbkwqnjblp7y

Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors

Kamesh Madduri, Samuel Williams, Stéphane Ethier, Leonid Oliker, John Shalf, Erich Strohmaier, Katherine Yelick
2009 Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09  
In GTC, this step involves particles depositing charges to a 3D toroidal mesh, and multiple particles may contribute to the charge at a grid point.  ...  We find that our best strategies can be 2× faster than the reference optimized MPI implementation, and our analysis provides insight into desirable architectural features for high-performance PIC simulation  ...  Acknowledgments We would like to express our gratitude to Intel and Sun for their hardware donations.  ... 
doi:10.1145/1654059.1654108 dblp:conf/sc/MadduriWEOSSY09 fatcat:cwakndkumzg27ovrzpoycyyl24

Assessing improvements to the parallel volume rendering pipeline at large scale

Tom Peterka, Robert Ross, Hongfeng Yu, Kwan-Liu Ma, Wesley Kendall, Jian Huang
2008 2008 Workshop on Ultrascale Visualization  
To improve compositing, we experiment with a hybrid MPI-multithread programming model, and to mitigate the high cost of I/O, we implement multiple parallel pipelines to partially hide the I/O cost when  ...  We take a systemwide view in analyzing the performance of software volume rendering on the IBM Blue Gene/P at over 10,000 cores by examining the relative costs of the I/O, rendering, and compositing portions  ...  DISCUSSION By improving load balancing, conserving memory through parallel output, combining MPI with multithreaded program- Fig. 13 .  ... 
doi:10.1109/ultravis.2008.5154059 fatcat:zffx4ptirberhnnqqs46bcavou