A Study of SpMV Implementation Using MPI and OpenMP on Intel Many-Core Architecture
[chapter]
2015
Lecture Notes in Computer Science
We parallelized for Intel MIC architecture a vectorized SpMV kernel using respectively the pure and the hybrid MPI/OpenMP models. ...
It can help to promote the data locality and thread scalability. To further assess the performance, two indicators characterizing the nonzeros are proposed to model the performance. ...
It also mitigates the memory contention as each process keeps a copy of x and the rows are distributed deliberately to each process (see Alg. 2). ...
doi:10.1007/978-3-319-17353-5_4
fatcat:ivba4yzbubcspejxagqkowwvl4
MPI+Threads: runtime contention and remedies
2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015
Hybrid MPI+Threads programming has emerged as an alternative model to the "MPI everywhere" model to better handle the increasing core density in cluster nodes. ...
We first analyze the MPI runtime when multithreaded concurrent communication takes place on hierarchical memory systems. ...
Thus, we believe that combining those approaches will have a synergistic effect on reducing the runtime contention. ...
doi:10.1145/2688500.2688522
dblp:conf/ppopp/AmerLWBM15
fatcat:rnvdxwcx5vawdg7wdzrhv7zo4m
MPI+Threads: runtime contention and remedies
2015
SIGPLAN notices
Hybrid MPI+Threads programming has emerged as an alternative model to the "MPI everywhere" model to better handle the increasing core density in cluster nodes. ...
We first analyze the MPI runtime when multithreaded concurrent communication takes place on hierarchical memory systems. ...
Thus, we believe that combining those approaches will have a synergistic effect on reducing the runtime contention. ...
doi:10.1145/2858788.2688522
fatcat:q2gw434oi5bv3i4e3lwowittsu
CHALLENGES IN PARALLEL GRAPH PROCESSING
2007
Parallel Processing Letters
As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. ...
The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems. ...
SMPs lack the word-level synchronization primitive provided by massively multithreaded machines, so scalability limits due to contention will be more substantial. ...
doi:10.1142/s0129626407002843
fatcat:samtlwojnjccfg7fhvzvjh6xmq
Multigrain parallel Delaunay Mesh generation
2005
Proceedings of the 19th annual international conference on Supercomputing - ICS '05
The exploitation of fine-grain parallelism results in higher performance than a pure MPI implementation and closes the gap between the performance of PCDM and the state-of-the-art sequential mesher on ...
Our findings extend to other adaptive and irregular multigrain, parallel algorithms. ...
We would like to thank Chaman Verma for his initial implementation of the medium-grain PCDM algorithm and the anonymous referees for their valuable comments. ...
doi:10.1145/1088149.1088198
dblp:conf/ics/AntonopoulosDCBNC05
fatcat:2rfdnb2w75ewfmuv7n2cyvfbmm
Measuring Multithreaded Message Matching Misery
[chapter]
2018
Lecture Notes in Computer Science
While there has been significant developer interest and work to provide an efficient MPI interface for multithreaded access, there has not been a study showing how these patterns affect messaging patterns ...
MPI usage patterns are changing as applications move towards fully-multithreaded runtimes. However, the impact of these patterns on MPI message matching is not well-studied. ...
While some studies address improvements to MPI's multithreaded codepaths, few assess how multithreaded communication affects the behavior of MPI message processing. ...
doi:10.1007/978-3-319-96983-1_34
fatcat:voyagmk2cff23lsmbbx5rghvai
A Survey on Hardware and Software Support for Thread Level Parallelism
[article]
2016
arXiv
pre-print
Due to the heterogeneity in hardware, hybrid programming model (which combines the features of shared and distributed model) currently has become very promising. ...
We also further discuss on software support for threads, to mainly increase the deterministic behavior during runtime. ...
To increase overall chip throughput and fairness, thread-level schedulers such as symbiotic OS (SOS)-level job scheduler for SMT chips [ST00] try to mitigate shared-resource contention mainly in last level ...
arXiv:1603.09274v3
fatcat:75isdvgp5zbhplocook6273sq4
Fiuncho: a program for any-order epistasis detection in CPU clusters
[article]
2022
arXiv
pre-print
This work presents Fiuncho, a program that exploits all levels of parallelism present in x86_64 CPU clusters in order to mitigate the complexity of this approach. ...
The most successful approach to epistasis detection is the exhaustive method, although its exponential time complexity requires a highly parallel implementation in order to be used. ...
We would like to thank Supercomputación Castilla y León (SCAYLE), for providing us access to their computing resources. ...
arXiv:2201.03331v3
fatcat:n56wwqfk6nebvd24qv2ee3gfzy
Comparison and Analysis of Parallel Computing Performance Using OpenMP and MPI
2013
Open Automation and Control Systems Journal
The developments of multi-core technology have induced big challenges to software structures. ...
To take full advantages of the performance enhancements offered by new multi-core hardware, software programming models have made a great shift from sequential programming to parallel programming. ...
User can create their own data types by giving options to address combination. ...
doi:10.2174/1874444301305010038
fatcat:5vh2ksivljdhzay2xxzj4fonni
Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs
2019
2019 IEEE International Conference on Cluster Computing (CLUSTER)
MPI+threads emerged as one of the favorite choices in HPC community, according to a survey of the HPC community. ...
However, threading support in MPI comes with many compromises to the overall performance delivered, and, therefore, its adoption is compromised. ...
[17] - [19] proposed several strategies to minimize locking for MPI internals to mitigate the effect of lock contention, which becomes one of the main performance bottlenecks for multi-threaded MPI ...
doi:10.1109/cluster.2019.8891015
dblp:conf/cluster/Patinyasakdikul19
fatcat:5nt3k7u5ujgatbxzk25ysa4biy
An Evaluation of Threaded Models for a Classical MD Proxy Application
2014
2014 Hardware-Software Co-Design for High Performance Computing
We explore the design of a multithreaded MD code to evaluate several tradeoffs that arise when converting an MPI application into a hybrid multithreaded application, to address the aforementioned constraints ...
extent with a hybrid MPI+threads approach, though an explicit NUMA API to control locality may be desirable; and finally that dynamically scheduling the work within a process can mitigate the impact of ...
ACKNOWLEDGMENT The authors would like to thank Jim Belak and David Richards
DISCLAIMER This report was prepared as an account of work sponsored by an agency of the United States Government. ...
doi:10.1109/co-hpc.2014.6
dblp:conf/sc/CicottiMC14
fatcat:riskgm35ujfxzjnv65svdsdvky
Doing Scientific Machine Learning with Julia's SciML Ecosystem
[article]
2020
figshare.com
The goal is to get those in the workshop familiar with what these methods are, what kinds of problems they solve, and know how to use Julia packages to implement them. The workshop will jump right into ...
pii/S0021999118307125)), and Sparse Identification of Nonlinear Dynamics (SInDy, [Discovering governing equations from data by sparse identification of nonlinear dynamical systems](https://www.pnas.org/content ...
differentiate through the solver) | Requires reversing the ODE or differentiate the solver | Requires reversing the ODE | Parallelism | GPU, MPI, multithreading | GPU, MPI, multithreading | GPU ...
doi:10.6084/m9.figshare.12751949.v1
fatcat:3nodxm7ghzftflwbmlrtnhf5tu
Performance characterization and optimization of mobile augmented reality on handheld platforms
2009
2009 IEEE International Symposium on Workload Characterization (IISWC)
We also present a detailed architectural characterization of the hotspot functions in terms of CPI, MPI, etc. ...
In the MAR usage model, the user is able to point the handheld camera to an object (like a wine bottle) or a set of objects (like an outdoor scene of buildings or monuments) and the device automatically ...
It should not be considered as benchmarking for such applications and was not intended to be specific to any processor or platform configuration. allowing us to use his images in this paper. ...
doi:10.1109/iiswc.2009.5306788
dblp:conf/iiswc/SrinivasanFIZENCWKH09
fatcat:jzhluenqf5hfnlgbkwqnjblp7y
Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors
2009
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
In GTC, this step involves particles depositing charges to a 3D toroidal mesh, and multiple particles may contribute to the charge at a grid point. ...
We find that our best strategies can be 2× faster than the reference optimized MPI implementation, and our analysis provides insight into desirable architectural features for high-performance PIC simulation ...
Acknowledgments We would like to express our gratitude to Intel and Sun for their hardware donations. ...
doi:10.1145/1654059.1654108
dblp:conf/sc/MadduriWEOSSY09
fatcat:cwakndkumzg27ovrzpoycyyl24
Assessing improvements to the parallel volume rendering pipeline at large scale
2008
2008 Workshop on Ultrascale Visualization
To improve compositing, we experiment with a hybrid MPI-multithread programming model, and to mitigate the high cost of I/O, we implement multiple parallel pipelines to partially hide the I/O cost when ...
We take a systemwide view in analyzing the performance of software volume rendering on the IBM Blue Gene/P at over 10,000 cores by examining the relative costs of the I/O, rendering, and compositing portions ...
DISCUSSION By improving load balancing, conserving memory through parallel output, combining MPI with multithreaded program- Fig. 13 . ...
doi:10.1109/ultravis.2008.5154059
fatcat:zffx4ptirberhnnqqs46bcavou