3,487 Hits in 5.9 sec

Characterizing and Understanding PDES Behavior on Tilera Architecture

Deepak Jagtap, Ketan Bahulkar, Dmitry Ponomarev, Nael Abu-Ghazaleh
2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation  
The emergence of manycore architectures with shifting balance between computation and communication overhead can have a tremendous impact on performance and scalability of fine-grained parallel applications  ...  Finally, we explore the issues of object placement and model partitioning on Tilera architecture.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air  ... 
doi:10.1109/pads.2012.10 dblp:conf/pads/JagtapBPA12 fatcat:nfrxqav7cbcctcrhclhhbxiu2y

Navigating an Evolutionary Fast Path to Exascale

R.F. Barrett, S.D. Hammond, C.T. Vaughan, D.W. Doerfler, M.A. Heroux, J.P. Luitjens, D. Roweth
2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
The computing community is in the midst of a disruptive architectural change.  ...  Therefore, as architectures, programming models, and programming mechanisms continue to evolve, the preparations described herein will provide significant performance benefits on existing and emerging  ...  ACKNOWLEDGEMENTS The breadth of our work has required special efforts from a variety of entities and staff within the Department of Energy and with our industrial collaborators.  ... 
doi:10.1109/sc.companion.2012.55 dblp:conf/sc/BarrettHVDHLR12 fatcat:3frq3n526vccbmfcpkoneb4edu

Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning

Samuel Williams, Leonid Oliker, Jonathan Carter, John Shalf
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
Finally, we evaluate the impact of our hierarchical tuning techniques using a variety of problem sizes via large-scale simulations on state-of-the-art Cray XT4, Cray XE6, and IBM BlueGene/P platforms.  ...  Next, we present a variety of parallel optimization approaches including programming model exploration (flat MPI, MPI/OpenMP, and MPI/Pthreads), as well as data and thread decomposition strategies designed  ...  Finally, to illustrate the impact of communication on performance, and provide apples-to-apples comparisons between architectures, we explore 3 progressively larger datasets: 1, 4, and 16GB per node.  ... 
doi:10.1145/2063384.2063458 dblp:conf/sc/WilliamsOCS11 fatcat:e7snov63dvfklehytpcqbfb7xi

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI [article]

James A. Ross, David A. Richie, Song J. Park, Dale R. Shires
2015 arXiv   pre-print
the importance of fast inter-core communication for the architecture.  ...  The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality.  ...  ACKNOWLEDGMENTS The authors wish to acknowledge the U.S. Army Research Laboratory-hosted Department of Defense Supercomputing Resource Center for its support of this work.  ... 
arXiv:1506.05442v1 fatcat:edidr7vxd5cglgeaieprywbdgm

On the Efficiency of Executing Hydro-environmental Models on Cloud

Fearghal O'Donncha, Emanuele Ragnoli, Srikumar Venugopal, Scott C. James, Kostas Katrinis
2016 Procedia Engineering  
Many-core capability is provided by the OpenMP library in a hybrid configuration with MPI for cross-node data movement, and we explore the combination of these in the target setup.  ...  For the MPI part, the work flow is implemented as a data-parallel execution model, with all processing elements performing the same computation, on different subdomains with thread-level, fine-grain parallelism  ...  Many-core capability is provided by the MPI and OpenMP libraries in a hybrid configuration, and we explore the combination of these in the target setup.  ... 
doi:10.1016/j.proeng.2016.07.447 fatcat:hz6mhyojovcq5cgjxc3umk5woa

Exploring power behaviors and trade-offs of in-situ data analytics

Marc Gamell, Patrick McCormick, Scott Pakin, Valerio Pascucci, Scott Klasky, Ivan Rodero, Manish Parashar, Janine C. Bennett, Hemanth Kolla, Jacqueline Chen, Peer-Timo Bremer, Aaditya G. Landge (+1 others)
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
model based on system power and data exchange patterns, which is empirically validated; and (3) the use of the model to characterize the energy behavior of the workflow and to explore energy/performance  ...  The goal of this paper is exploring data-related energy/performance trade-offs for end-to-end simulation workflows running at scale on current high-end computing systems.  ...  Acknowledgments The research presented in this work is supported in part by the Director, Office of Advanced  ... 
doi:10.1145/2503210.2503303 dblp:conf/sc/GamellRPBKCBLGMPPK13 fatcat:fdt5gmyd6vby7cali23vpm6hb4

Benchmarking a Many-Core Neuromorphic Platform With an MPI-Based DNA Sequence Matching Algorithm

Gianvito Urgese, Francesco Barchi, Emanuele Parisi, Evelina Forno, Andrea Acquaviva, Enrico Macii
2019 Electronics  
across the many cores of the platform.  ...  Experimental results indicate that the SpiNNaker parallel architecture allows a linear performance increase with the number of used cores and shows better scalability compared to a general-purpose multi-core  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/electronics8111342 fatcat:yjsmlxwqtrh2pht53mcz3wux2e

MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics

Amith R. Mamidala, Rahul Kumar, Debraj De, D. K. Panda
2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)  
Conclusions and Future Work Optimizing MPI collective communication on emerging multicore clusters is the key to obtaining good performance speed-ups for many parallel applications.  ...  Understanding the impact of these architectures on communication performance is crucial to designing efficient collective algorithms.  ... 
doi:10.1109/ccgrid.2008.87 dblp:conf/ccgrid/MamidalaKDP08 fatcat:if6w35nuxfcgthgj25crgcn2ia

Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems

Miao Luo, Xiaoyi Lu, Khaled Hamidouche, Krishna Kandalla, Dhabaleswar K. Panda
2014 Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14  
State-of-the-art MPI libraries rely on locks to guarantee thread-safety. This discourages application developers from using multiple threads to perform MPI operations.  ...  In this paper, we propose a high performance, lock-free multiendpoint MPI runtime, which can achieve up to 40% improvement for point-to-point operation and one representative collective operation with  ...  Introduction MPI/OpenMP hybrid programming model is widely regarded as suitable model for scaling parallel applications on emerging multi-/many-core computing architectures.  ... 
doi:10.1145/2555243.2555287 dblp:conf/ppopp/LuoLHKP14 fatcat:lpr6lccpbrclfgcgedvzrikptq

Optimization of Parallel Discrete Event Simulator for Multi-core Systems

Deepak Jagtap, Nael Abu-Ghazaleh, Dmitry Ponomarev
2012 IEEE 26th International Parallel and Distributed Processing Symposium  
Results show that multithreaded implementation improves performance over the MPI version by up to a factor of 3 for the Core i7 machine and 1.2 on Magny-Cours for 48-way simulation.  ...  We study the performance of the simulator on two hardware platforms: a Core i7 machine and a 48-core AMD Opteron Magny-Cours system.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies and endorsements, either expressed or implied, of Air  ... 
doi:10.1109/ipdps.2012.55 dblp:conf/ipps/JagtapAP12 fatcat:cd7kymsruragdjxse7khyrky7a

Threaded MPI programming model for the Epiphany RISC array processor

David Richie, James Ross, Song Park, Dale Shires
2015 Journal of Computational Science  
Using MPI we demonstrate an on-chip performance of 9.1 GFLOPS with an efficiency of 15.3 GFLOPS/W.  ...  We present experimental results for matrix-matrix multiplication using MPI and highlight the importance of fast inter-core data transfers.  ...  Therefore, it is interesting to explore the utility of MPI for programming on-chip parallelism.  ... 
doi:10.1016/j.jocs.2015.04.023 fatcat:bmycj4ivzjbifkemmle24ggl7i

Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters

Lei Chai, Albert Hartono, Dhabaleswar Panda
2006 IEEE International Conference on Cluster Computing  
As new processor and memory architectures advance, clusters start to be built from larger SMP systems, which makes MPI intra-node communication a critical issue in high performance computing.  ...  While running the bandwidth benchmark, the measured L2 cache miss rate is reduced by half. The new design also improves the performance of MPI collective calls by up to 25%.  ...  Software Distribution: The design proposed in this paper will be available for downloading in upcoming MVAPICH releases.  ... 
doi:10.1109/clustr.2006.311850 dblp:conf/cluster/ChaiHP06 fatcat:odigrebu7fe55p34wmzurss73m

Parallel Discrete Event Simulation for Multi-Core Systems: Analysis and Optimization

Jingjing Wang, Deepak Jagtap, Nael Abu-Ghazaleh, Dmitry Ponomarev
2014 IEEE Transactions on Parallel and Distributed Systems  
Our results show that multithreaded implementation improves the performance over an MPI-based version by up to a factor of 3 on the Core i7, 1.4 on the AMD Magny-Cours, and 2.8 on the Tilera Tile64.  ...  We study the performance of the simulator on three hardware platforms: an Intel Core i7 machine, a 48-core AMD Opteron Magny-Cours system, and a 64-core Tilera TilePro64.  ...  He received his PhD from the University of Cincinnati in 1997. Dmitry Ponomarev is an Associate Professor in the Department of Computer Science at SUNY Binghamton.  ... 
doi:10.1109/tpds.2013.193 fatcat:bphisz5u6baobhhxgb44p2ahta

Modeling Ion Channel Kinetics with HPC

Allison Gehrke, Katherine Rennie, Timothy Benke, Daniel A Connors, Ilkyeun Ra
2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC)  
The focus of our study is to examine the step-by-step process of adapting new and existing computational biology models to multicore and distributed memory architectures.  ...  Performance improvements for computational sciences such as biology, physics, and chemistry are critically dependent on advances in multicore and manycore hardware.  ...  the Department of Pediatrics and The Children's Hospital Research Institute (TAB and AG).  ... 
doi:10.1109/hpcc.2010.46 dblp:conf/hpcc/GehrkeRBCR10 fatcat:cbihbi6lfjhmzlmcxefv7vqdoq

Performance Modeling of Gyrokinetic Toroidal Simulations for a Many-Tasking Runtime System [chapter]

Matthew Anderson, Maciej Brodowicz, Abhishek Kulkarni, Thomas Sterling
2014 Lecture Notes in Computer Science  
Yet a priori estimation of the potential performance and scalability impact of such runtime systems on existing applications developed around the bulk synchronous parallel (BSP) model is not well understood  ...  Conventional programming practices on multicore processors in high performance computing architectures are not universally effective in terms of efficiency and scalability for many algorithms in scientific  ...  There are numerous performance studies on the MPI version of GTC [26] , [27] across a wide array of architectures making it an ideal candidate for this case study.  ... 
doi:10.1007/978-3-319-10214-6_7 fatcat:rpvyc7r6dbcvte6dm4q74cbl4i
Showing results 1 — 15 out of 3,487 results