
Many-Core Sustainability by Pragma Directives [chapter]

Andreas Kucher, Gundolf Haase
2014 Lecture Notes in Computer Science  
We present a case study for a many-core acceleration of a large-scale commercial CFD solver by means of such frameworks.  ...  An upcoming turn from language-based many-core programming towards directive-based frameworks, similar to OpenMP, is an attempt to tackle these issues.  ...  This work is supported by the CleanSky Joint Undertaking through grant JTI-CS-2010-1-GRA-02-008 within the Seventh Framework Programme of the European Union.  ... 
doi:10.1007/978-3-662-43880-0_51 fatcat:7fruyrdyfjbjbbtvwl4zuzdzzi

Extendable pattern-oriented optimization directives

Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, Dongrui Fan
2011 International Symposium on Code Generation and Optimization (CGO 2011)  
Current programming models and compiler technologies for multi-core processors do not exploit well the performance benefits obtainable by applying algorithm-specific, i.e., semantic-specific optimizations  ...  To validate this new methodology, a framework, named EPOD, is developed to map such directives to the underlying optimization schemes.  ...  The peak performance is 709 GFLOPS. • Godson-T is a many-core prototype which has 64 homogeneous cores supporting 32-bit MIPS ISA.  ... 
doi:10.1109/cgo.2011.5764679 dblp:conf/cgo/CuiXWYFF11 fatcat:cv2p4qu6xrhvldik65wfahl53q

Extendable pattern-oriented optimization directives

Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, Dongrui Fan
2012 ACM Transactions on Architecture and Code Optimization (TACO)  
Current programming models and compiler technologies for multi-core processors do not exploit well the performance benefits obtainable by applying algorithm-specific, i.e., semantic-specific optimizations  ...  To validate this new methodology, a framework, named EPOD, is developed to map such directives to the underlying optimization schemes.  ...  The peak performance is 709 GFLOPS. • Godson-T is a many-core prototype which has 64 homogeneous cores supporting 32-bit MIPS ISA.  ... 
doi:10.1145/2355585.2355587 fatcat:zkx6ykcm3nf2ld3kzkcqk2btrq

Developing Efficient Implementations of Bellman–Ford and Forward-Backward Graph Algorithms for NEC SX-ACE

2018 Supercomputing Frontiers and Innovations  
The reported study was supported by the Russian Foundation for Basic Research, project No. 18-57-50005.  ...  Acknowledgments The research is carried out using the equipment of the shared research facilities of HPC computing resources at Lomonosov Moscow State University supported by the project RFMEFI62117X0011  ...  Moreover, the #pragma cdir vprefetch, #pragma cdir vovertake and #pragma cdir vob directives are used to effectively issue vector gather and scatter instructions, which can easily become bottlenecks  ... 
doi:10.14529/jsfi180311 fatcat:wnkdurjru5fazj6srziu7ftz7e
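To make the gather/scatter pattern mentioned in the snippet concrete, here is a minimal C sketch of edge-list Bellman-Ford. It is our own illustration, not the authors' SX-ACE implementation, and it contains no NEC `cdir` directives; the function names `relax_edges` and `bellman_ford` are hypothetical.

```c
#include <stddef.h>
#include <float.h>

/* One relaxation sweep over an edge list (src[e] -> dst[e], weight w[e]).
   dist[] uses DBL_MAX to mean "not reached yet". Reading dist[src[e]] is an
   indirect gather; updating dist[dst[e]] is an indirect scatter. */
static int relax_edges(size_t m, const int *src, const int *dst,
                       const double *w, double *dist)
{
    int changed = 0;
    for (size_t e = 0; e < m; ++e) {
        double du = dist[src[e]];              /* gather */
        if (du == DBL_MAX)
            continue;                          /* tail vertex not reached */
        double cand = du + w[e];
        if (cand < dist[dst[e]]) {
            dist[dst[e]] = cand;               /* scatter */
            changed = 1;
        }
    }
    return changed;
}

/* Single-source shortest paths on n vertices and m edges.
   Returns 1 on success, 0 if a negative cycle is reachable from source. */
int bellman_ford(size_t n, size_t m, const int *src, const int *dst,
                 const double *w, int source, double *dist)
{
    for (size_t v = 0; v < n; ++v)
        dist[v] = DBL_MAX;
    dist[source] = 0.0;
    for (size_t it = 0; it + 1 < n; ++it)
        if (!relax_edges(m, src, dst, w, dist))
            return 1;                          /* converged early */
    /* After n-1 sweeps, any further improvement implies a negative cycle. */
    return !relax_edges(m, src, dst, w, dist);
}
```

Stopping as soon as a sweep changes nothing is a common optimization. The irregular reads and writes inside `relax_edges` are the accesses that, per the snippet, the vector directives are meant to speed up.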

Delivering Parallel Programmability to the Masses via the Intel MIC Ecosystem: A Case Study

Kaixi Hou, Hao Wang, Wu-chun Feng
2014 2014 43rd International Conference on Parallel Processing Workshops  
This manycore architecture exhibits not only massive inter-core parallelism but also intra-core parallelism via a wider SIMD width.  ...  The optimizations include reordering data-access patterns, adjusting loop structures, vectorizing branches, and using OpenMP directives.  ...  The compiler directives contain a set of pragmas to take advantage of both intra- and inter-core parallelism.  ... 
doi:10.1109/icppw.2014.44 dblp:conf/icppw/HouWF14 fatcat:5rn33ekzvncyrpubcrokyy2kpa
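The snippet describes combining inter-core parallelism (threads) with intra-core parallelism (SIMD lanes) and vectorizing branches. A minimal OpenMP 4.0 sketch of that combination in C follows; it is not taken from the paper, and the kernel `scale_positive` is hypothetical.

```c
#include <stddef.h>

/* out[i] = a * in[i] where in[i] > 0, otherwise 0.
   'parallel for' distributes iterations across cores (inter-core);
   'simd' asks the compiler to vectorize each thread's chunk (intra-core).
   The branch is written as a conditional expression so it can be lowered
   to a masked/select vector operation instead of a jump. Compiled without
   OpenMP support, the pragma is ignored and the loop runs serially. */
void scale_positive(size_t n, float a,
                    const float *restrict in, float *restrict out)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i) {
        float x = in[i];
        out[i] = (x > 0.0f) ? a * x : 0.0f;
    }
}
```

With GCC or Clang the directive takes effect only when compiling with `-fopenmp`.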

Comparative Analysis of OpenACC Compilers [chapter]

Daniel Barba, Arturo Gonzalez-Escribano, Diego R. Llanos
2016 Lecture Notes in Computer Science  
In this work, we analyze different available OpenACC compilers that have been developed by companies or universities during the last years.  ...  IC1305: Network for Sustainable Ultrascale Computing (NESUS).  ...  This research has been partially supported by MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action  ... 
doi:10.1007/978-3-319-49956-7_7 fatcat:yshvdzeg2fd55oahzev6nou5km

Performance and portability of accelerated lattice Boltzmann applications with OpenACC

Enrico Calore, Alessandro Gabbana, Jiri Kraus, Sebastiano Fabio Schifano, Raffaele Tripiccione
2016 Concurrency and Computation  
An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power-efficient accelerators.  ...  Among them, OpenACC offers a high-level approach based on compiler directive clauses to mark regions of existing C, C++ or Fortran codes to run on accelerators.  ...  AG has been supported by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 642069.  ... 
doi:10.1002/cpe.3862 fatcat:r5t72w47j5elfarytq64hfua7e
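As an illustration of how an OpenACC directive marks an existing C loop for offloading, here is a minimal sketch of a generic 1-D three-point stencil. It is not a lattice Boltzmann kernel and not code from the paper; the function name `smooth` is hypothetical.

```c
#include <stddef.h>

/* One sweep of a 1-D three-point averaging stencil (out must not alias in).
   The directive offloads the interior loop; copyin/copyout state that
   in[0:n] is only read and out[0:n] is only written on the device.
   Without OpenACC support the pragma is ignored and the loop runs on the host. */
void smooth(size_t n, const double *restrict in, double *restrict out)
{
    if (n < 3) {                    /* no interior points: plain copy */
        for (size_t i = 0; i < n; ++i)
            out[i] = in[i];
        return;
    }
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (size_t i = 1; i < n - 1; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    /* The device never writes out[0] or out[n-1], so set the boundary
       values on the host after the region has copied out[] back. */
    out[0] = in[0];
    out[n - 1] = in[n - 1];
}
```

The same source compiles unchanged with a non-OpenACC compiler, which is the portability argument behind directive-based approaches.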

High-performance optimizations on tiled many-core embedded systems: a matrix multiplication case study

Arslan Munir, Farinaz Koushanfar, Ann Gordon-Ross, Sanjay Ranka
2013 Journal of Supercomputing  
Technological advancements in the silicon industry, as predicted by Moore's law, have resulted in an increasing number of processor cores on a single chip, giving rise to multicore, and subsequently many-core  ...  This work focuses on identifying key architecture and software optimizations to attain high performance from tiled many-core architectures (TMAs), an architectural innovation in the multicore technology  ...  Acknowledgements This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Space and Naval Warfare Systems Command (SPAWAR N66001-11-1-4103), the Office of  ... 
doi:10.1007/s11227-013-0916-9 fatcat:wi7tizdsdvhuhd2fwxxai65iem
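Loop tiling (blocking) is one widely used software optimization for matrix multiplication on cores with small caches or local memories. The generic C sketch below is not the authors' implementation. The function name `matmul_tiled` and the tile-size parameter `bs` are hypothetical. Note also that "tiled" in TMA refers to the chip layout, whereas here it refers to loop blocking.

```c
#include <stddef.h>

/* C += A * B for row-major n x n matrices, processed in bs x bs tiles so
   that the three tiles in use fit in a core's cache or local memory.
   bs must be positive; edge tiles are clipped when bs does not divide n. */
void matmul_tiled(size_t n, size_t bs,
                  const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t i_end = ii + bs < n ? ii + bs : n;
                size_t k_end = kk + bs < n ? kk + bs : n;
                size_t j_end = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];   /* reused across the j loop */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The i-k-j order inside each tile keeps the innermost accesses to `B` and `C` contiguous. The best `bs` depends on the target's memory hierarchy and has to be tuned.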

DINO: Divergent node cloning for sustained redundancy in HPC

Arash Rezaei, Frank Mueller, Paul Hargrove, Eric Roman
2017 Journal of Parallel and Distributed Computing  
This work contributes the idea of end-to-end resilience by protecting windows of vulnerability between kernels guarded by different resilience techniques.  ...  The work further promotes end-to-end application protection across kernels via a pragma-based specification, implemented as an extension to OpenMP, for diverse resilience schemes with minimal programming  ...  Our approach requires minimal effort from application programmers and is highly portable.  ... 
doi:10.1016/j.jpdc.2017.06.010 fatcat:p56v36gvsnebpjedhcnxqlitre

Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing [chapter]

Tetsuya Odajima, Taisuke Boku, Mitsuhisa Sato, Toshihiro Hanawa, Yuetsu Kodama, Raymond Namyst, Samuel Thibault, Olivier Aumage
2013 Lecture Notes in Computer Science  
In work sharing among GPUs and CPU cores on GPU-equipped clusters, keeping the load balanced among these heterogeneous computing resources is a critical issue.  ...  Acknowledgment This work is partially supported by a JST-ANR Joint Project entitled "Framework and Programming for Post Petascale Computing (FP3C)" and JST/CREST program entitled "Research and Development  ...  It is described by a new pragma, "reset weight", as "#pragma xmp device reset weight (cpu weight)". Here, cpu weight provides the CPU weight to be applied from this point until it is reset again.  ... 
doi:10.1007/978-3-319-03889-6_7 fatcat:lhier2svczdq3imxgthgxlixoa

Porting And Optimizing Hydro To New Platforms And Programming Paradigms – Lessons Learnt

Pierre-François Lavallée
2012 Zenodo  
HYDRO includes classical algorithms found in many application codes for Tier-0 systems.  ...  It has been written in several versions, including Fortran and C, in order to experiment with many new forms of parallelism and to adapt it easily to emerging architectures.  ...  Acknowledgements This work was financially supported by the PRACE project funded in part by the EU's 7th Framework Programme (FP7/2007-2013) under grant agreement no. RI-211528 and FP7-261557.  ... 
doi:10.5281/zenodo.814563 fatcat:cmqxesqx2jdqvfmsmzn7usyrlq

A reflection on the origins, evolution, and future of PRAGMA

Peter Arzberger
2017 Concurrency and Computation  
This paper, a reflection on PRAGMA, will provide additional technical, scientific and human context to many of these papers.  ...  We hope to illustrate that it is the people who set directions by following their interests or posing questions, who make progress by honoring their commitments, and who build community by establishing  ...  A defining moment is the PRAGMA community response to the SARS epidemic that affected many PRAGMA sites [1].  ... 
doi:10.1002/cpe.4136 fatcat:irsqjr2a7vb3vmwhmssy64fmg4

OpenMP Parallelization Of The Slilab Code

Evghenii Gaburov
2014 Zenodo  
Acknowledgements This work was granted access to the HPC resources of SURFsara/Cartesius and Hybrid/CSC and EPCC/Archer and EPCC/Hector made available within the Distributed European Computing Initiative by  ...  to the direct main memory interface.  ...  However, by using the correlation between the science rate and the sustained bandwidth from Table 1, we estimated the sustained bandwidth for the best performance runs to be 30 and 55 GB/s when used  ... 
doi:10.5281/zenodo.823068 fatcat:u333ze2ezrhy3fahm5i5lka7fq

An Approach for Semiautomatic Locality Optimizations Using OpenMP [chapter]

Jens Breitbart
2012 Lecture Notes in Computer Science  
In a tile-based many-core system, it may, for example, be possible to have a set of closely coupled cores working on a single tile.  ...  It is left for future work to analyze the usability of the new extensions for upcoming many-core architectures.  ... 
doi:10.1007/978-3-642-28145-7_29 fatcat:pnlzpvfi65fajih63nchg7wqoi

Staggered Dslash Performance on Intel Xeon Phi Architecture [article]

Ruizi Li, Steven Gottlieb
2014 arXiv   pre-print
We test the performance of CG and dslash, the key step in the CG algorithm, on the Intel Xeon Phi, also known as the Many Integrated Core (MIC) architecture.  ...  We appreciate many useful suggestions on the manuscript from Bálint Joó.  ...  This work was supported by DOE grants FG02-91ER 40661 and DE-SC0010120, and by the NSF/University of Tennessee Award A12-0848-S004.  ... 
arXiv:1411.2087v1 fatcat:gcegfha2tzbi5j6px5s5a5vdoi
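For context on the CG solver mentioned above, here is a textbook unpreconditioned conjugate gradient in C for a symmetric positive-definite matrix. In lattice QCD codes the operator applied in each iteration is built from dslash. Here a dense matrix-vector product stands in for it, an assumption made purely for illustration, since dslash itself is a sparse lattice stencil. The names `cg`, `matvec` and `dot` are hypothetical.

```c
#include <stddef.h>

/* y = A x for a dense row-major n x n matrix (stand-in for dslash). */
static void matvec(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < n; ++j)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

static double dot(size_t n, const double *a, const double *b)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Solves A x = b for symmetric positive-definite A. On entry x holds the
   initial guess; r, p and Ap are caller-provided scratch arrays of length n.
   Stops when ||r||^2 <= tol^2 or after max_iter iterations and returns the
   number of iterations performed. */
int cg(size_t n, const double *A, const double *b, double *x,
       double *r, double *p, double *Ap, double tol, int max_iter)
{
    matvec(n, A, x, Ap);
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i] - Ap[i];          /* initial residual */
        p[i] = r[i];                  /* first search direction */
    }
    double rr = dot(n, r, r);
    int k = 0;
    while (k < max_iter && rr > tol * tol) {
        matvec(n, A, p, Ap);          /* the operator application per iteration */
        double alpha = rr / dot(n, p, Ap);
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;
        for (size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++k;
    }
    return k;
}
```

Apart from the operator application, each iteration consists only of dot products and vector updates. This is why the operator is described as the key step.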