Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths [article]

Edward Hutter, Edgar Solomonik
2021 arXiv   pre-print
This strategy is effective in the presence of frequently-recurring computation and communication kernels, which is characteristic of algorithms in numerical linear algebra.  ...  We then leverage online execution path analysis to coordinate selective kernel execution and propagate each kernel's statistical profile.  ...  ACKNOWLEDGMENTS The first author would like to acknowledge the Department of Energy (DOE) and Krell Institute for support via the DOE Computational Science Graduate Fellowship (grant No.  ... 
arXiv:2103.01304v1 fatcat:64gtczdht5fqrhbonxcs7ikc34

Power profiling of Cholesky and QR factorizations on distributed memory systems

George Bosilca, Hatem Ltaief, Jack Dongarra
2012 Computer Science - Research and Development  
This paper presents the power profile of two high performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA.  ...  We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR.  ...  Acknowledgment The authors would like to thank Pr. Kirk Cameron from the Department of Computer Science at Virginia Tech, for granting access to his platform.  ... 
doi:10.1007/s00450-012-0224-2 fatcat:dnvgeb6jmjfphdwbsqfdtlf6oi

Exploring large macromolecular functional motions on clusters of multicore processors

José R. López-Blanco, Ruymán Reyes, José I. Aliaga, Rosa M. Badia, Pablo Chacón, Enrique S. Quintana-Ortí
2013 Journal of Computational Physics  
Our experiments show the superior performance of iterative Krylov-subspace methods for the solution of the dense generalized eigenproblems arising in these biological applications over more traditional  ...  Normal modes in internal coordinates (IC) furnish an excellent way to model functional collective motions of macromolecular machines, but exhibit a high computational cost when applied to large-sized macromolecules  ...  ScaLAPACK ScaLAPACK provides a collection of parallel message-passing routines for the solution of a number of dense linear algebra problems, concretely, linear systems of equations, linear-least squares  ... 
doi:10.1016/ fatcat:jkkks7rk2ze2zbf5p737ss6rgq

Preparing sparse solvers for exascale computing

Hartwig Anzt, Erik Boman, Rob Falgout, Pieter Ghysels, Michael Heroux, Xiaoye Li, Lois Curfman McInnes, Richard Tran Mills, Sivasankaran Rajamanickam, Karl Rupp, Barry Smith, Ichitaro Yamazaki (+1 others)
2020 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms.  ...  Sparse solvers provide essential functionality for a wide variety of scientific applications.  ...  energy savings [18].  ... 
doi:10.1098/rsta.2019.0053 pmid:31955673 fatcat:bqw6xqixbrabddmxglmtcbw2wa

Improving the energy efficiency of sparse linear system solvers on multicore and manycore systems

H. Anzt, E. S. Quintana-Orti
2014 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
For the particular domain of sparse linear algebra, we analyse the energy efficiency of a broad collection of hardware architectures and investigate how algorithmic and implementation modifications can  ...  One contribution of 14 to a Theme Issue 'Stochastic modelling and energy-efficient computing for weather and climate prediction'.  ...  We thank the long list of colleagues that were involved in this cooperation.  ... 
doi:10.1098/rsta.2013.0279 pmid:24842036 fatcat:kw7cnmvzrff6pmihhqenl53uwm

Self-adapting numerical software (SANS) effort

J. Dongarra, G. Bosilca, Z. Chen, V. Eijkhout, G. E. Fagg, E. Fuentes, J. Langou, P. Luszczek, J. Pjesivac-Grbovic, K. Seymour, H. You, S. S. Vadhiyar
2006 IBM Journal of Research and Development  
Attempts to automate such decisions distinguish three levels: • Algorithmic decision; • Management of the parallel environment; • Processor-specific tuning of kernels.  ...  The challenge for the development of next generation software is the successful management of the complex computational environment while delivering to the scientist the full power of flexible compositions  ...  linear algebra, transparently to the user. • SALSA: a system for picking optimal algorithms based on statistical analysis of the user problem. • FT-LA: a linear algebra approach to fault tolerance and  ... 
doi:10.1147/rd.502.0223 fatcat:uklej3fgarguzktd7rhv6aqoya

On the Easy Use of Scientific Computing Services for Large Scale Linear Algebra and Parallel Decision Making with the P-Grade Portal

Hrachya Astsatryan, Vladimir Sahakyan, Yuri Shoukouryan, Michel Daydé, Aurelie Hurault, Ronan Guivarch, Harutyun Terzyan, Levon Hovhannisyan
2013 Journal of Grid Computing  
analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.).  ...  This is an author-deposited version, Eprints ID: 12445. Abstract: Scientific research is becoming increasingly dependent on the large-scale  ...  Scientifique, France and State Committee of Science of Armenia) Project.  ... 
doi:10.1007/s10723-013-9254-7 fatcat:diuhsrbvfvf2xeladr6kvrouqi

Self-adapting software for numerical linear algebra and LAPACK for clusters

Zizhong Chen, Jack Dongarra, Piotr Luszczek, Kenneth Roche
2003 Parallel Computing  
This article describes the context, design, and recent development of the LAPACK for clusters (LFC) project.  ...  It has been developed in the framework of Self-Adapting Numerical Software (SANS) since we believe such an approach can deliver the convenience and ease of use of existing sequential environments bundled  ...  We also wish to thank NPACI, the National Partnership for the Advancement of Computational Infrastructure, for including LFC in its NPACkage.  ... 
doi:10.1016/j.parco.2003.05.014 fatcat:25zu22g27rbqfgcd5nlhw26rtq

Energy-Aware High Performance Computing [chapter]

Martin Wlotzka, Vincent Heuveline, Manuel F. Dolz, M. Reza Heidari, Thomas Ludwig, A. Cristiano I. Malossi, Enrique S. Quintana-Orti
2017 ICT - Energy Concepts for Energy Efficiency and Sustainability  
Finally, we discuss opportunities for saving energy in computations by means of two examples.  ...  High performance computing centres consume substantial amounts of energy to power large-scale supercomputers and the necessary building and cooling infrastructure.  ...  of any sparse linear algebra operation (with a straight-forward extension to the dense case), thus covering a significant percentage of existing scientific computing kernels.  ... 
doi:10.5772/66404 fatcat:bpzz2exlibczfe4jjkaia7gn7q

Guest Editorial High Performance Computing (HPC) Applications for a More Resilient and Efficient Power Grid

Zhenyu Henry Huang, Zeb Tate, Shrirang Abhyankar, Zhaoyang Dong, Siddhartha Khaitan, Liang Min, Gary Taylor
2017 IEEE Transactions on Smart Grid  
: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP).  ...  Block structures are exploited so that the large-scale dense matrix computations can be processed in parallel. This helps in memory savings as well as in overall computational time.  ... 
doi:10.1109/tsg.2017.2690478 fatcat:kd7yvfozovanfp32qruvh5z5k4

Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems

José I. Aliaga, María Barreda, Manuel F. Dolz, Alberto F. Martín, Rafael Mayo, Enrique S. Quintana-Ortí
2014 Cluster Computing  
We investigate the benefits that an energy-aware implementation of the runtime in charge of the concurrent execution of ILUPACK (a sophisticated preconditioned iterative solver for sparse linear systems) produces  ...  energy-aware strategies as well as the impact of the P-states into ILUPACK's runtime, at high accuracy, on two distinct platforms based on multicore technology from AMD and Intel.  ...  Acknowledgments The researchers from the Universidad Jaume I were supported by project CICYT TIN2011-23283 of the Ministerio de Ciencia e Innovación and FEDER and the FPU program of the Ministerio de Educación  ... 
doi:10.1007/s10586-014-0402-z fatcat:p7dpkyi3vvfnznra5epmmcwoem

AAP4All: An Adaptive Auto Parallelization of Serial Code for HPC Systems

M. Usman Ashraf, Fathy Alburaei Eassa, Leon J. Osterweil, Aiiad Ahmad Albeshri, Abdullah Algarni, Iqra Ilyas
2021 Intelligent Automation and Soft Computing  
A key advantage of the proposed tool is automatic recognition of the computer system architecture, followed by automatic translation of the input serial C++ code into parallel programming code for that particular detected  ...  Due to the increasing complexity of single-chip many-core/multi-core systems, only a well-balanced and optimized parallel programming technique can provide a substantial increase in performance  ...  Acknowledgement: The authors acknowledge with thanks DSR, King Abdulaziz University, Jeddah, Saudi Arabia for technical and financial support.  ... 
doi:10.32604/iasc.2021.019044 fatcat:quu46dwokbf2fkq24jrtp7uo7i

Exploring cross-layer power management for PGAS applications on the SCC platform

Marc Gamell, Ivan Rodero, Manish Parashar, Rajeev Muralidhar
2012 Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing - HPDC '12  
These trends are presenting new sets of challenges to HPC applications including programming complexity and the need for extreme energy efficiency.  ...  Results obtained via empirical evaluation of Unified Parallel C (UPC) applications on the SCC platform under different constraints, show that, for specific operations, the potential for energy savings  ...  We thank Intel for the access to the SCC and the opportunity to contribute to its MARC (Many-core Applications Research Community) program.  ... 
doi:10.1145/2287076.2287113 dblp:conf/hpdc/GamellRPM12 fatcat:ll4ap4kv5vc5xmsqlny76nxt3i

COUNTDOWN: a Run-time Library for Performance-Neutral Energy Saving in MPI Applications [article]

Daniele Cesarini, Andrea Bartolini, Pietro Bonfà, Carlo Cavazzoni, Luca Benini
2019 arXiv   pre-print
For the NAS benchmarks, COUNTDOWN saves between 6% and 50% energy, with a time-to-solution penalty lower than 5%.  ...  Energy saving increases to 37% with a performance penalty of 6.38%, if the application is executed without communication tuning.  ...  QE main computational kernels include dense parallel linear algebra (diagonalization) and 3D parallel FFT.  ... 
arXiv:1806.07258v2 fatcat:zkykp7lbbfdxpgua7woe4vsaxq

D7.4: Evaluation of Tools and Techniques for Future Exascale Systems

Buket Benek Gursoy, Michael Browne, Michael Lysaght
2017 Zenodo  
In particular, we see the need for an increased focus on energy efficient computing within WP7 in PRACE-5IP in order to take full advantage of the solutions being delivered as part of the final phase of  ...  The objective of the PRACE-4IP Work Package 7 (WP7) 'Application Enabling and Support' is to provide enabling support for High Performance Computing (HPC) applications codes, to ensure that these applications  ...  It is written in Fortran 95 and is parallelized for distributed memory architectures using a classical SPMD strategy combining a partitioning of the underlying mesh with a message-passing programming model  ... 
doi:10.5281/zenodo.6801727 fatcat:iae2alrlkjcudddpnbsz6knumq