Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments

Pawel Czarnul, Jerzy Proficz, Adam Krzywaniak
2019 Scientific Programming  
Based on our review, we identified a set of open areas and important up-to-date problems concerning methods and tools for modern HPC systems allowing energy-aware processing.  ...  System types include single device, clusters, grids, and clouds while considered device types include CPUs, GPUs, multiprocessor, and hybrid systems.  ...  SLURM [48] Proposes the enhanced power adaptive scheduling (E-PAS) algorithm with integration of power-aware approach into SLURM for limiting power consumption [49] Approach applicable to MPI applications  ... 
doi:10.1155/2019/8348791 fatcat:ib3dvjzg2bhhjnnklb4kaj2eqi


Jie Ren, Xiaoming Wang, Jianbin Fang, Yansong Feng, Dongxiao Zhu, Zhunchen Luo, Jie Zheng, Zheng Wang
2018 Proceedings of the 14th International Conference on emerging Networking EXperiments and Technologies - CoNEXT '18  
We achieve this by developing a machine learning based approach to predict which of the CPU cores to use and the operating frequencies of CPU and GPU.  ...  We obtain, on average, over 17% (up to 63%), 31% (up to 88%), and 30% (up to 91%) improvement respectively for load time, energy consumption and the energy delay product, when compared to two state-of-the-art  ...  These platforms integrate multiple processor cores on the same system, where each processor is tuned for a certain class of workloads to meet a variety of user requirements.  ... 
doi:10.1145/3281411.3281422 dblp:conf/conext/RenWFFZLZW18 fatcat:2pw5khx3gfa47cxeerbtsdnyxi

Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS [article]

Szilárd Páll, Artem Zhmurov, Paul Bauer, Mark Abraham, Magnus Lundborg, Alan Gray, Berk Hess, Erik Lindahl
2020 arXiv   pre-print
The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPUs and CPU SIMD acceleration efficiently, including the ability to load-balance tasks  ...  between CPUs and GPUs.  ...  integration on the CPU.  ... 
arXiv:2006.09167v2 fatcat:b6jiwmemtvbn3cz3mjfphbfeiu

Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption [article]

Paweł Rościszewski
2018 arXiv   pre-print
One of the main obstacles to achieving this goal is power consumption of the computing systems that exceeds the energy supply limits.  ...  The goal of the dissertation is to extract a general model of hybrid parallel application execution in heterogeneous HPC systems that is a synthesis of existing specific approaches and developing an optimization  ...  can be divided into black-box and white-box.  ... 
arXiv:1809.07611v1 fatcat:f2vl3kmgznckroj6h3uwt2zwf4

GPU Virtualization and Scheduling Methods

Cheol-Ho Hong, Ivor Spence, Dimitrios S. Nikolopoulos
2017 ACM Computing Surveys  
The integration of graphics processing units (GPUs) on high-end compute nodes has established a new accelerator-based heterogeneous computing model, which now permeates high performance computing.  ...  Heterogeneous computing with GPUs can benefit the Cloud by reducing operational costs and improving resource and energy efficiency.  ...  ACKNOWLEDGMENTS The authors are grateful to the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1145/3068281 fatcat:bng347au6veltazpmyyzv5ijmu

Beyond the socket

Ugljesa Milic, Oreste Villa, Evgeny Bolotin, Akhil Arunkumar, Eiman Ebrahimi, Aamer Jaleel, Alex Ramirez, David Nellans
2017 Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17  
To minimize scheduling logic and performance variance they utilize a uniform memory system and leverage strong data parallelism exposed via the programming model.  ...  Implementable today, NUMA-aware multi-socket GPUs may be a promising candidate for scaling GPU performance beyond a single socket.  ...  ACKNOWLEDGEMENTS We would like to thank anonymous reviewers and Steve Keckler for their help in improving this paper.  ... 
doi:10.1145/3123939.3124534 dblp:conf/micro/MilicVBAEJRN17 fatcat:nv5nbokyefhehgjgvsdlmbfadi

Navigating the Landscape for Real-Time Localization and Mapping for Robotics and Virtual and Augmented Reality

2018 Proceedings of the IEEE  
This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists  ...  multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches  ...  individual cores and also complete CPU/GPU systems (Section IV-A).  ... 
doi:10.1109/jproc.2018.2856739 fatcat:a66m7lzvn5bjvlxyw7qkd2qaky

A Scheduler-Level Incentive Mechanism for Energy Efficiency in HPC

Yiannis Georgiou, David Glesser, Krzysztof Rzadca, Denis Trystram
2015 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing  
To validate the general feasibility of our approach, we also implemented EFS as an extension for SLURM, a popular HPC resource and job management system.  ...  FairShare is a classic scheduling rule that prioritizes jobs belonging to users who were assigned small amount of CPU-second in the past.  ...  Also take into account that we are here effectively using a black-box approach, as we are scaling the whole application.  ... 
doi:10.1109/ccgrid.2015.101 dblp:conf/ccgrid/GeorgiouGRT15 fatcat:f6ntpua7zjftromhdbyq6yis7q

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision [article]

Wei Gao, Qinghao Hu, Zhisheng Ye, Peng Sun, Xiaolin Wang, Yingwei Luo, Tianwei Zhang, Yonggang Wen
2022 arXiv   pre-print
However, traditional approaches designed for big data or high performance computing workloads can not support DL workloads to fully utilize the GPU resources.  ...  An efficient scheduler design for such GPU datacenter is crucially important to reduce the operational cost and improve resource utilization.  ...  This will be a promising research direction in the future. Most of the above works treat the inference jobs as a black box for management and optimization.  ... 
arXiv:2205.11913v3 fatcat:fnbinueyijb4nc75fpzd6hzjgq

Heterogeneous placement optimization for database query processing

Tomas Karnagel, Dirk Habich
2017 it - Information Technology  
The current hardware trend is heterogeneity, where multiple computing units like CPUs and GPUs are used together in one system.  ...  Computing hardware is constantly evolving and database systems need to adapt to ongoing hardware changes to improve performance.  ...  To allow runtime estimation with the black-box approach, we monitor the execution during query runtime, building a learning-based model for each operator on each CU.  ... 
doi:10.1515/itit-2016-0048 fatcat:accqkgmvl5eshc2hkyfvleh73u

GROMEX: A Scalable and Versatile Fast Multipole Method for Biomolecular Simulation [chapter]

Bartosz Kohnke, Thomas R. Ullmann, Andreas Beckmann, Ivo Kabadshow, David Haensel, Laura Morgenstern, Plamen Dobrev, Gerrit Groenhof, Carsten Kutzner, Berk Hess, Holger Dachsel, Helmut Grubmüller
2020 Lecture Notes in Computational Science and Engineering  
For exascale systems, we expect our approach to outperform current implementations based on Particle Mesh Ewald (PME) electrostatics, because FMM avoids the communication bottlenecks caused by the parallel  ...  Near-optimal performance is achieved on various SIMD architectures and on GPUs using CUDA.  ...  In fact, GPUs will be treated as one of several resources a node offers (in addition to CPUs), to which tasks can be scheduled.  ... 
doi:10.1007/978-3-030-47956-5_17 fatcat:s3zaawt6njbf5pvdnhx752ky6i

From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

Marek Blazewicz, Ian Hinder, David M. Koppelman, Steven R. Brandt, Milosz Ciznicki, Michal Kierzynka, Frank Löffler, Erik Schnetter, Jian Tao
2013 Scientific Programming  
Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning.  ...  The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions.  ...  Acknowledgments The authors would like to thank Gabrielle Allen and Joel E. Tohline at the CCT and Krzysztof Kurowski at PSNC for their vision, encouragement, and continuous support to this project.  ... 
doi:10.1155/2013/167841 fatcat:iajrzmv2i5gvbgh3euy3z5o2za

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms

Alecio P. D. Binotto, Carlos E. Pereira, Arjan Kuijper, Andre Stork, Dieter W. Fellner
2011 2011 IEEE International Conference on High Performance Computing and Communications  
It has been a significant research and personal challenge and it is one of the most important steps on my career.  ...  To reach this goal, a set of personal, technical, and financial support were needed, which without any of them I could not have developed this work.  ...  The core module of the system works with scheduling heuristics oriented to one CPU and one GPU.  ... 
doi:10.1109/hpcc.2011.20 dblp:conf/hpcc/BinottoPKSF11 fatcat:bjdij42z5fe7dmfykjjj3n7p74

Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters [article]

Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang
2021 arXiv   pre-print
As case studies, we design: a Quasi-Shortest-Service-First scheduling service, which can minimize the cluster-wide average job completion time by up to 6.5x; and a Cluster Energy Saving service, which  ...  Second, we introduce a general-purpose framework, which manages resources based on historical data.  ...  Other services based on machine learning prediction can also be integrated into our framework, e.g., burstiness-aware resource manager [80, 83], network-aware job scheduler [12, 38], etc. (2) High  ... 
arXiv:2109.01313v1 fatcat:izw77evef5fpzb2ent3u6adyca

Algorithmic Energy Saving for Parallel Cholesky, LU, and QR Factorizations [article]

Li Tan, Zizhong Chen
2015 arXiv   pre-print
In this paper, we propose TX, a library level race-to-halt DVFS scheduling approach that analyzes Task Dependency Set of each task in parallel Cholesky, LU, and QR factorizations to achieve substantial  ...  Although OS level solutions have demonstrated the effectiveness of saving energy in a black-box fashion, for applications with variable execution characteristics, the optimal energy efficiency can be blundered  ...  [39] proposed a power-aware static mapping technique to assign applications for a CPU/GPU heterogeneous system that reduces power and energy costs via DVFS on both CPU and GPU, with timing requirements  ... 
arXiv:1411.2536v2 fatcat:iqnisonlxjhudebrm7wtaccjaa