Energy-Aware High-Performance Computing: Survey of State-of-the-Art Tools, Techniques, and Environments
2019
Scientific Programming
Based on our review, we identified a set of open areas and important current problems concerning methods and tools that enable energy-aware processing on modern HPC systems. ...
System types include single devices, clusters, grids, and clouds, while the considered device types include CPUs, GPUs, multiprocessors, and hybrid systems. ...
SLURM [48]: proposes the enhanced power-adaptive scheduling (E-PAS) algorithm, integrating a power-aware approach into SLURM to limit power consumption [49]; the approach is applicable to MPI applications (a hedged power-capping sketch follows this entry) ...
doi:10.1155/2019/8348791
fatcat:ib3dvjzg2bhhjnnklb4kaj2eqi
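The E-PAS idea of admitting work only while a power budget holds can be illustrated with a minimal sketch. The job fields, power estimates, and cap value below are assumptions made for illustration; this is not the E-PAS algorithm or SLURM's actual plugin code.

```python
# Hedged sketch of power-adaptive admission, not the actual E-PAS/SLURM code.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    est_power_w: float   # estimated power draw of the job (assumed known)

def admit_under_power_cap(queue, running, cap_w):
    """Launch queued jobs only while the estimated total draw stays below cap_w."""
    draw = sum(j.est_power_w for j in running)
    started = []
    for job in list(queue):
        if draw + job.est_power_w <= cap_w:
            running.append(job)
            queue.remove(job)
            started.append(job)
            draw += job.est_power_w
    return started

queue = [Job("mpi_app_a", 350.0), Job("mpi_app_b", 500.0), Job("mpi_app_c", 200.0)]
running = [Job("mpi_app_d", 400.0)]
print([j.name for j in admit_under_power_cap(queue, running, cap_w=1000.0)])
# -> ['mpi_app_a', 'mpi_app_c']
```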
Proteus
2018
Proceedings of the 14th International Conference on emerging Networking EXperiments and Technologies - CoNEXT '18
We achieve this by developing a machine-learning-based approach that predicts which CPU cores to use and the operating frequencies of the CPU and GPU (a toy prediction sketch follows this entry). ...
We obtain, on average, over 17% (up to 63%), 31% (up to 88%), and 30% (up to 91%) improvement for load time, energy consumption, and the energy-delay product, respectively, when compared to two state-of-the-art ...
These platforms integrate multiple processor cores on the same system, where each processor is tuned for a certain class of workloads to meet a variety of user requirements. ...
doi:10.1145/3281411.3281422
dblp:conf/conext/RenWFFZLZW18
fatcat:2pw5khx3gfa47cxeerbtsdnyxi
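As a rough illustration of predicting a core/frequency configuration from page features, the toy nearest-neighbour lookup below stands in for the learned model; the feature names (DOM nodes, CSS rules, JavaScript size), frequency values, and training rows are invented for the sketch and are not Proteus's actual features or model.

```python
# Hedged sketch: a toy nearest-neighbour predictor choosing a CPU-core cluster and
# CPU/GPU frequencies from page features; not the Proteus model itself.
import math

# (dom_nodes, css_rules, js_kb) -> (core_cluster, cpu_mhz, gpu_mhz); toy training data.
TRAINING = [
    ((200,  50,  10), ("little", 1000, 300)),
    ((1500, 400, 250), ("big",   1800, 600)),
    ((4000, 900, 800), ("big",   2300, 800)),
]

def predict_config(features):
    """Return the configuration of the nearest training point (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(TRAINING, key=lambda row: dist(row[0], features))[1]

print(predict_config((1200, 350, 200)))   # -> ('big', 1800, 600)
```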
Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS
[article]
2020
arXiv
pre-print
The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPUs and CPU SIMD acceleration efficiently, including the ability to load-balance tasks between CPUs and GPUs (a toy load-balancing sketch follows this entry). ...
... integration on the CPU. ...
arXiv:2006.09167v2
fatcat:b6jiwmemtvbn3cz3mjfphbfeiu
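The load-balancing idea of shifting the work split between devices according to measured per-step times can be sketched as below; the step size, bounds, and timings are illustrative assumptions, not the GROMACS implementation.

```python
# Hedged sketch of dynamic CPU/GPU load balancing by adjusting the work split
# from measured step times; not the GROMACS implementation.
def rebalance(gpu_fraction, t_gpu, t_cpu, step=0.05):
    """Shift work away from whichever device finished later in the last step."""
    if t_gpu > t_cpu:
        gpu_fraction -= step       # GPU was the bottleneck: give it less work
    elif t_cpu > t_gpu:
        gpu_fraction += step       # CPU was the bottleneck: offload more to the GPU
    return min(0.95, max(0.05, gpu_fraction))

frac = 0.5
for t_gpu, t_cpu in [(2.0, 1.0), (1.8, 1.1), (1.2, 1.2)]:   # measured ms per step (toy)
    frac = rebalance(frac, t_gpu, t_cpu)
print(round(frac, 2))   # -> 0.4
```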
Optimization of hybrid parallel application execution in heterogeneous high performance computing systems considering execution time and power consumption
[article]
2018
arXiv
pre-print
One of the main obstacles to achieving this goal is the power consumption of computing systems, which exceeds energy supply limits. ...
The goal of the dissertation is to extract a general model of hybrid parallel application execution in heterogeneous HPC systems that is a synthesis of existing specific approaches, and to develop an optimization ... (a toy time/power selection sketch follows this entry)
... can be divided into black-box and white-box approaches. ...
arXiv:1809.07611v1
fatcat:f2vl3kmgznckroj6h3uwt2zwf4
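A minimal sketch of the time/power trade-off the thesis targets: among candidate execution configurations, pick the one with the shortest estimated runtime that respects a power limit. The configurations and numbers below are invented for illustration and do not come from the dissertation.

```python
# Hedged sketch: choosing an execution configuration that minimises runtime under a
# power limit; an illustration of the time/power trade-off, not the thesis model.
configs = [
    # (cpu_processes, gpus_used, est_time_s, est_power_w) -- toy numbers
    (16, 0, 420.0,  600.0),
    (16, 2, 180.0, 1100.0),
    (8,  2, 210.0,  900.0),
]

def best_under_power_limit(configs, power_limit_w):
    """Return the fastest configuration whose estimated power stays within the limit."""
    feasible = [c for c in configs if c[3] <= power_limit_w]
    return min(feasible, key=lambda c: c[2]) if feasible else None

print(best_under_power_limit(configs, power_limit_w=1000.0))   # -> (8, 2, 210.0, 900.0)
```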
GPU Virtualization and Scheduling Methods
2017
ACM Computing Surveys
The integration of graphics processing units (GPUs) on high-end compute nodes has established a new accelerator-based heterogeneous computing model, which now permeates high performance computing. ...
Heterogeneous computing with GPUs can benefit the Cloud by reducing operational costs and improving resource and energy efficiency. ...
ACKNOWLEDGMENTS The authors are grateful to the anonymous reviewers for their valuable comments and suggestions. ...
doi:10.1145/3068281
fatcat:bng347au6veltazpmyyzv5ijmu
Beyond the socket
2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-50 '17
To minimize scheduling logic and performance variance, they utilize a uniform memory system and leverage strong data parallelism exposed via the programming model. ...
Implementable today, NUMA-aware multi-socket GPUs may be a promising candidate for scaling GPU performance beyond a single socket. ...
ACKNOWLEDGEMENTS We would like to thank anonymous reviewers and Steve Keckler for their help in improving this paper. ...
doi:10.1145/3123939.3124534
dblp:conf/micro/MilicVBAEJRN17
fatcat:nv5nbokyefhehgjgvsdlmbfadi
Navigating the Landscape for Real-Time Localization and Mapping for Robotics and Virtual and Augmented Reality
2018
Proceedings of the IEEE
This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists ...
multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific algorithmic requirements of the various SLAM algorithmic approaches ...
individual cores and also complete CPU/GPU systems (Section IV-A). ...
doi:10.1109/jproc.2018.2856739
fatcat:a66m7lzvn5bjvlxyw7qkd2qaky
A Scheduler-Level Incentive Mechanism for Energy Efficiency in HPC
2015
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
To validate the general feasibility of our approach, we also implemented EFS as an extension for SLURM, a popular HPC resource and job management system. ...
FairShare is a classic scheduling rule that prioritizes jobs belonging to users who were assigned a small amount of CPU-seconds in the past (a toy priority sketch follows this entry). ...
Note also that we are effectively using a black-box approach here, as we scale the whole application. ...
doi:10.1109/ccgrid.2015.101
dblp:conf/ccgrid/GeorgiouGRT15
fatcat:f6ntpua7zjftromhdbyq6yis7q
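A FairShare-style priority can be sketched as a decreasing function of past CPU-seconds consumed; the decay constant and usage numbers below are assumptions for illustration, not SLURM's multifactor formula or the paper's EFS rule.

```python
# Hedged sketch of a FairShare-style priority: users who consumed fewer CPU-seconds
# in the past get higher priority; not the actual SLURM/EFS formula.
past_cpu_seconds = {"alice": 3_600, "bob": 360_000, "carol": 36_000}

def fairshare_priority(usage, decay=1e-5):
    """Map past usage to a priority in (0, 1]; lower usage -> higher priority."""
    return {user: 1.0 / (1.0 + decay * secs) for user, secs in usage.items()}

prio = fairshare_priority(past_cpu_seconds)
print(sorted(prio, key=prio.get, reverse=True))   # -> ['alice', 'carol', 'bob']
```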
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
[article]
2022
arXiv
pre-print
However, traditional approaches designed for big data or high-performance computing workloads cannot support DL workloads in fully utilizing the GPU resources. ...
An efficient scheduler design for such GPU datacenters is crucially important to reduce the operational cost and improve resource utilization. ...
This will be a promising research direction in the future. Most of the above works treat the inference jobs as a black box for management and optimization. ...
arXiv:2205.11913v3
fatcat:fnbinueyijb4nc75fpzd6hzjgq
Heterogeneous placement optimization for database query processing
2017
it - Information Technology
The current hardware trend is heterogeneity, where multiple computing units like CPUs and GPUs are used together in one system. ...
Computing hardware is constantly evolving, and database systems need to adapt to ongoing hardware changes to improve performance. ...
To allow runtime estimation with the black-box approach, we monitor the execution during query runtime, building a learning-based model for each operator on each CU (a toy placement sketch follows this entry). ...
doi:10.1515/itit-2016-0048
fatcat:accqkgmvl5eshc2hkyfvleh73u
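The black-box placement idea, picking the compute unit whose learned cost model predicts the lowest runtime for an operator, can be sketched as below; the linear model form and coefficients are invented for illustration and are not the paper's operator models.

```python
# Hedged sketch: placing each query operator on the compute unit (CU) with the lowest
# predicted runtime, using per-operator/per-CU linear cost models fitted from monitoring
# data; the model form and numbers are illustrative, not those of the paper.
MODELS = {
    # (operator, CU): (fixed_cost_ms, cost_per_million_rows_ms) -- toy coefficients
    ("scan", "cpu"): (0.5, 12.0),  ("scan", "gpu"): (3.0, 4.0),
    ("join", "cpu"): (1.0, 40.0),  ("join", "gpu"): (6.0, 9.0),
}

def place(operator, rows_millions, cus=("cpu", "gpu")):
    """Return (best_cu, predicted_ms) for this operator and input size."""
    def predict(cu):
        fixed, per_m = MODELS[(operator, cu)]
        return fixed + per_m * rows_millions
    best = min(cus, key=predict)
    return best, predict(best)

print(place("scan", 0.1))   # small input  -> ('cpu', 1.7)
print(place("join", 5.0))   # large input  -> ('gpu', 51.0)
```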
GROMEX: A Scalable and Versatile Fast Multipole Method for Biomolecular Simulation
[chapter]
2020
Lecture Notes in Computational Science and Engineering
For exascale systems, we expect our approach to outperform current implementations based on Particle Mesh Ewald (PME) electrostatics, because FMM avoids the communication bottlenecks caused by the parallel ...
Near-optimal performance is achieved on various SIMD architectures and on GPUs using CUDA. ...
In fact, GPUs will be treated as one of several resources a node offers (in addition to CPUs), to which tasks can be scheduled. ...
doi:10.1007/978-3-030-47956-5_17
fatcat:s3zaawt6njbf5pvdnhx752ky6i
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
2013
Scientific Programming
Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. ...
The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. ...
Acknowledgments The authors would like to thank Gabrielle Allen and Joel E. Tohline at the CCT and Krzysztof Kurowski at PSNC for their vision, encouragement, and continuous support to this project. ...
doi:10.1155/2013/167841
fatcat:iajrzmv2i5gvbgh3euy3z5o2za
An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms
2011
2011 IEEE International Conference on High Performance Computing and Communications
It has been a significant research and personal challenge, and it is one of the most important steps in my career. ...
Reaching this goal required personal, technical, and financial support, without any of which I could not have developed this work. ...
The core module of the system works with scheduling heuristics oriented toward one CPU and one GPU. ...
doi:10.1109/hpcc.2011.20
dblp:conf/hpcc/BinottoPKSF11
fatcat:bjdij42z5fe7dmfykjjj3n7p74
Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters
[article]
2021
arXiv
pre-print
As case studies, we design: a Quasi-Shortest-Service-First scheduling service, which can reduce the cluster-wide average job completion time by up to 6.5x (a toy ordering sketch follows this entry); and a Cluster Energy Saving service, which ...
Second, we introduce a general-purpose framework, which manages resources based on historical data. ...
Other services based on machine learning prediction can also be integrated into our framework, e.g., burstiness-aware resource manager [80, 83] , network-aware job scheduler [12, 38] , etc. (2) High ...
arXiv:2109.01313v1
fatcat:izw77evef5fpzb2ent3u6adyca
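Quasi-Shortest-Service-First ordering can be sketched as sorting pending jobs by their predicted service time; the job names and predicted GPU-hours below are invented, and the prediction itself (derived from historical traces in the paper) is simply assumed to exist.

```python
# Hedged sketch of Quasi-Shortest-Service-First ordering: pending jobs are sorted by
# their predicted GPU service time; an illustration, not the paper's scheduler.
pending = [
    {"job": "resnet_train",  "predicted_gpu_hours": 12.0},
    {"job": "bert_finetune", "predicted_gpu_hours": 2.5},
    {"job": "hparam_sweep",  "predicted_gpu_hours": 0.4},
]

def qssf_order(jobs):
    """Shortest predicted service time first, which lowers average completion time."""
    return sorted(jobs, key=lambda j: j["predicted_gpu_hours"])

print([j["job"] for j in qssf_order(pending)])
# -> ['hparam_sweep', 'bert_finetune', 'resnet_train']
```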
Algorithmic Energy Saving for Parallel Cholesky, LU, and QR Factorizations
[article]
2015
arXiv
pre-print
In this paper, we propose TX, a library-level race-to-halt DVFS scheduling approach that analyzes the Task Dependency Set of each task in parallel Cholesky, LU, and QR factorizations to achieve substantial ... (a toy race-to-halt sketch follows this entry)
Although OS-level solutions have demonstrated the effectiveness of saving energy in a black-box fashion, for applications with variable execution characteristics, the optimal energy efficiency can be blundered ...
[39] proposed a power-aware static mapping technique to assign applications to a CPU/GPU heterogeneous system that reduces power and energy costs via DVFS on both CPU and GPU, with timing requirements ...
arXiv:1411.2536v2
fatcat:iqnisonlxjhudebrm7wtaccjaa
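Race-to-halt DVFS can be sketched as running critical-path tasks at the highest frequency and tasks with slack at a lower one; the frequency values, slack numbers, and task names below are illustrative assumptions, not the TX library's actual dependency analysis.

```python
# Hedged sketch of race-to-halt style DVFS: tasks on the critical path run at the
# highest frequency, tasks with slack run at a lower one; a toy illustration of
# dependency-aware DVFS, not the TX library.
F_HIGH, F_LOW = 2.4e9, 1.2e9   # available CPU frequencies (Hz), assumed

def choose_frequency(task_slack_s, threshold_s=0.0):
    """Zero slack means the task bounds overall progress: race at F_HIGH, else save energy."""
    return F_HIGH if task_slack_s <= threshold_s else F_LOW

# slack per task computed from the task dependency graph (toy values)
slacks = {"potrf": 0.0, "trsm": 0.0, "gemm": 0.8, "syrk": 1.3}
print({t: choose_frequency(s) / 1e9 for t, s in slacks.items()})
# -> {'potrf': 2.4, 'trsm': 2.4, 'gemm': 1.2, 'syrk': 1.2}
```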
Showing results 1 — 15 out of 952 results