Multi-Agent Deep Reinforcement Learning-Based Resource Allocation in HPC/AI Converged Cluster
2022
Computers Materials & Continua
In addition, we propose a resource allocation method for DL jobs to efficiently utilize the computing resources with multi-agent deep reinforcement learning (mDRL). ...
Therefore, it can be a good reference for building a small or middle-sized HPC/AI converged system for research and educational institutes. ...
We would like to thank the Cloud team at Kakao Enterprise, as well as especially thank Jung-Bok Lee for his support during the paper revision. ...
doi:10.32604/cmc.2022.023318
fatcat:nqsx7s252rc2pnujkxc2o4fnkm
Spark on the ARC
2017
Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact - PEARC17
In this paper we document our approach to overcoming service discovery and configuration of Apache Hadoop and Spark frameworks with dynamic resource allocations in a batch oriented Advanced Research Computing ...
data co-dependency, frequently solved with message passing interface (MPI) programming models, and then executed as batch jobs. ...
when deployed in an HPC batch environment as a job. ...
doi:10.1145/3093338.3093375
dblp:conf/xsede/DeYoungSBRT17
fatcat:ss7nbzzr4rg3tmwddolmhamfpe
The Future of Distributed Computing Systems in ATLAS: Boldly Venturing Beyond Grids
2019
EPJ Web of Conferences
We describe the strategies for integrating these heterogeneous resources into ATLAS, and the new software components being developed in PanDA to efficiently use them. ...
Up to a fifth of the resources available to ATLAS are of such new types and require special techniques for integration into PanDA. In this talk, we present the nature and scale of these resources. ...
Hence multiple ATLAS jobs need to be combined into a single HPC multinode workload. ...
doi:10.1051/epjconf/201921403047
fatcat:7kx4qgwp7vchffcucikxbs4wu4
A survey on resource allocation in high performance distributed computing systems
2013
Parallel Computing
In our study, through analysis, a comprehensive survey for describing resource allocation in various HPCs is reported. ...
The aim of the work is to aggregate under a joint framework, the existing solutions for HPC to provide a thorough analysis and characteristics of the resource management and allocation strategies. ...
Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public ...
doi:10.1016/j.parco.2013.09.009
fatcat:pdbghbohinc55lsgv67yfiw5ni
A Feasibility Study on workload integration between HT-Condor and Slurm Clusters
2019
EPJ Web of Conferences
Two production clusters co-exist at the Institute of High Energy Physics (IHEP). ...
One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager, the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager. ...
It is designed to run HTCondor jobs on other heterogeneous clusters, for example, a Slurm cluster in our circumstances. ...
doi:10.1051/epjconf/201921408004
fatcat:liwu7f7devbdvcykpipbxvaqdy
HPC-Aware VM Placement in Infrastructure Clouds
2013
2013 IEEE International Conference on Cloud Engineering (IC2E)
In this work, we address application-aware allocation of n VM instances (comprising a single job request) to physical hosts from a single pool. ...
HPC performance and/or 32% increase in job throughput while limiting the effect of jitter (or noise) to 8%. ...
Sarma and R Suryaprakash for setting up the cloud environment and helping with the scheduler implementation. We thank Alex Zhang for the discussions on server consolidation. ...
doi:10.1109/ic2e.2013.38
dblp:conf/ic2e/GuptaKMFB13
fatcat:xkb2rfdemjextl3i6gc6h27ytu
Generalized Matrix Multiplication and its Object Oriented Model
2014
Scalable Computing : Practice and Experience
We demonstrate that a user can game the system to cause a temporal starvation to the other users of the system, even though all users will eventually finish their job in the shared-computing environment ...
We claim that the current scheduling systems for high performance computing environments are unable to fairly distribute resources among the users, and as such, are unable to maximize the overall user ...
An HPC cluster acts as a repeated zero-sum game with multiple users. The strategy that each user chooses in order to submit tasks directly affects the allocation of resources. ...
doi:10.12694/scpe.v15i3.1020
fatcat:34g5fbljsbdznfmyzuyizjwiqe
Dynamic Traffic Control of Staging Traffic on the Interconnect of the HPC Cluster System
2020
IEEE Access
In [21], an SDN-accelerated HPC cluster system has been proposed as a concept to integrate SDN into HPC cluster systems. ...
Furthermore, HPC cluster systems such as the K computer [3] adopt a two-layered file system composed of a local file system on each computing node and a global file system shared by multiple computing ...
doi:10.1109/access.2020.3035158
fatcat:77serw75qfe6vogfhvurjeehfq
Power-aware dynamic placement of HPC applications
2008
Proceedings of the 22nd annual international conference on Supercomputing - ICS '08
We use the insights obtained from our experimental study to present a framework and methodology for power-aware application placement for HPC applications. ...
We show that for HPC applications, working set size is a key parameter to take care of while placing applications on virtualized servers. ...
We first investigate the scope for power management on a large HPC cluster in Section 2 using a trace-based study. ...
doi:10.1145/1375527.1375555
dblp:conf/ics/VermaAN08
fatcat:qfjijoob3zg37lzy36tzd5nz5e
Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters
2013
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13
In this paper, we propose a Semi-Elastic Cluster (SEC) computing model for organizations to reserve and dynamically resize a virtual cloud-based cluster. ...
We present a set of integrated batch scheduling plus resource scaling strategies uniquely enabled by SEC, as well as an online reserved instance provisioning algorithm based on job history. ...
In particular, we thank our shepherd, Henry Tufo, for providing guidance and feedback during our final paper preparation. ...
doi:10.1145/2503210.2503236
dblp:conf/sc/NiuZMTC13
fatcat:74aqdoexknbdhgrcinub2op7pq
SLA-based virtual machine management for heterogeneous workloads in a cloud datacenter
2014
Journal of Network and Computer Applications
Efficient provisioning of resources is a challenging problem in cloud computing environments due to its dynamic nature and the need for supporting heterogeneous applications. ...
In this paper, we tackle the resource allocation problem within a datacenter that runs different types of application workloads, particularly non-interactive and transactional applications. ...
This paper is a substantially extended version of our previous short conference paper presented at the 11th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2011 ...
doi:10.1016/j.jnca.2014.07.030
fatcat:52djius56ngaldod4ovbzot3su
Co-scheduling with User-Settable Reservations
[chapter]
2005
Lecture Notes in Computer Science
This strategy is more easily implemented than a centralized metascheduler because arrangements can be made without requiring control over the individual schedulers for each resource: the reservations are ...
This "Travel Agent Method" serves as the basis for a production scheduler and metascheduler suitable for making travel arrangements for a grid. ...
Making reservations for distributed grid resources can be done in a similar way: resources can be scheduled sequentially for jobs requiring staged multiple resources or in parallel for co-scheduled resources ...
doi:10.1007/11605300_7
fatcat:6ki6htnexjgjximkvrkffc4icu
Strategies for Democratization of Supercomputing: Availability, Accessibility and Usability of High Performance Computing for Education and Practice of Big Data Analytics
[article]
2021
arXiv
pre-print
The second contribution is a set of principles for HPC adoption based on an experiential narrative of HPC usage for textual analytics and NLP of social media data from a first time user perspective. ...
There has been an increasing interest in and growing need for high performance computing (HPC), popularly known as supercomputing, in domains such as textual analytics, business domains analytics, forecasting ...
, leaving a huge gap in education needs for implementing effective HPC usability strategies. ...
arXiv:2104.09091v1
fatcat:vktzjr4g6zgo7pea7ymcrghysu
Hybrid Workload Scheduling on HPC Systems
[article]
2021
arXiv
pre-print
In this study, we present several scheduling mechanisms to address the issues involved in co-scheduling on-demand, rigid, and malleable jobs on a single HPC system. ...
for malleable jobs, and the performance of rigid applications. ...
ACKNOWLEDGMENT This work is supported in part by US National Science Foundation grants CNS-1717763, CCF-1618776 and U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357. ...
arXiv:2109.05412v1
fatcat:24pfsjvjeze4bnamlzur6el5vq
Container orchestration on HPC systems through Kubernetes
2021
Journal of Cloud Computing: Advances, Systems and Applications
We propose a hybrid architecture that integrates HPC and Cloud clusters seamlessly with little interference to HPC systems where container orchestration is performed on two levels. ...
Containers can encapsulate complex programs with their dependencies in isolated environments making applications more portable, hence are being adopted in High Performance Computing (HPC) clusters. ...
Joseph Schuchart for proof-reading the contents. ...
doi:10.1186/s13677-021-00231-z
fatcat:vd5ziq5rtzecnennzojmcyerau