Multi-Agent Deep Reinforcement Learning-Based Resource Allocation in HPC/AI Converged Cluster

Jargalsaikhan Narantuya, Jun-Sik Shin, Sun Park, JongWon Kim
2022 Computers Materials & Continua  
In addition, we propose a resource allocation method for DL jobs to efficiently utilize the computing resources with multi-agent deep reinforcement learning (mDRL).  ...  Therefore, it can be a good reference for building a small or middle-sized HPC/AI converged system for research and educational institutes.  ...  We would like to thank the Cloud team at Kakao Enterprise, as well as especially thank Jung-Bok Lee for his support during the paper revision.  ... 
doi:10.32604/cmc.2022.023318 fatcat:nqsx7s252rc2pnujkxc2o4fnkm

Spark on the ARC

Mark E. DeYoung, Mohammed Salman, Himanshu Bedi, David Raymond, Joseph G. Tront
2017 Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact - PEARC17  
In this paper we document our approach to overcoming service discovery and configuration of Apache Hadoop and Spark frameworks with dynamic resource allocations in a batch oriented Advanced Research Computing  ...  data co-dependency, frequently solved with message passing interface (MPI) programming models, and then executed as batch jobs.  ...  when deployed in an HPC batch environment as a job.  ... 
doi:10.1145/3093338.3093375 dblp:conf/xsede/DeYoungSBRT17 fatcat:ss7nbzzr4rg3tmwddolmhamfpe

The Future of Distributed Computing Systems in ATLAS: Boldly Venturing Beyond Grids

Fernando Barreiro, Doug Benjamin, Taylor Childers, Kaushik De, Johannes Elmsheuser, Andrej Filipcic, Alexei Klimentov, Mario Lassnig, Tadashi Maeno, Danila Oleynik, Sergey Panitkin, Torre Wenaus (+5 others)
2019 EPJ Web of Conferences  
We describe the strategies for integrating these heterogeneous resources into ATLAS, and the new software components being developed in PanDA to efficiently use them.  ...  Up to a fifth of the resources available to ATLAS are of such new types and require special techniques for integration into PanDA. In this talk, we present the nature and scale of these resources.  ...  Hence multiple ATLAS jobs need to be combined into a single HPC multinode workload.  ... 
doi:10.1051/epjconf/201921403047 fatcat:7kx4qgwp7vchffcucikxbs4wu4

A survey on resource allocation in high performance distributed computing systems

Hameed Hussain, Saif Ur Rehman Malik, Abdul Hameed, Samee Ullah Khan, Gage Bickler, Nasro Min-Allah, Muhammad Bilal Qureshi, Limin Zhang, Wang Yongji, Nasir Ghani, Joanna Kolodziej, Albert Y. Zomaya (+11 others)
2013 Parallel Computing  
In our study, through analysis, a comprehensive survey for describing resource allocation in various HPCs is reported.  ...  The aim of the work is to aggregate under a joint framework, the existing solutions for HPC to provide a thorough analysis and characteristics of the resource management and allocation strategies.  ...  Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public  ... 
doi:10.1016/j.parco.2013.09.009 fatcat:pdbghbohinc55lsgv67yfiw5ni

A Feasibility Study on workload integration between HT-Condor and Slurm Clusters

R. Du, J. Shi, J. Zou, X. Jiang, Z. Sun, G. Chen, A. Forti, L. Betev, M. Litmaath, O. Smirnova, P. Hristov
2019 EPJ Web of Conferences  
There are two production clusters co-existed in the Institute of High Energy Physics (IHEP).  ...  One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager, the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager.  ...  It is designed to run HTCondor jobs on other heterogeneous clusters, for example, a Slurm cluster in our circumstances.  ... 
doi:10.1051/epjconf/201921408004 fatcat:liwu7f7devbdvcykpipbxvaqdy

HPC-Aware VM Placement in Infrastructure Clouds

A. Gupta, L. V. Kale, D. Milojicic, P. Faraboschi, S. M. Balle
2013 2013 IEEE International Conference on Cloud Engineering (IC2E)  
In this work, we address application-aware allocation of n VM instances (comprising a single job request) to physical hosts from a single pool.  ...  HPC performance and/or 32% increase in job throughput while limiting the effect of jitter (or noise) to 8%.  ...  Sarma and R Suryaprakash for setting up the cloud environment and helping with the scheduler implementation. We thank Alex Zhang for the discussions on server consolidation.  ... 
doi:10.1109/ic2e.2013.38 dblp:conf/ic2e/GuptaKMFB13 fatcat:xkb2rfdemjextl3i6gc6h27ytu

Generalized Matrix Multiplication and its Object Oriented Model

Art Sedighi, Yuefan Deng, Peng Zhang
2014 Scalable Computing : Practice and Experience  
We demonstrate that a user can game the system to cause a temporal starvation to the other users of the system, even though all users will eventually finish their job in the shared-computing environment  ...  We claim that the current scheduling systems for high performance computing environments are unable to fairly distribute resources among the users, and as such, are unable to maximize the overall user  ...  An HPC cluster acts as a repeated zero-sum game with multiple users. The strategy that each user chooses in order to submit tasks directly affects the allocation of resources.  ... 
doi:10.12694/scpe.v15i3.1020 fatcat:34g5fbljsbdznfmyzuyizjwiqe

Dynamic Traffic Control of Staging Traffic on the Interconnect of the HPC Cluster System

Arata Endo, Hiroki Ohtsuji, Erika Hayashi, Eiji Yoshida, Chunghan Lee, Susumu Date, Shinji Shimojo
2020 IEEE Access  
In [21], an SDN-accelerated HPC cluster system has been proposed as a concept to integrate SDN into HPC cluster systems.  ...  Furthermore, HPC cluster systems such as the K computer [3] adopt a two-layered file system composed of a local file system on each computing node and a global file system shared by multiple computing  ... 
doi:10.1109/access.2020.3035158 fatcat:77serw75qfe6vogfhvurjeehfq

Power-aware dynamic placement of HPC applications

Akshat Verma, Puneet Ahuja, Anindya Neogi
2008 Proceedings of the 22nd annual international conference on Supercomputing - ICS '08  
We use the insights obtained from our experimental study to present a framework and methodology for power-aware application placement for HPC applications.  ...  We show that for HPC applications, working set size is a key parameter to take care of while placing applications on virtualized servers.  ...  We first investigate the scope for power management on a large HPC cluster in Section 2 using a trace-based study.  ... 
doi:10.1145/1375527.1375555 dblp:conf/ics/VermaAN08 fatcat:qfjijoob3zg37lzy36tzd5nz5e

Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters

Shuangcheng Niu, Jidong Zhai, Xiaosong Ma, Xiongchao Tang, Wenguang Chen
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
In this paper, we propose a Semi-Elastic Cluster (SEC) computing model for organizations to reserve and dynamically resize a virtual cloud-based cluster.  ...  We present a set of integrated batch scheduling plus resource scaling strategies uniquely enabled by SEC, as well as an online reserved instance provisioning algorithm based on job history.  ...  In particular, we thank our shepherd, Henry Tufo, for providing guidance and feedback during our final paper preparation.  ... 
doi:10.1145/2503210.2503236 dblp:conf/sc/NiuZMTC13 fatcat:74aqdoexknbdhgrcinub2op7pq

SLA-based virtual machine management for heterogeneous workloads in a cloud datacenter

Saurabh Kumar Garg, Adel Nadjaran Toosi, Srinivasa K. Gopalaiyengar, Rajkumar Buyya
2014 Journal of Network and Computer Applications  
Efficient provisioning of resources is a challenging problem in cloud computing environments due to its dynamic nature and the need for supporting heterogeneous applications.  ...  In this paper, we tackle the resource allocation problem within a datacenter that runs different types of application workloads, particularly non-interactive and transactional applications.  ...  This paper is a substantially extended version of our previous short conference paper presented at the 11th international Conference on algorithms and Architectures for Parallel Processing (ICA3PP 2011  ... 
doi:10.1016/j.jnca.2014.07.030 fatcat:52djius56ngaldod4ovbzot3su

Co-scheduling with User-Settable Reservations [chapter]

Kenneth Yoshimoto, Patricia Kovatch, Phil Andrews
2005 Lecture Notes in Computer Science  
This strategy is more easily implemented than a centralized metascheduler because arrangements can be made without requiring control over the individual schedulers for each resource: the reservations are  ...  This "Travel Agent Method" serves as the basis for a production scheduler and metascheduler suitable for making travel arrangements for a grid.  ...  Making reservations for distributed grid resources can be done in a similar way: resources can be scheduled sequentially for jobs requiring staged multiple resources or in parallel for co-scheduled resources  ... 
doi:10.1007/11605300_7 fatcat:6ki6htnexjgjximkvrkffc4icu

Strategies for Democratization of Supercomputing: Availability, Accessibility and Usability of High Performance Computing for Education and Practice of Big Data Analytics [article]

Jim Samuel, Margaret Brennan-Tonetta, Yana Samuel, Pradeep Subedi, Jack Smith
2021 arXiv   pre-print
The second contribution is a set of principles for HPC adoption based on an experiential narrative of HPC usage for textual analytics and NLP of social media data from a first time user perspective.  ...  There has been an increasing interest in and growing need for high performance computing (HPC), popularly known as supercomputing, in domains such as textual analytics, business domains analytics, forecasting  ...  , leaving a huge gap in education needs for implementing effective HPC usability strategies.  ... 
arXiv:2104.09091v1 fatcat:vktzjr4g6zgo7pea7ymcrghysu

Hybrid Workload Scheduling on HPC Systems [article]

Yuping Fan and Paul Rich and William Allcock and Michael Papka and Zhiling Lan
2021 arXiv   pre-print
In this study, we present several scheduling mechanisms to address the issues involved in co-scheduling on-demand, rigid, and malleable jobs on a single HPC system.  ...  for malleable jobs, and the performance of rigid applications.  ...  ACKNOWLEDGMENT This work is supported in part by US National Science Foundation grants CNS-1717763, CCF-1618776 and U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357.  ... 
arXiv:2109.05412v1 fatcat:24pfsjvjeze4bnamlzur6el5vq

Container orchestration on HPC systems through Kubernetes

Naweiluo Zhou, Yiannis Georgiou, Marcin Pospieszny, Li Zhong, Huan Zhou, Christoph Niethammer, Branislav Pejak, Oskar Marko, Dennis Hoppe
2021 Journal of Cloud Computing: Advances, Systems and Applications  
We propose a hybrid architecture that integrates HPC and Cloud clusters seamlessly with little interference to HPC systems where container orchestration is performed on two levels.  ...  Containers can encapsulate complex programs with their dependencies in isolated environments making applications more portable, hence are being adopted in High Performance Computing (HPC) clusters.  ...  Joseph Schuchart for proof-reading the contents.  ... 
doi:10.1186/s13677-021-00231-z fatcat:vd5ziq5rtzecnennzojmcyerau