
2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32

2022 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS July 2021 1725-1739 DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters.  ...  ., +, TPDS Aug. 2021 2086-2100 Learning-Driven Interference-Aware Workload Parallelization for Streaming Applications in Heterogeneous Cluster.  ...  Graph coloring Feluca: A Two-Stage Graph Coloring Algorithm With Color-Centric Paradigm on GPU. Zheng, Z., +,  ...
doi:10.1109/tpds.2021.3107121 fatcat:e7bh2xssazdrjcpgn64mqh4hb4

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision [article]

Wei Gao, Qinghao Hu, Zhisheng Ye, Peng Sun, Xiaolin Wang, Yingwei Luo, Tianwei Zhang, Yonggang Wen
2022 arXiv   pre-print
Deep learning (DL) is thriving in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure.  ...  An efficient scheduler design for such a GPU datacenter is crucially important to reduce operational cost and improve resource utilization.  ...  Gandiva_fair [16] is an early fairness scheduler dedicated to heterogeneous GPU resource environments. It targets inter-user fairness under GPU heterogeneity.  ...
arXiv:2205.11913v3 fatcat:fnbinueyijb4nc75fpzd6hzjgq
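
The snippet above describes Gandiva_fair as an early scheduler targeting inter-user fairness on heterogeneous GPUs. Below is a minimal Python sketch of the underlying idea of splitting normalized cluster throughput in proportion to user shares; the GPU speedup table, ticket counts, and the simple proportional-share rule are illustrative assumptions, not the paper's actual mechanism (which also handles placement and trading across GPU generations over time).

    # Hedged sketch: proportional throughput shares on a heterogeneous fleet.
    # All numbers below (speedups, tickets) are invented for illustration.
    GPU_SPEEDUP = {"K80": 1.0, "P100": 2.5, "V100": 4.0}  # assumed relative rates

    def fair_throughput_shares(tickets, gpus):
        """Split the cluster's total normalized throughput among users in
        proportion to their tickets; a real scheduler would enforce these
        shares over time via placement and migration."""
        total_capacity = sum(GPU_SPEEDUP[g] for g in gpus)
        total_tickets = sum(tickets.values())
        return {u: total_capacity * t / total_tickets
                for u, t in tickets.items()}

    gpus = ["V100", "V100", "P100", "K80", "K80"]   # assumed fleet
    tickets = {"alice": 2, "bob": 1, "carol": 1}    # assumed user shares
    print(fair_throughput_shares(tickets, gpus))
    # -> {'alice': 6.25, 'bob': 3.125, 'carol': 3.125}  (total capacity 12.5)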

Cost Efficient GPU Cluster Management for Training and Inference of Deep Learning

Dong-Ki Kang, Ki-Beom Lee, Young-Chon Kim
2022 Energies  
In this paper, we propose a cost-efficient deep learning job allocation (CE-DLA) approach that minimizes the energy consumption cost of DL cluster operation while guaranteeing the performance requirements  ...  Expanding the scale of GPU-based deep learning (DL) clusters would bring not only accelerated AI services but also significant energy consumption costs.  ...  Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning.  ...
doi:10.3390/en15020474 fatcat:chkjihqakfculjf3rwveauahmy
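
The CE-DLA abstract above couples two objectives: minimize energy cost while still meeting a performance requirement. A toy sketch of that decision follows, assuming invented power/throughput profiles and an electricity price; the paper's actual approach models far more of the cluster, so this is a stand-in for the core trade-off only.

    # Hedged sketch: pick the cheapest configuration that meets the deadline.
    ELECTRICITY_PRICE = 0.12  # $/kWh (assumed)

    # (name, power draw in watts, examples/sec for this job) -- assumed profiles
    CONFIGS = [
        ("1xV100", 300, 1000.0),
        ("2xV100", 600, 1900.0),
        ("4xT4",   280,  900.0),
    ]

    def cheapest_config(total_examples, deadline_s):
        best = None
        for name, watts, rate in CONFIGS:
            runtime_s = total_examples / rate
            if runtime_s > deadline_s:
                continue  # violates the performance requirement
            cost = watts * (runtime_s / 3600.0) / 1000.0 * ELECTRICITY_PRICE
            if best is None or cost < best[1]:
                best = (name, cost)
        return best

    print(cheapest_config(total_examples=3_600_000, deadline_s=4 * 3600))
    # -> ('1xV100', 0.036): slower but cheapest option that meets the deadline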

BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training [article]

Letian Zhao, Rui Xu, Tianqi Wang, Teng Tian, Xiaotian Wang, Wei Wu, Chio-in Ieong, Xi Jin
2021 arXiv   pre-print
We have trained different DNNs such as VGG-16, ResNet-50, and GNMT on GPU clusters and simulated the performance of different FPGA clusters.  ...  To satisfy the computation and memory requirements of DNN training, distributed deep learning based on model parallelism has been widely adopted.  ...  The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of University of Science and Technology of China.  ...
arXiv:2012.12544v2 fatcat:nfgqw2i7gbatxojvwe4ka27t6y
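
BaPipe's balanced pipeline parallelism partitions a network's layers into stages so that no single stage dominates the pipeline's cycle time. The contiguous-partition core of that problem can be solved by binary search on the per-stage time budget, as sketched below; the layer costs are invented, and BaPipe's actual partitioner additionally models memory capacity and inter-stage communication.

    # Hedged sketch: split contiguous per-layer times into S stages so the
    # slowest stage (the pipeline bottleneck) is as fast as possible.

    def min_stages_needed(times, limit):
        """Greedy check: how many stages if no stage may exceed `limit`?"""
        stages, acc = 1, 0.0
        for t in times:
            if acc + t > limit:
                stages, acc = stages + 1, t
            else:
                acc += t
        return stages

    def balanced_partition(times, num_stages):
        """Binary-search the smallest feasible per-stage time budget."""
        lo, hi = max(times), sum(times)
        while hi - lo > 1e-6:
            mid = (lo + hi) / 2
            if min_stages_needed(times, mid) <= num_stages:
                hi = mid
            else:
                lo = mid
        return hi  # bottleneck stage time

    layer_ms = [4.0, 7.0, 2.0, 9.0, 3.0, 6.0, 5.0]  # assumed per-layer costs
    print(balanced_partition(layer_ms, num_stages=3))  # -> ~13.0 ms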

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning [article]

Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, Eric P. Xing
2021 arXiv   pre-print
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level.  ...  Pollux promotes fairness among DL jobs competing for resources based on a more meaningful measure of useful job progress, and reveals a new opportunity for reducing DL cost in cloud environments.  ...  Acknowledgements We thank our shepherd, Michael Isard, and the anonymous OSDI reviewers for their insightful comments and suggestions that improved our work.  ... 
arXiv:2008.12260v2 fatcat:wupzzej7crf4bek53a6scbqtli
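
Pollux's central quantity is goodput: system throughput multiplied by statistical efficiency, so that extra GPUs or a larger batch size only count when they yield useful training progress. The sketch below uses toy stand-in models for both factors (a saturating per-GPU throughput curve and a gradient-noise-scale-style efficiency term); the paper's exact formulas differ, so treat this as the shape of the trade-off only.

    # Hedged sketch: goodput = throughput * statistical efficiency,
    # with both component models invented for illustration.

    def throughput(num_gpus, batch_size, peak=600.0, half_sat=64.0):
        """Examples/sec: per-GPU rate saturates with the per-GPU micro-batch
        (small batches underutilize the GPU), then scales with GPU count."""
        per_gpu_batch = batch_size / num_gpus
        per_gpu_rate = peak * per_gpu_batch / (per_gpu_batch + half_sat)
        return per_gpu_rate * num_gpus

    def statistical_efficiency(batch_size, noise_scale=1000.0, base_batch=128):
        """Per-example progress relative to the base batch size: past the
        gradient noise scale, larger batches waste examples."""
        return (noise_scale + base_batch) / (noise_scale + batch_size)

    def goodput(num_gpus, batch_size):
        return throughput(num_gpus, batch_size) * statistical_efficiency(batch_size)

    # Co-adaptation in miniature: for each GPU count, pick the batch size
    # that maximizes goodput rather than raw throughput.
    for gpus in (1, 4, 8):
        best = max(range(128, 4097, 128), key=lambda m: goodput(gpus, m))
        print(gpus, best, round(goodput(gpus, best), 1))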

Deep-Edge: An Efficient Framework for Deep Learning Model Update on Heterogeneous Edge [article]

Anirban Bhattacharjee, Ajay Dev Chhokra, Hongyang Sun, Shashank Shekhar, Aniruddha Gokhale, Gabor Karsai, Abhishek Dubey
2020 arXiv   pre-print
However, efficiently utilizing the edge resources for the model update is a hard problem due to the heterogeneity among the edge devices and the resource interference caused by the co-location of the DL  ...  Deep Learning (DL) model-based AI services are increasingly offered in a variety of predictive analytics services such as computer vision, natural language processing, and speech recognition.  ...  The function EstState is multi-output in nature (GPU, CPU, and memory), so a separate regressor is learned for each output.  ...
arXiv:2004.05740v1 fatcat:mcc7gcdjkzef5d7cd3jbuht444
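
The last fragment above notes that EstState is multi-output, with one regressor learned per resource dimension. Below is a minimal sketch of that pattern on synthetic data; the feature layout, targets, and the choice of random forests are all assumptions for illustration, not the paper's actual features or learner.

    # Hedged sketch: one independent regressor per target column.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 6))          # e.g., co-located workload features
    Y = np.column_stack([                   # synthetic GPU/CPU/memory targets
        0.8 * X[:, 0] + 0.2 * X[:, 1],
        0.5 * X[:, 2] + 0.5 * X[:, 3],
        0.9 * X[:, 4] + 0.1,
    ]) + rng.normal(scale=0.05, size=(500, 3))

    targets = ["gpu", "cpu", "mem"]
    est_state = {t: RandomForestRegressor(n_estimators=50, random_state=0)
                     .fit(X, Y[:, i])
                 for i, t in enumerate(targets)}

    x_new = rng.uniform(size=(1, 6))
    print({t: float(m.predict(x_new)[0]) for t, m in est_state.items()})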

On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems

Wonje Choi, Karthi Duraisamy, Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu
2018 IEEE transactions on computers  
In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs.  ...  It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication  ...  to efficiently handle deep learning applications.  ...
doi:10.1109/tc.2017.2777863 fatcat:actbfs64dbgsdct3tr4ubfqfjq

2020 Index IEEE Transactions on Parallel and Distributed Systems Vol. 31

2021 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS June 2019 1283-1297 Enabling Flexible Resource Allocation in Mobile Deep Learning Systems.  ...  ., +, TPDS Nov. 2019 2536-2546 A Virtual Multi-Channel GPU Fair Scheduling Method for Virtual Machines.  ... 
doi:10.1109/tpds.2020.3033655 fatcat:cpeatdjlpzhqdersvsk5nmzjkm

Large-scale Machine Learning Cluster Scheduling via Multi-agent Graph Reinforcement Learning [article]

Xiaoyang Zhao, Chuan Wu
2021 arXiv   pre-print
Efficient scheduling of distributed deep learning (DL) jobs in large GPU clusters is crucial for resource efficiency and job performance.  ...  In today's clusters containing thousands of GPU servers, running a single scheduler to manage all arriving jobs in a timely and effective manner is challenging, due to the large workload scale.  ...  Cluster schedulers are responsible for producing scheduling policies for DL workloads in such a cluster, whose decisions are crucial for efficient utilization of the very expensive hardware resources and  ...
arXiv:2112.13354v1 fatcat:csthoe3fuffurm3c3supvznsta

BPCM: A flexible high-speed bypass parallel communication mechanism for GPU cluster

Mingjie Wu, Qingkui Chen, Jingjuan Wang
2020 IEEE Access  
The communication between two nodes in the GPU cluster becomes the biggest bottleneck that limits the expansion of distributed machine learning  ...  Finally, the experimental results show that this mechanism can provide high bandwidth for GPU clusters with inexpensive multi-network cards, and provide superimposed bandwidth of multiple network cards in  ...  nodes in the GPU cluster.  ...
doi:10.1109/access.2020.2999096 fatcat:c5k422elzfdhpkpzm677bgvhwm
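
The BPCM snippet above claims superimposed bandwidth from several inexpensive network cards. A bare-bones sketch of the striping idea follows: send contiguous stripes of one buffer concurrently over sockets bound to different local NICs. The addresses, framing, and receiver-side reassembly are assumptions, and BPCM's actual bypass mechanism is more involved than this.

    # Hedged sketch: stripe one buffer across sockets bound to different NICs.
    import socket
    import threading

    LOCAL_NICS = ["10.0.0.1", "10.0.1.1"]   # one local IP per NIC (assumed)
    PEER = ("10.0.0.2", 9000)               # receiver address (assumed)

    def send_stripe(local_ip, index, chunk):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((local_ip, 0))               # route this stripe via one NIC
        s.connect(PEER)
        # Minimal framing: stripe index + length, so the peer can reassemble.
        s.sendall(index.to_bytes(4, "big") + len(chunk).to_bytes(8, "big") + chunk)
        s.close()

    def send_striped(buf):
        n = len(LOCAL_NICS)
        size = (len(buf) + n - 1) // n
        stripes = [buf[i * size:(i + 1) * size] for i in range(n)]
        threads = [threading.Thread(target=send_stripe, args=(ip, i, st))
                   for i, (ip, st) in enumerate(zip(LOCAL_NICS, stripes))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    # send_striped(gradient_bytes)  # receiver reorders stripes by index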

How to Train Your Neural Network: A Comparative Evaluation [article]

Shu-Huai Lin, Daniel Nichols, Siddharth Singh, Abhinav Bhatele
2021 arXiv   pre-print
In this paper, we discuss and compare current state-of-the-art frameworks for large scale distributed deep learning.  ...  The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks.  ...  ThetaGPU is a GPU extension of the Cray XC40 Theta system. Vulcan is a heterogeneous GPU cluster at our institution.  ... 
arXiv:2111.04949v1 fatcat:gfjiefx24jh3bhizu4j4t5slwa

FfDL

K. R. Jayaram, Archit Verma, Falk Pollok, Rania Khalaf, Vinod Muthusamy, Parijat Dube, Vatche Ishakian, Chen Wang, Benjamin Herta, Scott Boag, Diana Arroyo, Asser Tantawi
2019 Proceedings of the 20th International Middleware Conference on - Middleware '19  
We describe how our design balances dependability with scalability, elasticity, flexibility and efficiency.  ...  the overheads introduced by the platform for various deep learning models, the load and performance observed in a real case study using FfDL within our organization, the frequency of various faults observed  ...  Acknowledgments We would like to thank the anonymous reviewers of Middleware'19 and our shepherd Derek Murray for their insightful feedback.  ... 
doi:10.1145/3361525.3361538 dblp:conf/middleware/JayaramMDIWHBAT19 fatcat:vhwii2hpjrcbtjb4qffenmdjna

2018 Index IEEE Transactions on Computers Vol. 67

2019 IEEE transactions on computers  
., and Rodriguez-Henriquez, F., A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol; 1622-1636 Feng, D., see Fu, M., TC Sept. 2018 1259-1272 Analysis  ...  ., +, TC May 2018 631-645 On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems.  ...  ., +, TC April 2018 469-483 On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems.  ...
doi:10.1109/tc.2018.2882120 fatcat:j2j7yw42hnghjoik2ghvqab6ti

A Task Execution Scheme for Dew Computing with State-of-the-Art Smartphones

Matías Hirsch, Cristian Mateos, Alejandro Zunino, Tim A. Majchrzak, Tor-Morten Grønli, Hermann Kaindl
2021 Electronics  
Using these resources could be highly beneficial in edge computing and fog computing contexts, for example, to support urban services for citizens.  ...  Smartphones may form ad hoc networks, but individual devices highly differ in computational capabilities and (tolerable) energy usage.  ...  Nowadays, it is not surprising to find affordable smartphones on the market with eight-core processors and GPUs capable of running deep learning frameworks such as Tensorflow (https://www. tensorflow.org  ... 
doi:10.3390/electronics10162006 fatcat:yecxg4b6ejh3zgwxcxrb4vu42a

Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence

Sebastian Raschka, Joshua Patterson, Corey Nolet
2020 Information  
Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence  ...  Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and  ...  GPUs changed the landscape of classical ML and deep learning.  ... 
doi:10.3390/info11040193 fatcat:hetp7ngcpbbcpkhdcyowuiiwxe