A copy of this work was preserved in the Wayback Machine (captured 2021); the original URL is also available. File type: application/pdf.
2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32
2022
IEEE Transactions on Parallel and Distributed Systems
DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters. ., +, TPDS Aug. 2021 2086-2100 ...
Learning-Driven Interference-Aware Workload Parallelization for Streaming Applications in Heterogeneous Cluster. ...
Graph coloring
Feluca: A Two-Stage Graph Coloring Algorithm With Color-Centric Paradigm on GPU. Zheng, Z., +, ...
doi:10.1109/tpds.2021.3107121
fatcat:e7bh2xssazdrjcpgn64mqh4hb4
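For context on the Feluca entry above: the baseline such GPU algorithms improve on is sequential greedy (first-fit) coloring, sketched below. The graph and vertex order are hypothetical examples, not taken from the paper, and Feluca's actual two-stage GPU algorithm is not shown here.

```python
# Minimal first-fit greedy coloring sketch: each vertex takes the smallest
# color not used by an already-colored neighbor. Sequential, illustrative only.
def greedy_coloring(adj):
    """adj: dict mapping vertex -> list of neighbor vertices."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# A 4-cycle (hypothetical example graph) is 2-colored in this vertex order:
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(greedy_coloring(square))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

The sequential dependence on already-colored neighbors is exactly what makes this hard to parallelize naively, which is the problem color-centric GPU approaches address.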
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
[article]
2022
arXiv
pre-print
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. ...
An efficient scheduler design for such GPU datacenters is crucial to reducing operational cost and improving resource utilization. ...
Gandiva_fair [16] is an early fairness scheduler dedicated to heterogeneous GPU resource environments. It targets inter-user fairness under GPU heterogeneity. ...
arXiv:2205.11913v3
fatcat:fnbinueyijb4nc75fpzd6hzjgq
Cost Efficient GPU Cluster Management for Training and Inference of Deep Learning
2022
Energies
In this paper, we propose a cost-efficient deep learning job allocation (CE-DLA) approach that minimizes the energy consumption cost of DL cluster operation while guaranteeing the performance requirements ...
Expanding the scale of GPU-based deep learning (DL) clusters would bring not only accelerated AI services but also significant energy consumption costs. ...
Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning. ...
doi:10.3390/en15020474
fatcat:chkjihqakfculjf3rwveauahmy
BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training
[article]
2021
arXiv
pre-print
We have trained different DNNs such as VGG-16, ResNet-50, and GNMT on GPU clusters and simulated the performance of different FPGA clusters. ...
To satisfy the computation and memory requirements of DNN training, distributed deep learning based on model parallelism has been widely adopted. ...
The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of University of Science and Technology of China. ...
arXiv:2012.12544v2
fatcat:nfgqw2i7gbatxojvwe4ka27t6y
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
[article]
2021
arXiv
pre-print
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. ...
Pollux promotes fairness among DL jobs competing for resources based on a more meaningful measure of useful job progress, and reveals a new opportunity for reducing DL cost in cloud environments. ...
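The "meaningful measure of useful job progress" in the Pollux entry above is goodput: raw system throughput scaled by statistical efficiency, which degrades as the effective batch size grows. A toy sketch follows; the efficiency model and its constants are illustrative assumptions, not Pollux's actual formula.

```python
# Toy sketch of goodput = throughput x statistical efficiency.
# The decay model and constants below are illustrative assumptions.
def stat_efficiency(batch_size, base_batch=128, decay=0.0005):
    """Toy model: efficiency is 1.0 at the base batch size and decays
    as the effective batch size grows beyond it."""
    return 1.0 / (1.0 + decay * max(0, batch_size - base_batch))

def goodput(throughput_examples_per_s, batch_size):
    """Useful training progress per second, not raw examples per second."""
    return throughput_examples_per_s * stat_efficiency(batch_size)

# Doubling GPUs may double raw throughput yet help goodput far less,
# because the larger effective batch is statistically less efficient:
print(goodput(1000, 128))   # 1000.0
print(goodput(2000, 2048))  # ~1020.4
```

This trade-off is why a goodput-aware scheduler may deliberately give a job fewer GPUs than raw throughput scaling would suggest.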
Acknowledgements We thank our shepherd, Michael Isard, and the anonymous OSDI reviewers for their insightful comments and suggestions that improved our work. ...
arXiv:2008.12260v2
fatcat:wupzzej7crf4bek53a6scbqtli
Deep-Edge: An Efficient Framework for Deep Learning Model Update on Heterogeneous Edge
[article]
2020
arXiv
pre-print
However, efficiently utilizing the edge resources for the model update is a hard problem due to the heterogeneity among the edge devices and the resource interference caused by the co-location of the DL ...
Deep Learning (DL) model-based AI services are increasingly offered in a variety of predictive analytics services such as computer vision, natural language processing, and speech recognition. ...
The function EstState is inherently multi-output (GPU, CPU, and memory), and a separate regressor is learned for each output. ...
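The per-output-regressor idea can be sketched as below. The features, targets, and class name are hypothetical illustrations; the paper's actual model and inputs are not specified here.

```python
# Sketch of a multi-output resource-state estimator in the spirit of
# EstState: one independent least-squares linear model per output
# (e.g. GPU, CPU, memory). All names and data here are hypothetical.
import numpy as np

class PerOutputRegressor:
    """Fits an independent linear model per output column of Y."""
    def fit(self, X, Y):
        Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
        # lstsq solves all output columns at once; each column of W is
        # effectively its own regressor (GPU, CPU, memory, ...).
        self.W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return Xb @ self.W

# Hypothetical training data: one feature (co-located job count) and two
# targets (GPU% and CPU% utilization).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
Y = np.array([[10.0, 5.0], [20.0, 7.0], [30.0, 9.0], [40.0, 11.0]])
model = PerOutputRegressor().fit(X, Y)
print(model.predict(np.array([[5.0]])))  # ~ [[50. 13.]]
```

Fitting each resource dimension separately lets every output keep its own error profile instead of forcing one shared model across heterogeneous targets.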
arXiv:2004.05740v1
fatcat:mcc7gcdjkzef5d7cd3jbuht444
On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems
2018
IEEE transactions on computers
In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs. ...
It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication ... to efficiently handle deep learning applications. ...
doi:10.1109/tc.2017.2777863
fatcat:actbfs64dbgsdct3tr4ubfqfjq
2020 Index IEEE Transactions on Parallel and Distributed Systems Vol. 31
2021
IEEE Transactions on Parallel and Distributed Systems
Enabling Flexible Resource Allocation in Mobile Deep Learning Systems. ., +, TPDS Nov. 2019 2536-2546 ...
A Virtual Multi-Channel GPU Fair Scheduling Method for Virtual Machines. ...
doi:10.1109/tpds.2020.3033655
fatcat:cpeatdjlpzhqdersvsk5nmzjkm
Large-scale Machine Learning Cluster Scheduling via Multi-agent Graph Reinforcement Learning
[article]
2021
arXiv
pre-print
Efficient scheduling of distributed deep learning (DL) jobs in large GPU clusters is crucial for resource efficiency and job performance. ...
In today's clusters containing thousands of GPU servers, running a single scheduler to manage all arrival jobs in a timely and effective manner is challenging, due to the large workload scale. ...
Cluster schedulers are responsible for producing scheduling policies for DL workloads in such a cluster, whose decisions are crucial for efficient utilization of the very expensive hardware resources and ...
arXiv:2112.13354v1
fatcat:csthoe3fuffurm3c3supvznsta
BPCM: A flexible high-speed bypass parallel communication mechanism for GPU cluster
2020
IEEE Access
... two nodes in the GPU cluster become the biggest bottleneck that limits the expansion of distributed machine learning ...
Finally, the experimental results show that this mechanism can provide high bandwidth for GPU clusters with inexpensive multi-network cards, and provide the superimposed bandwidth of multiple network cards within nodes in the GPU cluster. ...
doi:10.1109/access.2020.2999096
fatcat:c5k422elzfdhpkpzm677bgvhwm
How to Train Your Neural Network: A Comparative Evaluation
[article]
2021
arXiv
pre-print
In this paper, we discuss and compare current state-of-the-art frameworks for large scale distributed deep learning. ...
The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. ...
ThetaGPU is a GPU extension of the Cray XC40 Theta system. Vulcan is a heterogeneous GPU cluster at our institution. ...
arXiv:2111.04949v1
fatcat:gfjiefx24jh3bhizu4j4t5slwa
We describe how our design balances dependability with scalability, elasticity, flexibility and efficiency. ...
the overheads introduced by the platform for various deep learning models, the load and performance observed in a real case study using FfDL within our organization, the frequency of various faults observed ...
Acknowledgments We would like to thank the anonymous reviewers of Middleware'19 and our shepherd Derek Murray for their insightful feedback. ...
doi:10.1145/3361525.3361538
dblp:conf/middleware/JayaramMDIWHBAT19
fatcat:vhwii2hpjrcbtjb4qffenmdjna
2018 Index IEEE Transactions on Computers Vol. 67
2019
IEEE transactions on computers
., and Rodriguez-Henriquez, F., A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol; 1622-1636 Feng, D., see Fu, M., TC Sept. 2018 1259-1272 Analysis ...
., +, TC May 2018 631-645 On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems. ...
., +, TC April 2018 469-483 On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems. ...
doi:10.1109/tc.2018.2882120
fatcat:j2j7yw42hnghjoik2ghvqab6ti
A Task Execution Scheme for Dew Computing with State-of-the-Art Smartphones
2021
Electronics
Using these resources could be highly beneficial in edge computing and fog computing contexts, for example, to support urban services for citizens. ...
Smartphones may form ad hoc networks, but individual devices highly differ in computational capabilities and (tolerable) energy usage. ...
Nowadays, it is not surprising to find affordable smartphones on the market with eight-core processors and GPUs capable of running deep learning frameworks such as TensorFlow (https://www.tensorflow.org) ...
doi:10.3390/electronics10162006
fatcat:yecxg4b6ejh3zgwxcxrb4vu42a
Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence
2020
Information
Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence ...
Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and ...
GPUs changed the landscape of classical ML and deep learning. ...
doi:10.3390/info11040193
fatcat:hetp7ngcpbbcpkhdcyowuiiwxe
Showing results 1 — 15 out of 1,416 results