Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems

Feng Yan, Olatunji Ruwase, Yuxiong He, Trishul Chilimbi
2015 Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15
This paper develops performance models that quantify the impact of such partitioning and provisioning decisions on overall distributed system performance and scalability.  ...  We evaluate our performance models and scalability optimizer using a state-of-the-art distributed DNN training framework on two benchmark applications.  ...  CONCLUSION This paper develops performance models for estimating the scalability of distributed deep learning training, and for driving a scalability optimizer that efficiently determines the optimal system  ... 
doi:10.1145/2783258.2783270 dblp:conf/kdd/YanRHC15 fatcat:l2xojoi55bgepepkn33hfbbln4
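The snippet does not reproduce the paper's actual models, but the general approach can be illustrated with a back-of-the-envelope analytic model: per-iteration time as compute cost (which shrinks with more workers) plus gradient-synchronization cost (which does not). All constants below are hypothetical placeholders, not values from the paper.

```python
# Illustrative sketch (not the paper's model): per-iteration time of
# data-parallel training = compute + gradient all-reduce communication.
# Every constant here is a hypothetical placeholder.

def iteration_time(workers: int,
                   compute_time: float = 0.8,   # seconds/iteration on 1 worker
                   grad_bytes: float = 5e8,     # gradient size in bytes
                   bandwidth: float = 1e10,     # link bandwidth in bytes/s
                   latency: float = 5e-4) -> float:
    compute = compute_time / workers             # ideal compute scaling
    # Ring all-reduce moves about 2*(w-1)/w * grad_bytes per worker.
    comm = latency + 2 * (workers - 1) / workers * grad_bytes / bandwidth
    return compute + comm

for w in (1, 2, 4, 8, 16, 32, 64):
    t = iteration_time(w)
    print(f"{w:3d} workers: {t:.4f} s/iter, speedup {iteration_time(1) / t:.1f}x")
```

Even this toy model exhibits the qualitative behavior such optimizers exploit: speedup saturates as the fixed communication term comes to dominate the shrinking compute term.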

Scalable Learning Paradigms for Data-Driven Wireless Communication [article]

Yue Xu, Feng Yin, Wenjun Xu, Chia-Han Lee, Jiaru Lin, Shuguang Cui
2020 arXiv   pre-print
On the other hand, we discuss the learning algorithms and model-training strategies performed at each individual node from a local perspective.  ...  The marriage of wireless big data and machine learning techniques is revolutionizing wireless systems through a data-driven philosophy.  ...  This also necessitates the use of scalable models to decompose a large optimization problem into smaller pieces to be handled in a distributed manner.  ... 
arXiv:2003.00474v1 fatcat:kd6plphwgbfvbdyylcn4jk24uq
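The decomposition idea in the last fragment can be sketched with a toy example: each node takes gradient steps on its own data shard and the nodes periodically average their parameters (a simple consensus step). The data, shard count, and hyperparameters below are synthetic placeholders, not anything from the survey.

```python
# Toy distributed least-squares: local gradient steps + parameter averaging.
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=5)

def make_shard(n=100):
    # Synthetic local dataset standing in for one node's measurements.
    X = rng.normal(size=(n, 5))
    return X, X @ true_w + 0.1 * rng.normal(size=n)

shards = [make_shard() for _ in range(4)]     # data split across 4 nodes
w_local = [np.zeros(5) for _ in shards]

for _ in range(200):
    for i, (X, y) in enumerate(shards):       # local gradient step per node
        grad = X.T @ (X @ w_local[i] - y) / len(y)
        w_local[i] -= 0.1 * grad
    consensus = np.mean(w_local, axis=0)      # averaging / consensus step
    w_local = [consensus.copy() for _ in shards]

print("distance to true parameters:", np.linalg.norm(consensus - true_w))
```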

2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32

2022 IEEE Transactions on Parallel and Distributed Systems  
Tran-Dang, H., +, TPDS Oct. 2021 2491-2508; Modeling and Optimization of Performance and Cost of Serverless Applications  ...  TPDS June 2021 1465-1478; A Resource and Performance Optimization Reduction Circuit on FPGAs; Accelerating End-to-End Deep Learning Workflow With Codesign of Data Preprocessing and Scheduling  ... 
doi:10.1109/tpds.2021.3107121 fatcat:e7bh2xssazdrjcpgn64mqh4hb4

IEEE Access Special Section Editorial: Scalable Deep Learning for Big Data

Liangxiu Han, Daoqiang Zhang, Omer Rana, Yi Pan, Sohail Jabbar, Mazin Yousif, Moayad Aloqaily
2020 IEEE Access  
using scalable deep learning.  ...  Deep learning (DL) has emerged as a key application exploiting the increasing computational power in systems such as GPUs, multicore processors, Systems-on-Chip (SoC), and distributed clusters.  ... 
doi:10.1109/access.2020.3041166 fatcat:zkzdnzk22jge3l5mwju3j42mcu

SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning [article]

Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei
2019 arXiv   pre-print
We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL).  ...  The learning performances of our distributed algorithms establish a new state-of-the-art on OpenAI Gym and Robotics Suite tasks.  ...  Acknowledgements We would like to thank many members of the Stanford People, AI & Robots (PAIR) group for using SURREAL in their research and providing insightful feedback.  ... 
arXiv:1909.12989v2 fatcat:l7w77vhs4nfu7fjhw3hdn5iksy

Introduction to the Special Issue on Intelligence on Scalable Computing for Recent Applications

Vijaya P, Binu D
2020 Scalable Computing: Practice and Experience
in improving the ability of parallel and distributed computer systems, intelligent techniques, deep learning mechanisms, and advanced soft computing techniques.  ...  The proposed placement algorithm considers CPU, memory, and combined CPU-memory utilization of VMs on the source host. In "Bird Swarm Optimization-based stacked autoencoder deep learning for umpire  ... 
doi:10.12694/scpe.v21i2.1581 fatcat:5ynzph7uyjektl2xv4ddkq5h2y
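The placement criterion quoted above (CPU, memory, and combined CPU-memory utilization) can be sketched as a simple host-scoring rule. The issue's actual algorithm is not reproduced here; hosts, thresholds, and the scoring formula below are invented for illustration.

```python
# Hypothetical sketch of a utilization-aware VM placement rule: score each
# candidate host by how loaded and how imbalanced it would be after
# receiving the VM, and place on the lowest-scoring host.
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    cpu_used: float    # fraction of CPU in use, 0..1
    mem_used: float    # fraction of memory in use, 0..1

def placement_score(host: Host, vm_cpu: float, vm_mem: float) -> float:
    cpu = host.cpu_used + vm_cpu
    mem = host.mem_used + vm_mem
    if cpu > 1.0 or mem > 1.0:                 # would overload the host
        return float("inf")
    # Combined CPU-memory criterion: prefer low and balanced utilization.
    return max(cpu, mem) + abs(cpu - mem)

hosts = [Host("h1", 0.7, 0.3), Host("h2", 0.4, 0.45), Host("h3", 0.9, 0.2)]
best = min(hosts, key=lambda h: placement_score(h, vm_cpu=0.2, vm_mem=0.25))
print("place VM on", best.name)
```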

RLlib: Abstractions for Distributed Reinforcement Learning [article]

Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica
2018 arXiv   pre-print
Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation.  ...  These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available at https://rllib.io/.  ...  CapitalOne, Ericsson, Facebook, Google, Huawei, Intel, Microsoft, Scotiabank, Splunk and VMware.  ... 
arXiv:1712.09381v4 fatcat:ihhwdewi4bfndags5x5c65mfaa
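The "primitives" the abstract refers to are not detailed in the snippet; the separation they advocate, between workers that only gather experience and an optimizer that only consumes batches, can be sketched generically. The class names below echo the paper's terminology, but the bodies are toy stand-ins, not RLlib's actual API.

```python
# Generic sketch (not RLlib's real implementation) of splitting distributed
# RL into rollout collection and a pluggable optimization step, so the same
# algorithm logic can run serially or across many workers.
import random

class RolloutWorker:
    def sample(self, n: int):
        # Stand-in for environment interaction: (obs, action, reward) tuples.
        return [(random.random(), random.choice([0, 1]), random.random())
                for _ in range(n)]

class SyncSamplesOptimizer:
    """Pull batches from all workers, then apply one update."""
    def __init__(self, workers, update_fn):
        self.workers, self.update_fn = workers, update_fn

    def step(self):
        batch = [t for w in self.workers for t in w.sample(32)]
        return self.update_fn(batch)

def dummy_update(batch):
    # Placeholder for a policy-gradient update; returns mean reward.
    return sum(r for _, _, r in batch) / len(batch)

opt = SyncSamplesOptimizer([RolloutWorker() for _ in range(4)], dummy_update)
print("mean reward:", opt.step())
```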

Parallel and Distributed Machine Learning Algorithms for Scalable Big Data Analytics

Henri Bal, Arindam Pal
2019 Future Generation Computer Systems
Acknowledgments We would like to thank the general chair of ParLearning 2017, Anand Panangadan (California State University, Fullerton, USA), and all the reviewers of this special issue.  ...  The paper ''Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing'' [4] provides a performance and power analysis of important Deep Learning workloads on two major parallel architectures  ...  Hadoop), and Deep Learning (TensorFlow, PyTorch, Caffe2, and many others).  ... 
doi:10.1016/j.future.2019.07.009 fatcat:vwdj6ti5nfdn3hzmb5rzmwglte

Table of Contents

2021 2021 IEEE 46th Conference on Local Computer Networks (LCN)  
... and Resource Allocation for FANETs with Deep Reinforcement Learning (p. 315); A Machine Learning Approach to Peer Connectivity Estimation for Reliable Blockchain Networking (p. 319); Optimal Placement of Recurrent  ...  Migration Optimization in MEC by Deep Reinforcement Learning Strategy (p. 411); Alternative Authentication with Smart Contracts for Online Games (p. 415); A Multidimensional Trust Model for Vehicular Ad-Hoc  ... 
doi:10.1109/lcn52139.2021.9524933 fatcat:bopsc4l2qrc7bobzfyb6343iou

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning [article]

Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li
2020 arXiv   pre-print
Nowadays, large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines, in which extensive communications between workers pose serious scaling  ...  We present the state-of-the-art communication optimization techniques and conduct a comparative study of seven common lossless distributed DL methods on a 32-GPU cluster with 100Gbps InfiniBand (IB).  ...  ACKNOWLEDGMENTS The research was supported in part by Hong Kong RGC GRF grants under the contracts HKBU 12200418, HKUST 16206417 and 16207818, and in part by National Natural Science Foundation of China  ... 
arXiv:2005.13247v2 fatcat:xrl7lofxubaqbn72pfzdpf7vea
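The baseline that all the surveyed lossless methods optimize is the dense gradient all-reduce. A minimal version can be written with PyTorch's public torch.distributed API (init_process_group, all_reduce), here with the CPU "gloo" backend; the script filename in the launch command is hypothetical, and the tensor is a toy stand-in for a gradient.

```python
# Minimal dense gradient all-reduce with torch.distributed.
# Launch with: torchrun --nproc_per_node=2 allreduce_demo.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group("gloo")          # reads env vars set by torchrun
    rank, world = dist.get_rank(), dist.get_world_size()

    grad = torch.full((4,), float(rank))     # pretend local gradient
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    grad /= world                            # average across workers
    print(f"rank {rank}: averaged grad = {grad.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```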

Digital-Twin-Enabled 6G: Vision, Architectural Trends, and Future Directions [article]

Latif U. Khan, Walid Saad, Dusit Niyato, Zhu Han, Choong Seon Hong
2021 arXiv   pre-print
Digital twins use a virtual representation of the 6G physical system along with the associated algorithms (e.g., machine learning, optimization), communication technologies (e.g., millimeter-wave and terahertz  ...  reliability, data rate, and user-defined performance metrics.  ...  To address these challenges, we can use distributed deep learning-based twin models.  ... 
arXiv:2102.12169v2 fatcat:bb4x22q2djf6bdvtlvy3wx46z4

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes [article]

Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi (+1 others)
2018 arXiv   pre-print
To this end, we build a highly scalable deep learning training system for dense GPU clusters with three main contributions: (1) We propose a mixed-precision training method that significantly improves  ...  Although using larger mini-batch sizes can improve the system scalability by reducing the communication-to-computation ratio, it may hurt the generalization ability of the models.  ...  Figure 1 is an overview of our distributed deep learning training system.  ... 
arXiv:1807.11205v1 fatcat:z5mkpryrejamnfoaszclsrcqnu
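The core mixed-precision pattern the paper describes, FP16 forward/backward passes with FP32 master weights and loss scaling to prevent gradient underflow, can be sketched with PyTorch's stock AMP utilities rather than the authors' in-house system (their LARS optimizer and communication optimizations are not shown). Model and data below are placeholders; a CUDA device is assumed, and newer PyTorch versions expose the same tools under torch.amp.

```python
# Mixed-precision training loop via PyTorch AMP (placeholder model/data).
import torch
from torch import nn

device = "cuda"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()         # dynamic loss scaling

for step in range(10):
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():          # FP16 forward pass
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()            # scaled FP16 backward pass
    scaler.step(opt)                         # unscales, then FP32 update
    scaler.update()                          # adjusts the loss scale
```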

2^1296 Exponentially Complex Quantum Many-Body Simulation via Scalable Deep Learning Method [article]

Xiao Liang, Mingfan Li, Qian Xiao, Hong An, Lixin He, Xuncheng Zhao, Junshi Chen, Chao Yang, Fei Wang, Hong Qian, Li Shen, Dongning Jia (+3 others)
2022 arXiv   pre-print
We report that a deep learning based simulation protocol can achieve the solution with state-of-the-art precision in a Hilbert space as large as 2^1296 for a spin system and 3^144 for a fermion system,  ...  The accomplishment of this work opens the door to simulating spin models and fermion models on unprecedented lattice sizes with extremely high precision.  ...  Meanwhile, traditional high-performance algebra libraries, like BLAS, LAPACK, and the distributed ScaLAPACK [10], are also optimized.  ... 
arXiv:2204.07816v1 fatcat:q22su3i2hjgyreogibjgb6gdh4
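A hedged arithmetic note on the headline numbers: the Hilbert-space dimension is (local states)^(lattice sites), and 1296 = 36 × 36 spin-1/2 sites, while 3^144 matches 144 = 12 × 12 sites with three local states (one plausible reading: a no-double-occupancy fermion site). The lattice shapes are inferred here, not stated in the snippet.

```python
# Hilbert-space dimension = (local states) ** (lattice sites); lattice
# shapes below are inferred, not taken from the paper's snippet.
spin_dim = 2 ** (36 * 36)        # 2^1296, spin-1/2 on a 36x36 lattice
fermion_dim = 3 ** (12 * 12)     # 3^144, three local states on 12x12
print("spin dimension:", len(str(spin_dim)), "decimal digits")
print("fermion dimension:", len(str(fermion_dim)), "decimal digits")
```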

Editorial for the special issue on operating systems and programming systems for HPC

Xiaobing Feng, Minyi Guo
2020 CCF Transactions on High Performance Computing  
of deep learning and HPC systems.  ...  We have four papers that discuss programming-system innovations covering traditional HPC applications and the deep learning area, tackling inter-node parallel scalability and intra-node processor heterogeneity  ... 
doi:10.1007/s42514-020-00053-6 fatcat:nthaiyn6m5eqvisdxwiz7r7u2m

Challenges and Opportunities in Approximate Bayesian Deep Learning for Intelligent IoT Systems [article]

Meet P. Vadera, Benjamin M. Marlin
2021 arXiv   pre-print
of over-confident errors and providing enhanced robustness to out-of-distribution examples.  ...  We highlight several potential solutions to decreasing model storage requirements and improving computational scalability, including model pruning and distillation methods.  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory  ... 
arXiv:2112.01675v1 fatcat:okknsw5gifhl7ghzt4cnuchuje
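One lightweight member of the approximate-Bayesian family such surveys cover is Monte Carlo dropout: keep dropout active at test time and use the spread over stochastic forward passes as a predictive-uncertainty estimate. The network and input below are placeholders, and this is an illustrative technique from the field, not necessarily the authors' recommended method.

```python
# MC-dropout sketch: dropout stays on at inference; variance across
# stochastic forward passes approximates predictive uncertainty.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.2),
                      nn.Linear(64, 3))

def mc_dropout_predict(model, x, samples=50):
    model.train()                    # keep dropout active at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(samples)])
    return probs.mean(0), probs.std(0)   # predictive mean and uncertainty

x = torch.randn(1, 16)
mean, std = mc_dropout_predict(model, x)
print("mean:", mean)
print("std :", std)
```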
Showing results 1 — 15 of 44,993