Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems
2015
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15
This paper develops performance models that quantify the impact of these partitioning and provisioning decisions on overall distributed system performance and scalability. ...
We evaluate our performance models and scalability optimizer using a state-of-the-art distributed DNN training framework on two benchmark applications. ...
CONCLUSION This paper develops performance models for estimating the scalability of distributed deep learning training, and for driving a scalability optimizer that efficiently determines the optimal system ...
doi:10.1145/2783258.2783270
dblp:conf/kdd/YanRHC15
fatcat:l2xojoi55bgepepkn33hfbbln4
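As a hedged aside on the kind of model this entry describes: the toy Python sketch below estimates per-iteration time and speedup for data-parallel training from assumed compute and communication costs. It is not the paper's model; every constant and the ring all-reduce cost formula are illustrative assumptions.

```python
# Toy analytical model of data-parallel training scalability.
# NOT the model from the KDD '15 paper; an illustrative sketch with
# assumed constants and a simple ring all-reduce cost formula.

def iteration_time(num_workers: int,
                   compute_time_1worker: float = 1.0,   # seconds/iteration on 1 worker (assumed)
                   model_size_bytes: float = 250e6,      # gradient volume (assumed)
                   bandwidth_bytes_per_s: float = 10e9,  # network bandwidth (assumed)
                   latency_s: float = 50e-6) -> float:   # per-message latency (assumed)
    """Estimate time per iteration: compute shrinks with the worker count,
    ring all-reduce communication moves 2*(N-1)/N of the model size."""
    compute = compute_time_1worker / num_workers
    if num_workers == 1:
        comm = 0.0
    else:
        volume = 2 * (num_workers - 1) / num_workers * model_size_bytes
        comm = 2 * (num_workers - 1) * latency_s + volume / bandwidth_bytes_per_s
    return compute + comm

if __name__ == "__main__":
    t1 = iteration_time(1)
    for n in (1, 2, 4, 8, 16, 32, 64):
        tn = iteration_time(n)
        print(f"workers={n:3d}  t/iter={tn:.4f}s  speedup={t1 / tn:.2f}x")
```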
Scalable Learning Paradigms for Data-Driven Wireless Communication
[article]
2020
arXiv
pre-print
On the other hand, we discuss the learning algorithms and model training strategies performed at each individual node from a local perspective. ...
The marriage of wireless big data and machine learning techniques revolutionizes the wireless system by the data-driven philosophy. ...
This also necessitates the use of scalable models to decompose a large optimization problem into smaller pieces to be handled in a distributed manner. ...
arXiv:2003.00474v1
fatcat:kd6plphwgbfvbdyylcn4jk24uq
2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32
2022
IEEE Transactions on Parallel and Distributed Systems
Tran-Dang, H., +, TPDS Oct. 2021 2491-2508 Modeling and Optimization of Performance and Cost of Serverless Applications. ...
., +, TPDS June 2021 1465-1478 A Resource and Performance Optimization Reduction Circuit on FPGAs. Accelerating End-to-End Deep Learning Workflow With Codesign of Data Preprocessing and Scheduling. ...
doi:10.1109/tpds.2021.3107121
fatcat:e7bh2xssazdrjcpgn64mqh4hb4
IEEE Access Special Section Editorial: Scalable Deep Learning for Big Data
2020
IEEE Access
using scalable deep learning. ...
Deep learning (DL) has emerged as a key application exploiting the increasing computational power in systems such as GPUs, multicore processors, Systems-on-Chip (SoC), and distributed clusters. ...
doi:10.1109/access.2020.3041166
fatcat:zkzdnzk22jge3l5mwju3j42mcu
SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning
[article]
2019
arXiv
pre-print
We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL). ...
The learning performances of our distributed algorithms establish new state-of-the-art on OpenAI Gym and Robotics Suites tasks. ...
Acknowledgements We would like to thank many members of the Stanford People, AI & Robots (PAIR) group in using SURREAL in their research and providing insightful feedback. ...
arXiv:1909.12989v2
fatcat:l7w77vhs4nfu7fjhw3hdn5iksy
Introduction to the Special Issue on Intelligence on Scalable computing for Recent Applications
2020
Scalable Computing : Practice and Experience
in improving the ability of parallel and distributed computer systems, intelligent techniques, and deep learning mechanisms and advanced soft computing techniques. ...
The proposed placement algorithm considers CPU, Memory, and combination of CPU-Memory utilization of VMs on the source host. In "Bird Swarm Optimization-based stacked autoencoder deep learning for umpire ...
doi:10.12694/scpe.v21i2.1581
fatcat:5ynzph7uyjektl2xv4ddkq5h2y
RLlib: Abstractions for Distributed Reinforcement Learning
[article]
2018
arXiv
pre-print
Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. ...
These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available at https://rllib.io/. ...
, CapitalOne, Ericsson, Facebook, Google, Huawei, Intel, Microsoft, Scotiabank, Splunk and VMware. ...
arXiv:1712.09381v4
fatcat:ihhwdewi4bfndags5x5c65mfaa
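The RLlib snippet above refers to distributing irregular RL computation patterns; the sketch below illustrates only the generic rollout-worker/central-learner split that such frameworks parallelize. It does not use RLlib's actual API; the toy random-walk environment and the averaging "update" are placeholders for illustration.

```python
# Generic rollout-worker / central-learner pattern; NOT RLlib's API.
# Workers collect episodes in parallel; a central learner aggregates them.
import random
from concurrent.futures import ProcessPoolExecutor

def rollout(seed: int, episode_len: int = 100):
    """Toy environment: a 1-D random walk; returns (state, action, reward) tuples."""
    rng = random.Random(seed)
    state, trajectory = 0, []
    for _ in range(episode_len):
        action = rng.choice([-1, 1])
        state += action
        reward = -abs(state)          # reward staying near the origin
        trajectory.append((state, action, reward))
    return trajectory

def learner_update(trajectories):
    """Placeholder 'update': just report the mean reward across all workers."""
    rewards = [r for traj in trajectories for (_, _, r) in traj]
    return sum(rewards) / len(rewards)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        trajs = list(pool.map(rollout, range(4)))   # parallel experience collection
    print("mean reward this round:", learner_update(trajs))
```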
Parallel and Distributed Machine Learning Algorithms for Scalable Big Data Analytics
2019
Future generations computer systems
Acknowledgments We would like to thank the general chair of ParLearning 2017, Anand Panangadan (California State University, Fullerton, USA), and all the reviewers of this special issue. ...
The paper ''Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing'' [4] provides a performance and power analysis of important Deep Learning workloads on two major parallel architectures ...
, Hadoop), and Deep Learning (TensorFlow, PyTorch, Caffe2, and many others). ...
doi:10.1016/j.future.2019.07.009
fatcat:vwdj6ti5nfdn3hzmb5rzmwglte
Table of Contents
2021
2021 IEEE 46th Conference on Local Computer Networks (LCN)
and Resource Allocation for FANETs with Deep Reinforcement Learning 315 A Machine Learning Approach to Peer Connectivity Estimation for Reliable Blockchain Networking 319 Optimal Placement of Recurrent ...
Migration Optimization in MEC by Deep Reinforcement Learning Strategy 411 Alternative Authentication with Smart Contracts for Online Games 415 A Multidimensional Trust Model for Vehicular Ad-Hoc ...
doi:10.1109/lcn52139.2021.9524933
fatcat:bopsc4l2qrc7bobzfyb6343iou
A Quantitative Survey of Communication Optimizations in Distributed Deep Learning
[article]
2020
arXiv
pre-print
Nowadays, large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines, in which extensive communications between workers pose serious scaling ...
We present the state-of-the-art communication optimization techniques and conduct a comparative study of seven common lossless distributed DL methods on a 32-GPU cluster with 100Gbps InfiniBand (IB). ...
ACKNOWLEDGMENTS The research was supported in part by Hong Kong RGC GRF grants under the contracts HKBU 12200418, HKUST 16206417 and 16207818, and in part by National Natural Science Foundation of China ...
arXiv:2005.13247v2
fatcat:xrl7lofxubaqbn72pfzdpf7vea
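A minimal sketch of the baseline the survey above presupposes, synchronous gradient averaging via all-reduce, is given below using torch.distributed. The launch method (torchrun), the gloo backend, and the tiny model are assumptions; this is not the paper's benchmark code.

```python
# Minimal synchronous data-parallel step with gradient all-reduce.
# Assumes launch via `torchrun --nproc_per_node=N this_script.py`;
# illustrative only, not the benchmark code from the survey above.
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="gloo")   # "nccl" on GPU clusters
    world_size = dist.get_world_size()

    model = torch.nn.Linear(1000, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    x = torch.randn(32, 1000)
    y = torch.randint(0, 10, (32,))

    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()

    # Average gradients across workers: the communication the survey studies.
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size

    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```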
Digital-Twin-Enabled 6G: Vision, Architectural Trends, and Future Directions
[article]
2021
arXiv
pre-print
Digital twins use a virtual representation of the 6G physical system along with the associated algorithms (e.g., machine learning, optimization), communication technologies (e.g., millimeter-wave and terahertz ...
, reliability, data rate, and user-defined performance metrics. ...
To address these challenges, we can use distributed deep learning-based twin models. ...
arXiv:2102.12169v2
fatcat:bb4x22q2djf6bdvtlvy3wx46z4
Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
[article]
2018
arXiv
pre-print
To this end, we build a highly scalable deep learning training system for dense GPU clusters with three main contributions: (1) We propose a mixed-precision training method that significantly improves ...
Although using larger mini-batch sizes can improve the system scalability by reducing the communication-to-computation ratio, it may hurt the generalization ability of the models. ...
Figure 1 is an overview of our distributed deep learning training system. ...
arXiv:1807.11205v1
fatcat:z5mkpryrejamnfoaszclsrcqnu
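The entry above describes a mixed-precision training system for dense GPU clusters; the sketch below shows only the generic ingredient, FP16 compute with dynamic loss scaling via PyTorch AMP, on a single GPU. It is not the paper's distributed system, and the model, data, and hyperparameters are placeholders.

```python
# Generic mixed-precision training loop with loss scaling (PyTorch AMP).
# Illustrative only; not the distributed system described in the paper above.
import torch

device = "cuda"  # this sketch assumes a CUDA device is available
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()   # dynamic loss scaling to avoid FP16 underflow

for step in range(100):
    x = torch.randn(256, 512, device=device)
    y = torch.randint(0, 10, (256,), device=device)

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # forward/backward math largely in FP16
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()             # backprop the scaled loss
    scaler.step(optimizer)                    # unscale grads; skip the step on inf/nan
    scaler.update()
```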
2^1296 Exponentially Complex Quantum Many-Body Simulation via Scalable Deep Learning Method
[article]
2022
arXiv
pre-print
We report that a deep learning based simulation protocol can achieve the solution with state-of-the-art precision in a Hilbert space as large as 2^1296 for the spin system and 3^144 for the fermion system, ...
The accomplishment of this work opens the door to simulate spin models and Fermion models on unprecedented lattice size with extreme high precision. ...
Meanwhile, the traditional high-performance algebra libraries, like BLAS, LAPACK, and the distributed ScaLAPACK [10], are also optimized. ...
arXiv:2204.07816v1
fatcat:q22su3i2hjgyreogibjgb6gdh4
Editorial for the special issue on operating systems and programming systems for HPC
2020
CCF Transactions on High Performance Computing
of deep learning and HPC system. ...
We have four papers that discuss programming system innovations covering traditional HPC applications and deep learning area, tackling inter-node parallel scalability and intra-node processor heterogeneity ...
doi:10.1007/s42514-020-00053-6
fatcat:nthaiyn6m5eqvisdxwiz7r7u2m
Challenges and Opportunities in Approximate Bayesian Deep Learning for Intelligent IoT Systems
[article]
2021
arXiv
pre-print
of over-confident errors and providing enhanced robustness to out of distribution examples. ...
We highlight several potential solutions to decreasing model storage requirements and improving computational scalability, including model pruning and distillation methods. ...
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory ...
arXiv:2112.01675v1
fatcat:okknsw5gifhl7ghzt4cnuchuje
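The snippet above points to model pruning and distillation for shrinking models on IoT hardware; the sketch below shows plain L1 magnitude pruning with torch.nn.utils.prune on a toy network. The architecture and pruning ratio are arbitrary placeholders, and nothing Bayesian-specific is shown.

```python
# Magnitude (L1) pruning of a toy network with torch.nn.utils.prune.
# Illustrates the "model pruning" direction mentioned above; the amount
# and the architecture are arbitrary placeholders, not from the paper.
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Zero out the 60% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")   # make the pruning permanent

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"overall parameter sparsity: {zeros / total:.1%}")
```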
Showing results 1 — 15 out of 44,993 results