
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension [article]

Yunfei Teng, Wenbo Gao, Francois Chalus, Anna Choromanska, Donald Goldfarb, Adrian Weller
2022 arXiv   pre-print
We consider distributed optimization under communication constraints for training deep learning models.  ...  We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD).  ...  The Supplementary Material presents additional details in support of the full article  ... 
arXiv:1905.10395v5 fatcat:2ygkz5ewzzbi7amb67cxq7ufju
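
The snippets above describe workers that take stochastic gradient steps while being pulled toward the current best-performing ("leader") worker. A minimal single-process sketch of that idea on a toy quadratic objective; the objective, the step size, and the pull coefficient `lam` are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(x):
    """Toy objective standing in for a training loss."""
    return 0.5 * float(np.sum(x ** 2))

def grad(x):
    """Gradient of the toy objective, with noise to mimic stochastic gradients."""
    return x + 0.1 * rng.normal(size=x.shape)

n_workers, dim, lr, lam = 4, 10, 0.1, 0.5
workers = [rng.normal(size=dim) for _ in range(n_workers)]

for step in range(100):
    # The "leader" is the worker with the lowest current loss.
    leader = min(workers, key=loss).copy()
    # Each worker takes a stochastic gradient step plus a pull toward the leader.
    workers = [w - lr * (grad(w) + lam * (w - leader)) for w in workers]

print("best loss after training:", min(loss(w) for w in workers))
```

In an actual distributed run the leader would be chosen by exchanging loss values over the network; here all workers live in one process purely for illustration.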

LEASGD: an Efficient and Privacy-Preserving Decentralized Algorithm for Distributed Learning [article]

Hsin-Pai Cheng, Patrick Yu, Haojing Hu, Feng Yan, Shiyu Li, Hai Li, Yiran Chen
2018 arXiv   pre-print
Distributed learning systems have enabled training large-scale models over large amounts of data in significantly shorter time.  ...  To achieve this goal, we propose a new learning algorithm LEASGD (Leader-Follower Elastic Averaging Stochastic Gradient Descent), which is driven by a novel Leader-Follower topology and a differential  ...  LEASGD adopts the insight of the Elastic Averaging Stochastic Gradient Descent (EASGD) [8] by exerting an elastic force between the leader and follower at each update.  ... 
arXiv:1811.11124v1 fatcat:hpo6g2m2vbhj7cmx4s6rj7u5pq
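
The last snippet mentions an elastic force between leader and follower at each update, in the spirit of EASGD. One plausible form of such an update is sketched below; the exact coefficients, and whether the leader also receives a symmetric force, are assumptions based on the EASGD reference, not taken from the LEASGD paper:

```latex
% x_f: a follower's parameters, x_l: its leader's parameters,
% \eta: step size, \rho: elastic coefficient, \xi: a sampled mini-batch.
\begin{aligned}
x_f &\leftarrow x_f - \eta\,\nabla f(x_f;\xi) - \eta\rho\,(x_f - x_l)\\
x_l &\leftarrow x_l - \eta\,\nabla f(x_l;\xi) + \eta\rho\,(x_f - x_l)
\end{aligned}
```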

DiNNO: Distributed Neural Network Optimization for Multi-Robot Collaborative Learning [article]

Javier Yu, Joseph A. Vincent, Mac Schwager
2021 arXiv   pre-print
We present a distributed algorithm that enables a group of robots to collaboratively optimize the parameters of a deep neural network model while communicating over a mesh network.  ...  Each robot only has access to its own data and maintains its own version of the neural network, but eventually learns a model that is as good as if it had been trained on all the data centrally.  ...  In [18] the distributed subgradient descent algorithm is extended to stochastic gradients as the distributed stochastic gradient descent (DSGD) algorithm, and uses training of a CIFAR-10 classification  ... 
arXiv:2109.08665v1 fatcat:tfmyej6rmbgmzpv2fpkcfvg73i
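
The third snippet refers to the distributed stochastic gradient descent (DSGD) baseline, in which each node combines a local stochastic gradient step with weighted averaging over its mesh neighbors. A toy simulation of that pattern is sketched below; the ring topology, mixing weights, and quadratic per-node objectives are illustrative assumptions, and DiNNO itself is a different algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ring topology over 4 nodes: a doubly stochastic mixing matrix
# (each node averages itself with its two neighbors).
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

dim, lr = 5, 0.1
X = rng.normal(size=(4, dim))        # one parameter vector per node
targets = rng.normal(size=(4, dim))  # each node only sees its own data

for step in range(200):
    grads = X - targets              # toy per-node gradients: 0.5 * ||x - target||^2
    X = W @ X - lr * grads           # neighbor averaging + local gradient step

print("disagreement across nodes:", np.max(np.abs(X - X.mean(axis=0))))
```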

Image classification model based on spark and CNN

Jiangfeng Xu, Shenyue Ma, Nader Asnafi
2018 MATEC Web of Conferences  
The literature [9] performed distributed training on GPUs, and the literature [10] used distributed asynchronous stochastic gradient descent to train the deep network; both have the effect of training  ...  The model first improves the initialization mode of the convolution kernel parameters, then eliminates the redundancy of feature maps, and finally optimizes the distributed gradient descent by reducing the  ...  In research on stochastic gradient descent for distributed cluster systems, Stefan GM [8] and others used the MapReduce parallel model to parallelize the stochastic gradient descent algorithm  ... 
doi:10.1051/matecconf/201818903012 fatcat:xkzaszgq2rbwdikatq5wgiwx5y
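
The last snippet mentions realizing the parallelism of SGD through the MapReduce model. One common pattern that matches this description, sketched here as an assumption rather than a reconstruction of [8], is to run SGD independently on each data partition (map) and average the resulting models (reduce):

```python
import numpy as np

rng = np.random.default_rng(2)

def sgd_on_partition(X, y, w, lr=0.01, epochs=5):
    """Plain least-squares SGD over one data partition (the "map" side)."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w = w - lr * (xi @ w - yi) * xi
    return w

# Synthetic data split into 4 partitions, as a Spark/MapReduce job might shard it.
w_true = rng.normal(size=3)
partitions = []
for _ in range(4):
    X = rng.normal(size=(200, 3))
    partitions.append((X, X @ w_true + 0.01 * rng.normal(size=200)))

w0 = np.zeros(3)
local_models = [sgd_on_partition(X, y, w0) for X, y in partitions]  # map
w_avg = np.mean(local_models, axis=0)                               # reduce

print("parameter error:", np.linalg.norm(w_avg - w_true))
```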

A Hitchhiker's Guide On Distributed Training of Deep Neural Networks [article]

Karanbir Chahal, Manraj Singh Grover, Kuntal Dey
2018 arXiv   pre-print
Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat, however, is the substantial amount of compute needed to train these deep learning models.  ...  More specifically, we explore the synchronous and asynchronous variants of distributed Stochastic Gradient Descent, various All Reduce gradient aggregation strategies, and best practices for obtaining higher  ...  Distributed Training Algorithms: A popular algorithm used for training in the distributed setting is Stochastic Gradient Descent (SGD); this algorithm shall be the focal point of our discussion going  ... 
arXiv:1810.11787v1 fatcat:wy36x3sdwvhvfdrnc5tvzn7sty
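
Per the snippets, the survey covers synchronous and asynchronous distributed SGD and all-reduce gradient aggregation. Below is a minimal simulation of the synchronous variant, with the all-reduce replaced by an in-process average of per-worker gradients; the shard sizes, learning rate, and linear model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

w_true = rng.normal(size=4)
# Each of 8 workers holds its own shard of the data.
shards = []
for _ in range(8):
    X = rng.normal(size=(64, 4))
    shards.append((X, X @ w_true + 0.01 * rng.normal(size=64)))

w, lr = np.zeros(4), 0.05
for step in range(300):
    # Each worker computes a gradient on a local mini-batch...
    grads = []
    for X, y in shards:
        idx = rng.integers(0, len(y), size=16)
        Xb, yb = X[idx], y[idx]
        grads.append(Xb.T @ (Xb @ w - yb) / len(yb))
    # ...and an all-reduce averages them before the shared, synchronous update
    # (simulated here by a plain mean over the worker gradients).
    w -= lr * np.mean(grads, axis=0)

print("parameter error:", np.linalg.norm(w - w_true))
```

The asynchronous variant would instead let each worker apply its gradient to a shared parameter server as soon as it is ready, at the cost of stale updates.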

Multi-Agent Reinforcement Learning for Joint Channel Assignment and Power Allocation in Platoon-Based C-V2X Systems [article]

Hung V. Vu, Zheyu Liu, Duy H. N. Nguyen, Robert Morawski, Tho Le-Ngoc
2020 arXiv   pre-print
Toward this end, we utilize the double deep Q-learning algorithm to jointly train the agents under the objectives of simultaneously maximizing the V2I sum-rate and satisfying the packet delivery probability  ...  Utilizing a reinforcement learning (RL) approach, we propose a distributed resource allocation (RA) algorithm to overcome this challenge.  ...  gradient descent algorithm to train the deep Q-networks.  ... 
arXiv:2011.04555v1 fatcat:iftdr27hofbrbfyiuxzorp3vfq
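
The abstract trains the agents with double deep Q-learning. A minimal sketch of the double-Q target computation that method relies on, with tabular arrays standing in for the online and target deep Q-networks; the states, actions, and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions, gamma = 5, 3, 0.95

# Tabular stand-ins for the online and target Q-networks.
q_online = rng.normal(size=(n_states, n_actions))
q_target = q_online.copy()

s, a, r, s_next = 0, 1, 1.0, 2   # one illustrative transition

# Double Q-learning: the online net selects the action, the target net evaluates it.
a_star = int(np.argmax(q_online[s_next]))
td_target = r + gamma * q_target[s_next, a_star]

# One gradient-descent-style update on the squared TD error.
lr = 0.1
q_online[s, a] += lr * (td_target - q_online[s, a])
print("updated Q(s, a):", q_online[s, a])
```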

Consumer price index prediction using Long Short Term Memory (LSTM) based cloud computing

S Zahara, Sugianto, M B Ilmiddaviq
2020 Journal of Physics: Conference Series  
Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMSProp), Adaptive Gradient (AdaGrad), Adaptive Moment (Adam), Adadelta, Nesterov Adam (Nadam) and Adamax.  ...  As part of machine learning networks, LSTM is also notable as the right choice for time-series prediction. The inflation rate has been used for decision making by central banks as well as the private sector.  ...  In the process of building the prediction model, the split between training data and testing data is 70:30.  ... 
doi:10.1088/1742-6596/1456/1/012022 fatcat:vyiulcqoyvek5l3xdmoe2da474
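
The first snippet lists the optimizers the paper compares (SGD, RMSProp, AdaGrad, Adam, Adadelta, Nadam, Adamax). To make the differences concrete, here is a sketch of three of those update rules on a toy one-dimensional quadratic; the learning rates are common defaults, not the paper's settings:

```python
import numpy as np

def run(update, steps=200):
    """Minimize f(x) = x^2 starting from x = 5 with a given optimizer rule."""
    x, state = 5.0, {}
    for t in range(1, steps + 1):
        g = 2 * x                      # gradient of x^2
        x = update(x, g, state, t)
    return x

def sgd(x, g, state, t, lr=0.1):
    return x - lr * g

def rmsprop(x, g, state, t, lr=0.1, rho=0.9, eps=1e-8):
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return x - lr * g / (np.sqrt(state["v"]) + eps)

def adam(x, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return x - lr * m_hat / (np.sqrt(v_hat) + eps)

for name, opt in [("SGD", sgd), ("RMSProp", rmsprop), ("Adam", adam)]:
    print(name, run(opt))
```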

Training GANs with Optimism [article]

Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, Haoyang Zeng
2018 arXiv   pre-print
We observe that models trained with OMD achieve consistently smaller KL divergence with respect to the true underlying distribution than models trained with GD variants.  ...  We address the issue of limit cycling behavior in training Generative Adversarial Networks and propose the use of Optimistic Mirror Descent (OMD) for training Wasserstein GANs.  ...  PRELIMINARIES: WGANS AND OPTIMISTIC MIRROR DESCENT We consider the problem of learning a generative model of a distribution of data points Q ∈ ∆(X).  ... 
arXiv:1711.00141v2 fatcat:qjenlkdlibgodeavekxrllhlgy
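
With the unconstrained l2 mirror map, Optimistic Mirror Descent reduces to a gradient step plus a correction built from the previous gradient. The form below is the commonly cited optimistic gradient update; signs and constants should be checked against the paper:

```latex
% \theta: parameters of the minimizing player, \eta: step size, L: its loss.
\theta_{t+1} \;=\; \theta_t \;-\; 2\eta\,\nabla L(\theta_t) \;+\; \eta\,\nabla L(\theta_{t-1})
% The maximizing player applies the same rule with the gradient terms' signs flipped;
% the extra term anticipates the opponent's next move from the most recent gradient change.
```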

A Deep Recurrent Q Network towards Self-adapting Distributed Microservices architecture [article]

Basel Magableh
2019 arXiv   pre-print
The performance of DRQN is evaluated against deep Q-learning and policy gradient algorithms including: i) deep Q-network (DQN), ii) dueling deep Q-network (DDQN), iii) a policy gradient neural network  ...  To achieve the desired high levels of self-adaptability, this research implements the distributed microservices architecture model, as informed by the MAPE-K model.  ...  Q-value by defining an approximation function and training the model in deep Q-networks; an approach called deep Q-learning.  ... 
arXiv:1901.04011v2 fatcat:ns44bdnlpvgxhcjddwk4xkpahu

A Comprehensive Overview and Survey of Recent Advances in Meta-Learning [article]

Huimin Peng
2020 arXiv   pre-print
Deep learning is focused upon in-sample prediction and meta-learning concerns model adaptation for out-of-sample prediction.  ...  Meta-learning seeks adaptation of machine learning models to unseen tasks which are vastly different from trained tasks.  ...  Acknowledgment Thanks to Debasmit Das, Louis Kirsch and Luca Bertinetto (in alphabetical order) for useful and valuable comments on this manuscript.  ... 
arXiv:2004.11149v7 fatcat:ko266mr26jar3pyn6t4r3l5drm

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding [article]

Alex Kendall and Vijay Badrinarayanan and Roberto Cipolla
2016 arXiv   pre-print
We present a deep learning framework for probabilistic pixel-wise semantic segmentation, which we term Bayesian SegNet.  ...  We achieve this by Monte Carlo sampling with dropout at test time to generate a posterior distribution of pixel class labels.  ...  Therefore training the network with stochastic gradient descent will encourage the model to learn a distribution of weights which explains the data well while preventing over-fitting.  ... 
arXiv:1511.02680v2 fatcat:2rc2pxlzm5ddlkksy7cmyhfoam
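
The abstract obtains pixel-wise uncertainty by Monte Carlo sampling with dropout kept active at test time. A toy sketch of that procedure with a small random network standing in for the encoder-decoder; the layer sizes, dropout rate, and number of samples are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy two-layer classifier standing in for the segmentation network.
W1, W2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 3))

def forward(x, p_drop=0.5):
    """One stochastic forward pass: dropout stays ON at test time."""
    h = np.maximum(x @ W1, 0.0)
    mask = rng.random(h.shape) > p_drop       # a fresh mask on every pass
    h = h * mask / (1.0 - p_drop)
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=16)
samples = np.stack([forward(x) for _ in range(50)])   # T = 50 Monte Carlo passes

mean_probs = samples.mean(axis=0)   # approximate posterior class probabilities
uncertainty = samples.var(axis=0)   # per-class predictive variance as uncertainty
print("mean probs:", mean_probs)
print("uncertainty:", uncertainty)
```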

DeepTox: Toxicity Prediction using Deep Learning

Andreas Mayr, Günter Klambauer, Thomas Unterthiner, Sepp Hochreiter
2016 Frontiers in Environmental Science  
In its next step, DeepTox trains models, evaluates them, and combines the best of them to ensembles. Finally, DeepTox predicts the toxicity of new compounds.  ...  In order to utilize Deep Learning for toxicity prediction, we have developed the DeepTox pipeline. First, DeepTox normalizes the chemical representations of the compounds.  ...  DeepTox uses stochastic gradient descent learning to train the DNNs (see Section 2.2.3), employing mini-batches of 512 samples.  ... 
doi:10.3389/fenvs.2015.00080 fatcat:f3uxqnexobf43jg5q2z6rn7pqq
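
The last snippet states that DeepTox trains its DNNs with stochastic gradient descent on mini-batches of 512 samples. A minimal mini-batch SGD loop with that batch size, using logistic regression on synthetic data as a stand-in for the DNNs and chemical descriptors:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic binary "toxicity" data standing in for the real descriptors.
X = rng.normal(size=(10_000, 50))
w_true = rng.normal(size=50)
y = (X @ w_true + 0.5 * rng.normal(size=10_000) > 0).astype(float)

w, lr, batch = np.zeros(50), 0.1, 512
for epoch in range(5):
    order = rng.permutation(len(y))
    for start in range(0, len(y), batch):
        idx = order[start:start + batch]
        p = 1.0 / (1.0 + np.exp(-X[idx] @ w))         # sigmoid predictions
        grad = X[idx].T @ (p - y[idx]) / len(idx)     # logistic-loss gradient
        w -= lr * grad

acc = np.mean(((X @ w) > 0) == (y > 0.5))
print("train accuracy:", acc)
```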

DeepTox: Toxicity prediction using deep learning

Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter
2017 Toxicology Letters  
In its next step, DeepTox trains models, evaluates them, and combines the best of them to ensembles. Finally, DeepTox predicts the toxicity of new compounds.  ...  In order to utilize Deep Learning for toxicity prediction, we have developed the DeepTox pipeline. First, DeepTox normalizes the chemical representations of the compounds.  ...  DeepTox uses stochastic gradient descent learning to train the DNNs (see Section 2.2.3), employing mini-batches of 512 samples.  ... 
doi:10.1016/j.toxlet.2017.07.175 fatcat:rajiojv6fjeornjxgdjmf5e5iu

Online Meta-Learning [article]

Chelsea Finn, Aravind Rajeswaran, Sham Kakade, Sergey Levine
2019 arXiv   pre-print
Meta-learning views this problem as learning a prior over model parameters that is amenable to fast adaptation on a new task, but typically assumes the set of tasks is available together as a batch.  ...  In contrast, online (regret based) learning considers a sequential setting in which problems are revealed one after the other, but conventionally trains only a single model without any task-specific adaptation  ...  Acknowledgements Aravind Rajeswaran thanks Emo Todorov for valuable discussions on the problem formulation.  ... 
arXiv:1902.08438v4 fatcat:lch52wgsm5cf3bgt7bidvfrqsu
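
The abstract frames meta-learning as learning a prior over parameters that adapts quickly to each new task as tasks arrive sequentially. A first-order sketch of that inner/outer loop on toy 1-D regression tasks; the task distribution, step sizes, and the first-order approximation are assumptions, not the paper's actual online algorithm:

```python
import numpy as np

rng = np.random.default_rng(7)

def task():
    """Each task: fit a random 1-D linear map; tasks arrive one at a time."""
    slope = rng.normal()
    X = rng.normal(size=(20, 1))
    return X, slope * X

def loss_grad(w, X, y):
    """Gradient of mean squared error for a linear model."""
    return X.T @ (X @ w - y) / len(y)

meta_w = np.zeros((1, 1))
inner_lr, outer_lr = 0.1, 0.05

for t in range(100):                  # tasks revealed sequentially
    X, y = task()
    # Inner step: adapt the prior to the new task with one gradient step.
    adapted = meta_w - inner_lr * loss_grad(meta_w, X[:10], y[:10])
    # Outer step: first-order update of the prior using post-adaptation loss
    # (the full method differentiates through the adaptation step).
    meta_w -= outer_lr * loss_grad(adapted, X[10:], y[10:])

print("learned prior:", meta_w)
```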

Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach [article]

Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao
2022 arXiv   pre-print
Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.  ...  This formulation induces a competition between a leader and a follower, where the follower generates perturbations, and the leader trains the model subject to the perturbations.  ...  Learning to learn by gradient descent by gradient descent.  ... 
arXiv:2104.04886v3 fatcat:sld3r2tbrbctfcns4jhltrtf5e
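
The abstract casts adversarial regularization as a leader/follower (Stackelberg) game: the follower crafts perturbations, and the leader trains the model subject to them. A minimal sketch of that loop on a logistic-regression stand-in, without the unrolled-optimization refinement that is the paper's contribution; the perturbation size and the sign-based ascent step are assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)

X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = (X @ w_true > 0).astype(float)

def grads(w, X, y):
    """Logistic-loss gradients w.r.t. the weights and the inputs."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    g_w = X.T @ (p - y) / len(y)
    g_x = np.outer(p - y, w) / len(y)   # per-sample gradient w.r.t. each input
    return g_w, g_x

w, lr, eps = np.zeros(10), 0.5, 0.1
for step in range(200):
    # Follower: craft a small input perturbation by ascending the loss.
    _, g_x = grads(w, X, y)
    delta = eps * np.sign(g_x)
    # Leader: descend the loss on the perturbed inputs (adversarial regularization).
    g_w, _ = grads(w, X + delta, y)
    w -= lr * g_w
    # (The paper additionally unrolls the follower's step when computing the
    #  leader's gradient; that refinement is omitted in this sketch.)

print("train accuracy:", np.mean(((X @ w) > 0) == (y > 0.5)))
```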
Showing results 1-15 of 1,180