Asynchrony begets Momentum, with an Application to Deep Learning
[article]
2016
arXiv
pre-print
Our result does not assume convexity of the objective function, so it is applicable to deep learning systems. ...
We observe that a standard queuing model of asynchrony results in a form of momentum that is commonly used by deep learning practitioners. ...
ACKNOWLEDGMENTS The authors would like to thank Chris De Sa for the discussion on asynchrony that was the precursor to the present work and Dan Iter for his thoughtful feedback. ...
arXiv:1605.09774v2
fatcat:qsyp6ch6x5ecnnp4h5usfxbzr4
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
[article]
2020
arXiv
pre-print
We find that the degree of delay interacts with the learning rate, to change the set of minima accessible by an asynchronous stochastic gradient descent algorithm. ...
Specifically, for high delay values, we find that the learning rate should be kept inversely proportional to the delay. We then extend this analysis to include momentum. ...
Asynchrony begets momentum, with an application to deep learning. 54th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2016, pp. 997-1004, 2017. ...
arXiv:1909.12340v2
fatcat:7yhgzksjyvg5zbi7jwx7g33fym
Taming Momentum in a Distributed Asynchronous Environment
[article]
2020
arXiv
pre-print
We propose DANA: a novel technique for asynchronous distributed SGD with momentum that mitigates gradient staleness by computing the gradient on an estimated future position of the model's parameters. ...
Thereby, we show for the first time that momentum can be fully incorporated in asynchronous training with almost no ramifications to final accuracy. ...
Asynchrony begets momentum, with an application to deep learning. In 54th Annual Allerton Conference on Communication, Control, and Computing, 2016. ...
arXiv:1907.11612v3
fatcat:t5m4drjfzbbfrauj4ktqbohm64
Quasi-hyperbolic momentum and Adam for deep learning
[article]
2019
arXiv
pre-print
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. ...
We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. ...
Asynchrony begets momentum, with an application to deep learning. ...
arXiv:1810.06801v4
fatcat:tq3iul7mdnhjhjjtq5d7edjacm
Adaptive Braking for Mitigating Gradient Delay
[article]
2020
arXiv
pre-print
We show that applying AB on top of SGD with momentum enables training ResNets on CIFAR-10 and ImageNet-1k with delays D ≥ 32 update steps with minimal drop in final test accuracy. ...
Neural network training is commonly accelerated by using multiple synchronized workers to compute gradient updates in parallel. ...
Asynchrony begets momentum, with an application to deep learning. In 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 997-1004. IEEE, 2016. ...
arXiv:2007.01397v2
fatcat:wglfxomtn5hjpm67zjlkfa2m7q
Gap Aware Mitigation of Gradient Staleness
[article]
2020
arXiv
pre-print
Cloud computing is becoming increasingly popular as a platform for distributed training of deep neural networks. ...
Despite prior beliefs, we show that if GA is applied, momentum becomes beneficial in asynchronous environments, even when the number of workers scales up. ...
Asynchrony begets momentum, with an application to deep learning. In 54th Annual Allerton Conference on Communication, Control, and Computing, pp. 997-1004, 2016. Christopher J. ...
arXiv:1909.10802v3
fatcat:lhp4kxt2p5buvayx6h5yvvaa7e
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
[article]
2018
arXiv
pre-print
In this work we present a new agent architecture, called Reactor, which combines multiple algorithmic and architectural contributions to produce an agent with higher sample-efficiency than Prioritized ...
Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting. ...
Machine learning, 8(3/4):69-97, 1992.
Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, and Christopher Ré. Asynchrony begets momentum, with an application to deep learning. ...
arXiv:1704.04651v2
fatcat:46wvnqppnfc4pbownydwdsv3cy
Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
[article]
2019
arXiv
pre-print
Moreover, this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence with low ...
Communication-efficient SGD algorithms, which allow nodes to perform local updates and periodically synchronize local models, are highly effective in improving the speed and scalability of distributed ...
This work was partially supported by the CMU Dean's fellowship and an IBM Faculty Award. ...
arXiv:1808.07576v3
fatcat:vjnt3h7ue5d55brdwdfpivhs34
Communication-Efficient Asynchronous Stochastic Frank-Wolfe over Nuclear-norm Balls
[article]
2019
arXiv
pre-print
Large-scale machine learning training suffers from two prior challenges, specifically for nuclear-norm constrained problems with distributed systems: the synchronization slowdown due to the straggling ...
We implement our algorithm in Python (with MPI) to run on Amazon EC2, and demonstrate that SFW-asyn yields speed-ups almost linear to the number of machines compared to the vanilla SFW. ...
Asynchrony begets momentum, with an application to deep learning. In 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 997-1004. IEEE. ...
arXiv:1910.07703v1
fatcat:kaxmvnegunak3lr2l7smmnglzi
Learning with Staleness
2018
This thesis characterizes learning with staleness from three directions: (1) We extend the theoretical analyses of a number of classical ML algorithms, including stochastic gradient descent, proximal gradient ...
As a result, concurrent ML computation in the distributed settings often needs to handle delayed updates and perform learning in the presence of staleness. ...
If the explicit momentum is zero, then the implicit momentum from asynchrony is likely higher than optimal, and thus they can reduce asynchrony to improve convergence. ...
doi:10.1184/r1/6720416
fatcat:zzaqdzyaejfh5jk42n5hti5vnu
INTERNET AND NEW TECHNOLOGIES REFLECTED IN RHYMING SLANG
2016
Russian Linguistic Bulletin
Being a developing entity, Rh Sl cannot but react to the changes that take place in modern society, evolving new tendencies and being enriched with new items that need to be linguistically interpreted ...
The article deals with new rhyming slang (Rh Sl) items that name new technological achievements and new means of communication that are connected with computerization and technical renovation. ...
John Goldsmith [89] notes that a multi-level representation provides a solution to the conceptual problems raised by the feature asynchrony in connection with the matrix formalism. ...
doi:10.18454/rulb.7.26
fatcat:7cjwxqjiebcidpc3cxzna6l6v4
American literary environmentalism
2000
ChoiceReviews
., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. ...
Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. ...
Instead she replaces that logical temporality with an asynchrony in which a season is compressed into a night and the events of years past can be said to have occurred just "the other day." ...
doi:10.5860/choice.38-2025
fatcat:cs5qppm4erckxlwdowjc4iy6r4
The thinking heart: the lived experience of older first time mothers
2003
Where do mothers learn to be mothers? Women learn to be mothers from their own experience of being mothers. ...
(Bergum, 1997, p. 23) Telling the story of being an older first time mother parallels a process that is natural to most who are mothers and is integral to the practice of nurses and of others who listen ...
to beget, bring forth; a mother or a father, or by extension, an ancestor (Compact edition of the Oxford English Dictionary [OED], 1971). ...
doi:10.7939/r3-2cnv-gy72
fatcat:4yotapvlojf45h2pxbxisq252e
ZMK Zeitschrift für Medien- und Kulturforschung. Focus Producing Places
2022
Are there places of and for objects and operations that do not share anything with other entities, which are unable to inhabit the same place? ...
, affect each other, or attach to each other. ...
beget children in toil and pain. ...
doi:10.25969/mediarep/18766
fatcat:y3sb2r23ynhhfei3ce3fbnisk4