6,862 hits

On the insufficiency of existing momentum schemes for Stochastic Optimization [article]

Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
2018 arXiv   pre-print
In the stochastic case, the popular explanation for their wide applicability is that when these fast gradient methods are applied in the stochastic case, they partially mimic their exact gradient counterparts  ...  Momentum-based stochastic gradient methods such as heavy ball (HB) and Nesterov's accelerated gradient descent (NAG) method are widely used in practice for training deep networks and other supervised learning  ...  Acknowledgments Sham Kakade acknowledges funding from Washington Research Foundation Fund for Innovation in Data-Intensive Discovery and the NSF through awards CCF-1637360, CCF-1703574 and CCF-1740551.  ... 
arXiv:1803.05591v2 fatcat:tisn3budvzfbdipxu7oqbi73je
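For reference, the two momentum schemes named in this snippet, heavy ball (HB) and Nesterov's accelerated gradient (NAG), can be sketched as below. This is a minimal illustration assuming the common textbook parameterization with step size `lr` and momentum `beta`; the helper names (`heavy_ball_step`, `nag_step`, `run`) and the noisy quadratic oracle are hypothetical and not taken from the paper, and the sketch does not reproduce the alternative method the authors propose.

```python
import numpy as np

def heavy_ball_step(w, v, grad_fn, lr, beta):
    """Heavy ball (Polyak) momentum: v <- beta*v - lr*g(w); w <- w + v."""
    v = beta * v - lr * grad_fn(w)
    return w + v, v

def nag_step(w, v, grad_fn, lr, beta):
    """Nesterov's accelerated gradient: same as heavy ball, except the
    gradient is evaluated at the look-ahead point w + beta*v."""
    v = beta * v - lr * grad_fn(w + beta * v)
    return w + v, v

def run(step, grad_fn, w0, lr=0.1, beta=0.9, iters=200):
    """Apply a momentum step rule `iters` times starting from w0 with zero velocity."""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(iters):
        w, v = step(w, v, grad_fn, lr, beta)
    return w

# Hypothetical stochastic oracle: gradient of f(w) = 0.5 * ||w||^2 plus Gaussian noise.
rng = np.random.default_rng(0)
noisy_grad = lambda w: w + 0.01 * rng.standard_normal(w.shape)

print(run(heavy_ball_step, noisy_grad, [5.0, -3.0]))
print(run(nag_step, noisy_grad, [5.0, -3.0]))
```

With zero initial velocity the first HB and NAG steps coincide; they differ afterwards only in where the gradient is queried.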

Page 3190 of Mathematical Reviews Vol. , Issue 2004d [page]

2004 Mathematical Reviews  
Milstein scheme for stochastic differential equations.  ...  Then, the authors deduce some existence results for 4, and some relationships such as comparison of the Hausdorff measures of Mo and MW.  ... 

A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
2021 Stochastic Systems  
To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima.  ...  Our study shows that the momentum helps escape from saddle points but hurts the convergence within the neighborhood of optima (in the absence of step size annealing or momentum annealing).  ...  The follow-up work (Liu et al. 2018) has been published in the Proceedings of the Thirty-Second Conference on Neural Information Processing Systems.  ... 
doi:10.1287/stsy.2021.0083 fatcat:jk4p2hk6rzal7ninpvzcpgwyg4

Explicitly Stochastic Parameterization of Nonorographic Gravity Wave Drag

Stephen D. Eckermann
2011 Journal of the Atmospheric Sciences  
Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining  ...  the data needed, and completing and reviewing the collection of information.  ...  This research was supported by the Office of Naval Research through the NRL 6.1 work unit "Subgridscale Dynamics of Middle and Upper Atmospheres" and by NASA's Global Modeling and Analysis Program, contract  ... 
doi:10.1175/2011jas3684.1 fatcat:g4cezl5uq5ao7msxhe673yl55y

On the Generalization Benefit of Noise in Stochastic Gradient Descent [article]

Samuel L. Smith, Erich Elsen, Soham De
2020 arXiv   pre-print
We study how the optimal learning rate schedule changes as the epoch budget grows, and we provide a theoretical account of our observations based on the stochastic differential equation perspective of  ...  This occurs even when both models are trained for the same number of iterations and large batches achieve smaller training losses.  ...  helped improve the paper.  ... 
arXiv:2006.15081v1 fatcat:t24nhd4oh5hybnhj74na7hswou

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Vyacheslav Kungurtsev, Malcolm Egan, Bapi Chatterjee, Dan Alistarh
In this paper, we analyze for the first time the convergence of stochastic asynchronous optimization for this general class of objectives.  ...  Under this model, we establish convergence with probability one to an invariant set for stochastic subgradient methods with momentum.  ...  Acknowledgments Vyacheslav Kungurtsev was supported by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics."  ... 
doi:10.1609/aaai.v35i9.16999 fatcat:ctq3mtv4zbgw7glyhjbhb636my

Reliability of Large-Eddy Simulations: Benchmarking and Uncertainty Quantification [chapter]

M. V. Salvetti, M. Meldi, L. Bruno, P. Sagaut
2017 ERCOFTAC Series  
, lower-order schemes giving better results than higher-order ones or, (iii) for given grid and numerical scheme, no model simulations giving better results than LES with SGS modeling (see e.g. [1, 2]  ...  A first way of thinking is that numerical errors should be made negligible and all the burden should be on the SGS model.  ...  Taken from  ...  Fig. 2: (a) Mean stochastic error for different output quantities; (b) partial variances for the error on momentum thickness.  ... 
doi:10.1007/978-3-319-63212-4_2 fatcat:5woqy4t5h5achgj2o7ztisbspq

A Geometric Framework for Stochastic Shape Analysis

Alexis Arnaudon, Darryl D. Holm, Stefan Sommer
2018 Foundations of Computational Mathematics  
We introduce a stochastic model of diffeomorphisms, whose action on a variety of data types descends to stochastic evolution of shapes, images and landmarks.  ...  We derive two approaches for inferring parameters of the stochastic model from landmark configurations observed at discrete time points.  ...  This view is based on the existence of momentum maps, which are characterized by the transformation properties of the data structures for images and shapes.  ... 
doi:10.1007/s10208-018-9394-z fatcat:cegn7rnyfzettf2adger2f2cpm

Don't Use Large Mini-Batches, Use Local SGD [article]

Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
2020 arXiv   pre-print
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks.  ...  We further provide an extensive study of the communication efficiency vs. performance trade-offs associated with a host of local SGD variants.  ...  We use a sign-based compression scheme (i.e. signSGD (Bernstein et al., 2018) and EF-signSGD (Karimireddy et al., 2019)) for the demonstration.  ... 
arXiv:1808.07217v6 fatcat:7cmirv2pxrfafh24xjryn5a7bm
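The local SGD scheme that this entry contrasts with large mini-batches can be sketched as follows: each worker takes several SGD steps on its own copy of the model, and the copies are then averaged in one communication round. This is a minimal single-process simulation under assumed settings; `num_workers`, `local_steps`, `rounds`, `lr` and the quadratic objective are hypothetical choices, and the sketch omits the communication layer and the variants studied in the paper.

```python
import numpy as np

def local_sgd(grad_fn, w0, num_workers=4, local_steps=8, rounds=25, lr=0.05, seed=0):
    """Local SGD sketch: each worker runs `local_steps` SGD updates on its own copy
    of the model, then all copies are averaged (one communication round)."""
    rng = np.random.default_rng(seed)
    w_global = np.array(w0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for _ in range(num_workers):
            w = w_global.copy()
            for _ in range(local_steps):
                w -= lr * grad_fn(w, rng)  # independent stochastic gradient per worker
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)  # synchronize by model averaging
    return w_global

# Hypothetical objective: f(w) = 0.5 * ||w - target||^2 with noisy gradients.
target = np.array([1.0, 2.0])
noisy_grad = lambda w, rng: (w - target) + 0.1 * rng.standard_normal(w.shape)

print(local_sgd(noisy_grad, [0.0, 0.0]))
```

Setting `local_steps=1` turns each round into an ordinary synchronous step whose update is averaged over workers, which is the mini-batch baseline the title refers to; larger values trade communication for local computation.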

Relativistic Monte Carlo [article]

Xiaoyu Lu and Valerio Perrone and Leonard Hasenclever and Yee Whye Teh and Sebastian J. Vollmer
2016 arXiv   pre-print
Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo.  ...  However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution.  ...  Acknowledgement XU thanks the PAG scholarship and New College for support. LH and VP are funded by the EPSRC doctoral training centre OXWASP through EP/L016710/1.  ... 
arXiv:1609.04388v1 fatcat:mopgqocljnb2xp4umyengcjoua

Quasi-hyperbolic momentum and Adam for deep learning [article]

Jerry Ma, Denis Yarats
2019 arXiv   pre-print
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning.  ...  We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step.  ...  ACKNOWLEDGMENTS We thank Aaron Defazio, Nicolas Loizou, Yann Olivier, Mark Tygert, and anonymous reviewers and commenters for insightful discussions and valuable suggestions.  ... 
arXiv:1810.06801v4 fatcat:tq3iul7mdnhjhjjtq5d7edjacm
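The QHM update described in this snippet, a weighted average of a plain SGD step and a momentum step, can be sketched as below. This is a minimal sketch assuming the exponential-moving-average form of the momentum buffer; the function name `qhm`, the toy quadratic and the values `lr=0.1`, `beta=0.9`, `nu=0.7` are illustrative assumptions, not the paper's reference code or tuned defaults.

```python
import numpy as np

def qhm(grad_fn, w0, lr=0.1, beta=0.9, nu=0.7, iters=500):
    """Quasi-hyperbolic momentum: step along a nu-weighted average of the plain
    SGD direction and an exponential moving average of past gradients.
    nu=0 recovers plain SGD; nu=1 recovers (EMA-normalized) momentum SGD."""
    w = np.array(w0, dtype=float)
    buf = np.zeros_like(w)  # momentum buffer: EMA of gradients
    for _ in range(iters):
        grad = grad_fn(w)
        buf = beta * buf + (1.0 - beta) * grad
        w = w - lr * ((1.0 - nu) * grad + nu * buf)
    return w

# Deterministic toy problem: f(w) = 0.5 * ||w||^2, so grad f(w) = w.
print(qhm(lambda w: w, [5.0, -3.0]))
```

The single extra parameter `nu` is what makes this an "extremely simple alteration": the momentum buffer is unchanged, and only the final step direction is interpolated.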

TAdam: A Robust Stochastic Gradient Optimizer [article]

Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto
2020 arXiv   pre-print
Adam, the popular optimization method, is modified with our method and the resultant optimizer, so-called TAdam, is shown to effectively outperform Adam in terms of robustness against noise on diverse  ...  We therefore propose a new stochastic gradient optimization method, whose robustness is directly built in the algorithm, using the robust student-t distribution as its core idea.  ...  INTRODUCTION The field of machine learning is undoubtedly dominated by first-order optimization methods based on the gradient descent algorithm and particularly [1], its stochastic variant, the stochastic  ... 
arXiv:2003.00179v2 fatcat:h632ernfsra5pd4pajr2tfwtxi

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization [article]

Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
2021 arXiv   pre-print
To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima.  ...  Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks, variational Bayesian inference  ...  role of the momentum in nonconvex stochastic optimization?  ... 
arXiv:1802.05155v5 fatcat:pbgwbm6tdfhwtfrdgfmoambbfe

Application of global optimization to design of an aspherical pickup head for multiple wavelengths

Chao-Hsi Tsao, Jyh-Long Chern, G. Groot Gregory, Joseph M. Howard, R. John Koshel
2007 International Optical Design Conference 2006  
Based on the optimal variable set of aspheric coefficients obtained by the proposed global optimization strategy, singlet objective lens for different operational configurations, i.e., for CD and DVD,  ...  An optimization process combining a global optimization algorithm with further optimization treatment is proposed and demonstrated with application to the objective lenses of multiple-wavelength configurations  ...  This work is also partially supported by the MOE ATU program at the National Chiao Tung University. We thank the Lambda Research Corp. for the educational support of software, OSLO.  ... 
doi:10.1117/12.692230 fatcat:tzlulglprrdy5pxji7waa24qla

Error Feedback Fixes SignSGD and other Gradient Compression Schemes [article]

Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, and Martin Jaggi
2019 arXiv   pre-print
These issues arise because of the biased nature of the sign compression operator.  ...  Thus EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory and show that error-feedback improves both convergence and generalization. Code can be found at .  ...  Conclusion We study the effect of biased compressors on the convergence and generalization of stochastic gradient algorithms for non-convex optimization.  ... 
arXiv:1901.09847v2 fatcat:2crhupyoizbnzkw4z5xewac4lm
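The error-feedback idea in this entry can be sketched as follows: the part of each update discarded by the biased sign compressor is stored and added back before the next compression. This is a minimal sketch assuming the scaled-sign compressor (||p||_1 / d) * sign(p); the names `scaled_sign` and `ef_signsgd`, the step size and the toy quadratic are hypothetical and not taken from the authors' released code.

```python
import numpy as np

def scaled_sign(p):
    """Scaled sign compressor: (||p||_1 / d) * sign(p)."""
    return (np.abs(p).sum() / p.size) * np.sign(p)

def ef_signsgd(grad_fn, x0, lr=0.1, iters=500):
    """Error-feedback SGD with the scaled-sign compressor."""
    x = np.array(x0, dtype=float)
    e = np.zeros_like(x)          # accumulated compression error
    for _ in range(iters):
        p = lr * grad_fn(x) + e   # add back the error from previous steps
        delta = scaled_sign(p)    # compressed update that is actually applied
        x = x - delta
        e = p - delta             # remember what compression discarded
    return x

# Deterministic toy problem: f(x) = 0.5 * ||x||^2, so grad f(x) = x.
print(ef_signsgd(lambda x: x, [2.0, 1.0]))
```

Without the `e` term this reduces to plain scaled signSGD; the memory vector is what compensates for the biased compressor.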
Showing results 1–15 out of 6,862 results