
On the insufficiency of existing momentum schemes for Stochastic Optimization [article]

Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade
2018 arXiv   pre-print
In the stochastic case, the popular explanation for their wide applicability is that when these fast gradient methods are applied in the stochastic case, they partially mimic their exact gradient counterparts  ...  Momentum-based stochastic gradient methods such as heavy ball (HB) and Nesterov's accelerated gradient descent (NAG) are widely used in practice for training deep networks and other supervised learning  ...  Acknowledgments: Sham Kakade acknowledges funding from the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery and the NSF through awards CCF-1637360, CCF-1703574 and CCF-1740551.  ... 
arXiv:1803.05591v2 fatcat:tisn3budvzfbdipxu7oqbi73je

Page 3190 of Mathematical Reviews Vol. , Issue 2004d [page]

2004 Mathematical Reviews  
Milstein scheme for stochastic differential equations.  ...  Then, the authors deduce some existence results for 4, and some relationships such as comparison of the Hausdorff measures of Mo and MW.  ... 

Explicitly Stochastic Parameterization of Nonorographic Gravity Wave Drag

Stephen D. Eckermann
2011 Journal of the Atmospheric Sciences  
This research was supported by the Office of Naval Research through the NRL 6.1 work unit "Subgridscale Dynamics of Middle and Upper Atmospheres" and by NASA's Global Modeling and Analysis Program, contract  ... 
doi:10.1175/2011jas3684.1 fatcat:g4cezl5uq5ao7msxhe673yl55y

On the Generalization Benefit of Noise in Stochastic Gradient Descent [article]

Samuel L. Smith, Erich Elsen, Soham De
2020 arXiv   pre-print
We study how the optimal learning rate schedule changes as the epoch budget grows, and we provide a theoretical account of our observations based on the stochastic differential equation perspective of  ...  This occurs even when both models are trained for the same number of iterations and large batches achieve smaller training losses.  ...  helped improve the paper.  ... 
arXiv:2006.15081v1 fatcat:t24nhd4oh5hybnhj74na7hswou

Reliability of Large-Eddy Simulations: Benchmarking and Uncertainty Quantification [chapter]

M. V. Salvetti, M. Meldi, L. Bruno, P. Sagaut
2017 ERCOFTAC Series  
, lower-order schemes giving better results than higher-order ones or, (iii) for given grid and numerical scheme, no-model simulations giving better results than LES with SGS modeling (see e.g. [1, 2]  ...  A first way of thinking is that numerical errors should be made negligible and all the burden should be on the SGS model.  ...  Fig. 2: (a) Mean stochastic error for different output quantities. (b) Partial variances for the error on momentum thickness.  ... 
doi:10.1007/978-3-319-63212-4_2 fatcat:5woqy4t5h5achgj2o7ztisbspq

A Geometric Framework for Stochastic Shape Analysis

Alexis Arnaudon, Darryl D. Holm, Stefan Sommer
2018 Foundations of Computational Mathematics  
We introduce a stochastic model of diffeomorphisms, whose action on a variety of data types descends to stochastic evolution of shapes, images and landmarks.  ...  We derive two approaches for inferring parameters of the stochastic model from landmark configurations observed at discrete time points.  ...  This view is based on the existence of momentum maps, which are characterized by the transformation properties of the data structures for images and shapes.  ... 
doi:10.1007/s10208-018-9394-z fatcat:cegn7rnyfzettf2adger2f2cpm

Don't Use Large Mini-Batches, Use Local SGD [article]

Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
2020 arXiv   pre-print
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks.  ...  We further provide an extensive study of the communication efficiency vs. performance trade-offs associated with a host of local SGD variants.  ...  We use sign-based compression schemes (i.e., signSGD (Bernstein et al., 2018) and EF-signSGD (Karimireddy et al., 2019)) for the demonstration.  ... 
arXiv:1808.07217v6 fatcat:7cmirv2pxrfafh24xjryn5a7bm

Relativistic Monte Carlo [article]

Xiaoyu Lu, Valerio Perrone, Leonard Hasenclever, Yee Whye Teh, Sebastian J. Vollmer
2016 arXiv   pre-print
Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo.  ...  However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution.  ...  Acknowledgement: XU thanks the PAG scholarship and New College for support. LH and VP are funded by the EPSRC doctoral training centre OXWASP through EP/L016710/1.  ... 
arXiv:1609.04388v1 fatcat:mopgqocljnb2xp4umyengcjoua

Quasi-hyperbolic momentum and Adam for deep learning [article]

Jerry Ma, Denis Yarats
2019 arXiv   pre-print
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning.  ...  We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step.  ...  ACKNOWLEDGMENTS We thank Aaron Defazio, Nicolas Loizou, Yann Olivier, Mark Tygert, and anonymous reviewers and commenters for insightful discussions and valuable suggestions.  ... 
arXiv:1810.06801v4 fatcat:tq3iul7mdnhjhjjtq5d7edjacm

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization [article]

Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
2021 arXiv   pre-print
To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima.  ...  The Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks, variational Bayesian inference  ...  role of the momentum in nonconvex stochastic optimization?  ... 
arXiv:1802.05155v5 fatcat:pbgwbm6tdfhwtfrdgfmoambbfe

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum [article]

Junhyung Lyle Kim, Panos Toulis, Anastasios Kyrillidis
2021 arXiv   pre-print
Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training.  ...  To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that SPPAM allows a faster linear convergence to a neighborhood compared  ... 
arXiv:2111.06171v3 fatcat:ycdfgoaxuzbcxi52l7himbtyb4

TAdam: A Robust Stochastic Gradient Optimizer [article]

Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto
2020 arXiv   pre-print
Adam, the popular optimization method, is modified with our method and the resultant optimizer, so-called TAdam, is shown to effectively outperform Adam in terms of robustness against noise on diverse  ...  We therefore propose a new stochastic gradient optimization method, whose robustness is directly built in the algorithm, using the robust Student-t distribution as its core idea.  ...  Introduction: The field of machine learning is undoubtedly dominated by first-order optimization methods based on the gradient descent algorithm and particularly [1], its stochastic variant, the stochastic  ... 
arXiv:2003.00179v2 fatcat:h632ernfsra5pd4pajr2tfwtxi

Application of global optimization to design of an aspherical pickup head for multiple wavelengths

Chao-Hsi Tsao, Jyh-Long Chern, G. Groot Gregory, Joseph M. Howard, R. John Koshel
2007 International Optical Design Conference 2006  
Based on the optimal variable set of aspheric coefficients obtained by the proposed global optimization strategy, singlet objective lens for different operational configurations, i.e., for CD and DVD,  ...  An optimization process combining a global optimization algorithm and further optimization treatment is proposed and demonstrated with application to the objective lenses of multiple-wavelength configurations  ...  This work is also partially supported by the MOE ATU program at the National Chiao Tung University. We thank the Lambda Research Corp. for the educational support of software, OSLO.  ... 
doi:10.1117/12.692230 fatcat:tzlulglprrdy5pxji7waa24qla

Stochastic Training is Not Necessary for Generalization [article]

Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
2021 arXiv   pre-print
tuning optimizers and hyperparameters for small-batch training.  ...  Our observations further indicate that the perceived difficulty of full-batch training is largely the result of its optimization properties and the disproportionate time and effort spent by the ML community  ...  thank the Zentrum für Informations- und Medientechnik of the University of Siegen for their support.  ... 
arXiv:2109.14119v1 fatcat:prvn7wiogjcx5mnaer2lltj4qm

Error Feedback Fixes SignSGD and other Gradient Compression Schemes [article]

Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
2019 arXiv   pre-print
These issues arise because of the biased nature of the sign compression operator.  ...  Thus EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory and show that error-feedback improves both convergence and generalization. Code can be found at .  ...  Conclusion We study the effect of biased compressors on the convergence and generalization of stochastic gradient algorithms for non-convex optimization.  ... 
arXiv:1901.09847v2 fatcat:2crhupyoizbnzkw4z5xewac4lm
Showing results 1-15 out of 5,676 results