5,778 Hits in 4.2 sec

Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [article]

Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan
2020 arXiv   pre-print
and stochastic optimization.  ...  Adaptivity is an important yet under-studied property in modern optimization theory.  ...  The many faces of stochastic gradient descent: We start with a brief review of relevant aspects of gradient-based optimization algorithms.  ... 
arXiv:2002.05359v1 fatcat:34r5nlw4krg4xe2vhlxe4axdwu

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [article]

Dongruo Zhou and Yiqi Tang and Ziyan Yang and Yuan Cao and Quanquan Gu
2018 arXiv   pre-print
Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been sufficiently studied.  ...  In this paper, we provide a sharp analysis of a recently proposed adaptive gradient method, namely the partially adaptive momentum estimation method (Padam) (Chen and Gu, 2018), which admits many existing adaptive  ...  the vector v_t  ...  is a huge gap between existing online convex optimization guarantees for adaptive gradient methods and the empirical successes of adaptive gradient methods in nonconvex optimization.  ... 
arXiv:1808.05671v2 fatcat:k437chfxc5erxosw4or7p75djy
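As a hedged illustration of the "partially adaptive" idea described in this abstract, the sketch below applies an exponent p in [0, 1/2] to the second-moment estimate. The constants and the AMSGrad-style maximum are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def padam_style_step(x, grad, m, v, v_hat, lr=0.1, beta1=0.9, beta2=0.999,
                     p=0.125, eps=1e-8):
    """Partially adaptive step: divide by v_hat**p with p in [0, 1/2].
    p = 1/2 recovers an Adam/AMSGrad-style step; p = 0 recovers SGD with momentum."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad**2         # second-moment estimate
    v_hat = np.maximum(v_hat, v)                  # monotone second moment (AMSGrad-style)
    x = x - lr * m / (v_hat**p + eps)
    return x, m, v, v_hat
```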

AdaCN: An Adaptive Cubic Newton Method for Nonconvex Stochastic Optimization

Yan Liu, Maojun Zhang, Zhiwei Zhong, Xiangrong Zeng, Paolo Gastaldo
2021 Computational Intelligence and Neuroscience  
In this work, we introduce AdaCN, a novel adaptive cubic Newton method for nonconvex stochastic optimization.  ...  It requires at most first-order gradients and updates with linear complexity in both time and memory.  ...  Conclusion: We have proposed AdaCN, a novel, efficient, and effective adaptive cubic Newton method for nonconvex stochastic optimization. This method is designed for large-scale nonconvex stochastic optimization  ... 
doi:10.1155/2021/5790608 pmid:34804146 pmcid:PMC8598341 fatcat:wlftz755hbc2reqibqyk7aluj4

Adaptive Methods for Nonconvex Optimization

Manzil Zaheer, Sashank J. Reddi, Devendra Singh Sachan, Satyen Kale, Sanjiv Kumar
2018 Neural Information Processing Systems  
Adaptive gradient methods that rely on scaling gradients down by the square root of exponential moving averages of past squared gradients, such as RMSPROP, ADAM, and ADADELTA, have found wide application in optimizing  ...  In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.  ...  To this end, we propose a simple additive adaptive method, YOGI, for optimizing the stochastic nonconvex optimization problem of our interest.  ... 
dblp:conf/nips/ZaheerRSKK18 fatcat:lyu6nebfdbcublfoggqwz67neq
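The abstract above names the core mechanism of these methods: dividing the gradient by the square root of an exponential moving average of past squared gradients. The NumPy sketch below shows that rule next to an additive second-moment update in the spirit of YOGI; the step sizes and decay constants are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def adam_style_step(x, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """RMSProp/Adam-style step: scale by the square root of an EMA of squared gradients."""
    m = beta1 * m + (1 - beta1) * grad            # EMA of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad**2         # EMA of squared gradients
    return x - lr * m / (np.sqrt(v) + eps), m, v

def yogi_style_step(x, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-3):
    """Additive variant in the spirit of YOGI: the second moment changes by a bounded
    additive amount whose sign depends on whether v over- or under-estimates grad**2."""
    m = beta1 * m + (1 - beta1) * grad
    v = v - (1 - beta2) * np.sign(v - grad**2) * grad**2
    return x - lr * m / (np.sqrt(v) + eps), m, v
```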

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization [article]

Xuezhe Ma
2021 arXiv   pre-print
In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal  ...  Importantly, the update and storage of the diagonal approximation of the Hessian are as efficient as adaptive first-order optimization methods, with linear complexity in both time and memory.  ...  Appendix: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization, Appendix A.  ... 
arXiv:2009.13586v6 fatcat:amo5fj3uingldbsnvr5ubl6dpq
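As a rough, hedged illustration of a parameter-wise diagonal quasi-Newton update, the sketch below fits a diagonal curvature estimate to the weak secant condition s^T B s = s^T y and uses it to precondition the gradient. Apollo's actual algorithm includes a rectified curvature estimate and other details not reproduced here, so treat this as a generic sketch rather than the paper's method; all constants are assumptions.

```python
import numpy as np

def diagonal_quasi_newton_step(x, grad, prev_grad, prev_step, B, lr=0.5, eps=1e-4):
    """Generic diagonal quasi-Newton sketch (not Apollo's exact update):
    fit the diagonal estimate B to the weak secant condition s^T B s = s^T y,
    then take a preconditioned gradient step."""
    s, y = prev_step, grad - prev_grad
    B = B + (np.dot(s, y) - np.dot(s, B * s)) / (np.sum(s**4) + eps) * s**2
    step = -lr * grad / np.maximum(np.abs(B), eps)   # crude positivity safeguard
    return x + step, B, step
```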

A High Probability Analysis of Adaptive SGD with Momentum [article]

Xiaoyu Li, Francesco Orabona
2020 arXiv   pre-print
We use it to prove for the first time the convergence of the gradients to zero in high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum.  ...  In this paper, we present a high probability analysis for adaptive and momentum algorithms, under weak assumptions on the function, stochastic gradients, and learning rates.  ...  Acknowledgements This material is based upon work supported by the National Science Foundation under grant no. 1925930 "Collaborative Research: TRIPODS Institute for Optimization and Learning".  ... 
arXiv:2007.14294v1 fatcat:moi4merl2ngafi5n2px52ddxpq
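A minimal sketch of the mechanism analyzed here, Delayed AdaGrad with momentum: the per-coordinate step size is built only from squared gradients accumulated before the current one, so it does not depend on the current stochastic gradient. The constants are illustrative assumptions and this is not the authors' exact pseudocode.

```python
import numpy as np

def delayed_adagrad_momentum_step(x, grad, m, sum_sq, lr=0.1, beta=0.9, eps=1e-8):
    """Delayed AdaGrad step with momentum: the step size uses only *past*
    squared gradients; the current one is accumulated after the update."""
    step_size = lr / np.sqrt(sum_sq + eps)        # depends only on past gradients
    m = beta * m + (1 - beta) * step_size * grad  # momentum on the scaled gradient
    x = x - m
    sum_sq = sum_sq + grad**2                     # the "delay": accumulate afterwards
    return x, m, sum_sq
```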

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks [article]

Jinghui Chen and Dongruo Zhou and Yiqi Tang and Ziyan Yang and Yuan Cao and Quanquan Gu
2020 arXiv   pre-print
We also prove the convergence rate of our proposed algorithm to a stationary point in the stochastic nonconvex optimization setting.  ...  These results would suggest practitioners pick up adaptive gradient methods once again for faster training of deep neural networks.  ...  Acknowledgements We thank the anonymous reviewers for their helpful comments.  ... 
arXiv:1806.06763v3 fatcat:i2ly353yhnegfda43ugwtdztsa

Neighbor Combinatorial Attention for Critical Structure Mining

Tanli Zuo, Yukun Qiu, Wei-Shi Zheng
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
However, existing GNN methods do not explicitly extract critical structures, which reflect the intrinsic property of a graph.  ...  By stacking several NCAT modules, we can extract hierarchical structures that are helpful for downstream tasks.  ...  Acknowledgements We thank the anonymous reviewers for their helpful comments.  ... 
doi:10.24963/ijcai.2020/452 dblp:conf/ijcai/ChenZTYCG20 fatcat:yjvg5fobdvfnrmwhzzqmtqwcni

A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization [article]

Jia Bi, Steve R. Gunn
2019 arXiv   pre-print
A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance  ...  Theory shows these optimization methods can converge by using an unbiased gradient estimator.  ...  Source code can be downloaded from: http://caffe.berkeleyvision.org  ...  stochastic methods which can perform better than SGD and GD for nonconvex optimization [Reddi et al., 2016b; J. Reddi et al., 2016].  ... 
arXiv:1905.05185v1 fatcat:qedgjqy5ijdcpbyxprwe227m4q

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates

Krishnakumar Balasubramanian, Saeed Ghadimi
2018 Neural Information Processing Systems  
In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization.  ...  Specifically, we propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information.  ...  Zeroth-order Stochastic Gradient Method for Nonconvex Problems: In this subsection, we consider the zeroth-order stochastic gradient method presented in [9] (provided in Algorithm 4 for convenience) and  ... 
dblp:conf/nips/Balasubramanian18 fatcat:3kf3clihxbdsfiihxsyvrtdvu4
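The zeroth-order methods discussed here build a gradient surrogate from function evaluations alone. Below is a minimal sketch of the standard Gaussian-smoothing estimator, a common building block of such methods rather than this paper's specific conditional-gradient algorithm; the toy objective, smoothing radius, and sample count are assumptions for illustration.

```python
import numpy as np

def zeroth_order_gradient(f, x, mu=1e-4, num_samples=10, rng=None):
    """Gaussian-smoothing gradient estimate from function values only:
    average of (f(x + mu*u) - f(x)) / mu * u over random directions u ~ N(0, I)."""
    rng = rng or np.random.default_rng(0)
    fx, g = f(x), np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_samples

# Illustrative use: plain descent driven by the zeroth-order estimate on a toy objective.
f = lambda x: np.sum((x - 1.0) ** 2)
x = np.zeros(5)
for _ in range(200):
    x = x - 0.05 * zeroth_order_gradient(f, x)
```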

An adaptive stochastic gradient-free approach for high-dimensional blackbox optimization [article]

Anton Dereventsov, Clayton G. Webster, Joseph D. Daws Jr
2022 arXiv   pre-print
In this work, we propose a novel adaptive stochastic gradient-free (ASGF) approach for solving high-dimensional nonconvex optimization problems based on function evaluations.  ...  As such, the ASGF strategy offers significant improvements when solving high-dimensional nonconvex optimization problems when compared to other gradient-free methods (including the so-called "evolutionary  ...  Conclusions: In this work we introduce an adaptive stochastic gradient-free method designed for solving high-dimensional nonconvex blackbox optimization problems.  ... 
arXiv:2006.10887v2 fatcat:rrzaixceq5fr7nuwpgu62tf6cq

Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization [article]

Qunwei Li, Yi Zhou, Yingbin Liang, Pramod K. Varshney
2017 arXiv   pre-print
In this work, we investigate the accelerated proximal gradient method for nonconvex programming (APGnc).  ...  Due to the intractability of nonconvexity, there is a rising need to develop efficient methods for solving general nonconvex problems with certain performance guarantee.  ...  Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization  ... 
arXiv:1705.04925v1 fatcat:zkcczurf7fb23fw6c5echbf4ce
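To make the accelerated proximal gradient idea concrete, here is a minimal sketch: extrapolate with momentum, take a gradient step on the smooth part, then apply the proximal operator of the nonsmooth part (an L1 regularizer is used purely as an example). APGnc adds monitoring and safeguard steps for the nonconvex setting that are not reproduced here; the constants are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (an illustrative nonsmooth term)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def accelerated_proximal_gradient(grad_f, x0, lr=0.1, beta=0.9, lam=0.01, iters=100):
    """Proximal gradient with momentum: extrapolate, step on the smooth part,
    then prox on the nonsmooth part. (APGnc's safeguards are omitted.)"""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)                       # momentum extrapolation
        x_prev, x = x, soft_threshold(y - lr * grad_f(y), lr * lam)
    return x
```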

Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization

Feihu Huang, Bin Gu, Zhouyuan Huo, Songcan Chen, Heng Huang
2019 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
However, its convergence rate is O(1/√T) for nonconvex problems, which is significantly slower than the best convergence rate O(1/T) of the zeroth-order stochastic algorithm, where T is the iteration  ...  The gradient-free (zeroth-order) method can address these problems because only the objective function values are required in the optimization.  ...  After that, Li and Lin (2015) presented a class of accelerated PG methods for nonconvex optimization. More recently, introduced inexact PG methods for nonconvex nonsmooth optimization.  ... 
doi:10.1609/aaai.v33i01.33011503 fatcat:jaudjs4vobbo3fitemtbzkqvdu

Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation [article]

Paolo Di Lorenzo, Simone Scardapane
2020 arXiv   pre-print
We study distributed stochastic nonconvex optimization in multi-agent networks.  ...  Almost sure convergence to (stationary) solutions of the nonconvex problem is established. Finally, the method is applied to distributed stochastic training of neural networks.  ...  All previous art on distributed stochastic nonconvex optimization is based on first-order methods that exploit only gradient information of the objective functions f_i, and does not consider constraints  ... 
arXiv:2004.14882v1 fatcat:oat7muwqzvfovjtx5atmmvna7m

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization

Sashank J. Reddi, Suvrit Sra, Barnabás Póczos, Alexander J. Smola
2016 Neural Information Processing Systems  
For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point.  ...  We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonsmooth part is convex.  ...  Acknowledgment: SS acknowledges support of NSF grant IIS-1409802. The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets.  ... 
dblp:conf/nips/ReddiSPS16 fatcat:pctzoj4hifhuhkqpdlg56ia2fi
Showing results 1–15 out of 5,778 results