Acceleration and Averaging in Stochastic Mirror Descent Dynamics [article]

Walid Krichene, Peter L. Bartlett
2017 arXiv   pre-print
We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions.  ...  Building on an averaging formulation of accelerated mirror descent, we propose a stochastic variant in which the gradient is contaminated by noise, and study the resulting stochastic differential equation  ...  Many such algorithms can be viewed as a discretization of a continuous-time dynamics.  ... 
arXiv:1707.06219v1 fatcat:63we5k3tvrcc7cvyphxdgo56wi

Stochastic Gradient Descent in Continuous Time [article]

Justin Sirignano, Konstantinos Spiliopoulos
2017 arXiv   pre-print
For certain continuous-time problems, SGDCT has some promising advantages compared to a traditional stochastic gradient descent algorithm.  ...  The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data.  ...  This approach does not use continuous-time stochastic gradient descent to learn the model dynamics, but instead directly learns the optimal policy from the data.  ... 
arXiv:1611.05545v4 fatcat:ufebuqehojhctb4an7bf4q5lhu

Stochastic Mirror Descent for Convex Optimization with Consensus Constraints [article]

Anastasia Borovykh, Nikolas Kantas, Panos Parpas, Grigorios A. Pavliotis
2022 arXiv   pre-print
In this paper we propose and study exact distributed mirror descent algorithms in continuous-time under additive noise and present the settings that enable linear convergence rates.  ...  The mirror descent algorithm is known to be effective in applications where it is beneficial to adapt the mirror map to the underlying geometry of the optimization model.  ...  Main results and contributions Our results are based on a continuous-time analysis of stochastic mirror descent dynamics.  ... 
arXiv:2201.08642v2 fatcat:b5q55futazalneqwxzo2ofzz4y

Stochastic Gradient Descent in Continuous Time

Justin Sirignano, Konstantinos Spiliopoulos
2017 SIAM Journal on Financial Mathematics  
We consider stochastic gradient descent for continuous-time models.  ...  The stochastic gradient descent algorithm performs an online parameter update in continuous time, with the parameter updates θt satisfying a stochastic differential equation.  ...  The continuous-time stochastic gradient descent algorithm allows for the control and reduction of numerical error due to discretization.  ... 
doi:10.1137/17m1126825 fatcat:ivc4dp7zhrhtndl4loi7ukuis4

On stochastic mirror descent with interacting particles: convergence properties and variance reduction [article]

Anastasia Borovykh, Nikolas Kantas, Panos Parpas, Grigorios A. Pavliotis
2020 arXiv   pre-print
To address this question, we reduce the problem of the computation of an exact minimizer with noisy gradient information to the study of stochastic mirror descent with interacting particles.  ...  We study the convergence of stochastic mirror descent and make explicit the tradeoffs between communication and variance reduction.  ...  Note that when σ = 0 we obtain a deterministic variant of continuous-time mirror descent. The long-time behavior of such dynamics is very well understood.  ... 
arXiv:2007.07704v2 fatcat:osprl6goprfbhcuttpqgvhtnfq

Stochastic mirror descent dynamics and their convergence in monotone variational inequalities [article]

Panayotis Mertikopoulos, Mathias Staudigl
2018 arXiv   pre-print
We examine a class of stochastic mirror descent dynamics in the context of monotone variational inequalities (including Nash equilibrium and saddle-point problems).  ...  The dynamics under study are formulated as a stochastic differential equation driven by a (single-valued) monotone operator and perturbed by a Brownian motion.  ...  Stochastic Mirror Descent Dynamics. Mirror descent is an iterative optimization algorithm combining first-order oracle steps with a "mirror step" generated by a projection-type mapping.  ... 
arXiv:1710.01551v3 fatcat:d5pvog3nkverrfrqw2ow466seq

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip [article]

Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor
2021 arXiv   pre-print
This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions  ...  for the parameters; and a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration.  ...  We also acknowledge support from the European Research Council (grant SEQUOIA 724063), from the DGA, and from the MSR-INRIA joint centre.  ... 
arXiv:2106.07644v2 fatcat:6dmvas3rp5byff3cdzsnmlmenu

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent [article]

Chi Jin, Praneeth Netrapalli, Michael I. Jordan
2017 arXiv   pre-print
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves faster convergence rate than gradient descent (GD) in the convex setting.  ...  To the best of our knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in  ...  It is monotonically decreasing in the continuous-time setting. This is not the case in general in the discrete-time setting, a fact which requires us to incorporate the NCE step.  ... 
arXiv:1711.10456v1 fatcat:pkiddxkz6nfwzenzxqqptcrluu

Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions [article]

Ashia Wilson, Lester Mackey, Andre Wibisono
2020 arXiv   pre-print
We present a family of algorithms, called descent algorithms, for optimizing convex and non-convex functions.  ...  Rescaled gradient descent can be accelerated under the same strong smoothness assumption using both frameworks.  ...  Acknowledgments We would like to thank Jingzhao Zhang for providing us access to his code.  ... 
arXiv:1902.08825v3 fatcat:v4opaqjda5fddplwkqj647uhku

Stochastic Mirror Descent Dynamics and Their Convergence in Monotone Variational Inequalities

Panayotis Mertikopoulos, Mathias Staudigl
2018 Journal of Optimization Theory and Applications  
We examine a class of stochastic mirror descent dynamics in the context of monotone variational inequalities (including Nash equilibrium and saddle-point problems).  ...  Keywords: Mirror descent · Variational inequalities · Saddle-point problems · Stochastic differential equations. Mathematics Subject Classification: 90C25 · 90C33 · 90C47  ... 
doi:10.1007/s10957-018-1346-x pmid:30416208 pmcid:PMC6208661 fatcat:w2xs7nwquzbktpcomsbp75kkhe

Sparse Optimization on Measures with Over-parameterized Gradient Descent [article]

Lenaic Chizat
2020 arXiv   pre-print
We show that this problem can be solved by discretizing the measure and running non-convex gradient descent on the positions and weights of the particles.  ...  The key theoretical tools are a local convergence analysis in Wasserstein space and an analysis of a perturbed mirror descent in the space of measures.  ...  Acknowledgments The author thanks Francis Bach for fruitful discussions related to this work and the anonymous referees for their thorough reading and suggestions.  ... 
arXiv:1907.10300v2 fatcat:gju56fjiazhabbsp5as7pxbgza

Beyond Convexity – Contraction and Global Convergence of Gradient Descent [article]

Patrick M. Wensing, Jean-Jacques E. Slotine
2020 arXiv   pre-print
This paper considers the analysis of continuous time gradient-based optimization algorithms through the lens of nonlinear contraction theory.  ...  In particular, gradient descent converges to a unique equilibrium if its dynamics are contracting in any metric, with convexity of the cost corresponding to the special case of contraction in the identity  ...  This research was supported in part by grant 1809314 from the National Science Foundation. Supporting Information Proof of Theorem 1 Proof.  ... 
arXiv:1806.06655v6 fatcat:mrpxa2d2jvcwhppxb7entbtxy4

Accelerated iterative regularization via dual diagonal descent [article]

Luca Calatroni, Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa
2019 arXiv   pre-print
We propose and analyze an accelerated iterative dual diagonal descent algorithm for the solution of linear inverse problems with general regularization and data-fit functions.  ...  Using tools from inexact proximal calculus, we prove early stopping results with optimal convergence rates for additive data-fit terms as well as more general cases, such as the Kullback-Leibler divergence  ...  From the continuous dynamic to the discrete algorithm We follow a standard approach of computing the time-discretization of the continuous dynamical system [1, 7, 54, 10].  ... 
arXiv:1912.12153v1 fatcat:hdhdx74hefdqnhcz74uvb4zyxq

Multitask Online Mirror Descent [article]

Nicolò Cesa-Bianchi, Pierre Laforgue, Andrea Paudice, Massimiliano Pontil
2021 arXiv   pre-print
We introduce and analyze MT-OMD, a multitask generalization of Online Mirror Descent (OMD) which operates by sharing updates between tasks.  ...  We prove that the regret of MT-OMD is of order √(1 + σ^2(N-1))√(T), where σ^2 is the task variance according to the geometry induced by the regularizer, N is the number of tasks, and T is the time horizon  ...  Each computer is rated on a discrete scale from 0 to 10, expressing the likelihood of an individual buying that computer.  ... 
arXiv:2106.02393v2 fatcat:4ylor77kvff4doqy7gmrot4u7u

On Markov Chain Gradient Descent [article]

Tao Sun, Yuejiao Sun, Wotao Yin
2018 arXiv   pre-print
This paper studies Markov chain gradient descent, a variant of stochastic gradient descent where the random samples are taken on the trajectory of a Markov chain.  ...  Stochastic gradient methods are the workhorse (algorithms) of large-scale optimization problems in machine learning, signal processing, and other computational sciences and engineering.  ...  In [1], MCGD is generalized from gradient descent to mirror descent. In all these works, the Markov chain is required to be reversible, and all functions f_i, i ∈ [M], are assumed to be convex.  ... 
arXiv:1809.04216v1 fatcat:4gewckcfo5cwrjhpsuleo6znc4