Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches [article]

Filip Hanzely, Peter Richtárik
2018 arXiv   pre-print
We prove a rate that is at most O(√(τ)) times worse than the rate of minibatch ACD with uniform sampling, but can be O(n/τ) times better, where τ is the minibatch size.  ...  Lastly, we obtain similar results for minibatch nonaccelerated CD as well, achieving improvements on previous best rates.  ...  between ACD with uniform probabilities (for τ = 1) and accelerated gradient descent (for τ = n).  ... 
arXiv:1809.09354v2 fatcat:aernktpuenbkzjcs2rzq76ckhu

SAGA with Arbitrary Sampling [article]

Xun Qian and Zheng Qu and Peter Richtárik
2019 arXiv   pre-print
Despite years of research on the topic, a general-purpose version of SAGA (one that would include arbitrary importance sampling and minibatching schemes) does not exist.  ...  Our rates match those of the primal-dual method Quartz for which an arbitrary sampling analysis is available, which makes a significant step towards closing the gap in our understanding of complexity of  ...  Accelerated coordinate descent with arbitrary sampling and best rates for minibatches. A.  ... 
arXiv:1901.08669v1 fatcat:rv5uimwyzrftxgvctxk2li5awm

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization [article]

Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik
2020 arXiv   pre-print
For proximal SGD, the quantization and coordinate type methods, we uncover new state-of-the-art convergence rates. Our analysis also includes any form of sampling and minibatching.  ...  For the variance reduced methods, we recover the best known convergence rates as special cases.  ...  Acknowledgements Peter Richtárik thanks for the support from KAUST through the Baseline Research Fund scheme.  ... 
arXiv:2006.11573v1 fatcat:3cvbwzn7a5cxfh5dcqtclgtbb4

Nonconvex Variance Reduced Optimization with Arbitrary Sampling [article]

Samuel Horváth, Peter Richtárik
2019 arXiv   pre-print
All the above results follow from a general analysis of the methods which works with arbitrary sampling, i.e., fully general randomized strategy for the selection of subsets of examples to be sampled in  ...  Moreover, we also improve upon current mini-batch analysis of these methods by proposing importance sampling for minibatches in this setting.  ...  Accelerated coordinate descent with arbitrary sampling and best rates for minibatches. In The 22nd International Conference on Artificial Intelligence and Statistics, 2019.  ... 
arXiv:1809.04146v2 fatcat:vajcltes5nenzbhd23fjsswuem

Data Sampling Strategies in Stochastic Algorithms for Empirical Risk Minimization [article]

Dominik Csiba
2018 arXiv   pre-print
In the first four chapters we focus on four views on the sampling for convex problems, developing and analyzing new state-of-the-art methods using non-standard data sampling strategies.  ...  Gradient descent methods and especially their stochastic variants have become highly popular in the last decade due to their efficiency on big data optimization problems.  ...  To the best of our knowledge, all the rates in the tables are novel, except S g P L for gradient descent and uniform and greedy coordinate descents for both smooth and non-smooth case.  ... 
arXiv:1804.00437v1 fatcat:gme5tiqcmnbvzey7rj5s7yppja

Stochastic Subspace Cubic Newton Method [article]

Filip Hanzely, Nikita Doikov, Peter Richtárik, Yurii Nesterov
2020 arXiv   pre-print
We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of stochastic coordinate descent (CD) and the rate of cubic regularized Newton, thus giving  ...  Our method can be seen both as a stochastic extension of the cubically-regularized Newton method of Nesterov and Polyak (2006), and a second-order enhancement of stochastic subspace descent of Kozak et  ...  Acknowledgements The work of the second and the fourth author was supported by ERC Advanced Grant 788368.  ... 
arXiv:2002.09526v1 fatcat:yvk6lbawmrexfksvg4p5sy2fuq

SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization [article]

Zheng Qu and Peter Richtárik and Martin Takáč and Olivier Fercoq
2015 arXiv   pre-print
theory and practice - sometimes by orders of magnitude.  ...  We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA).  ...  Shalev-Shwartz & Zhang (2013b) studied minibatching but in conjunction with acceleration and the QUARTZ method of Qu et al. (2014), which has been analyzed for an arbitrary sampling Ŝ, uses a different  ... 
arXiv:1502.02268v1 fatcat:45tdaubndrbgdl7nnz6vdq4exu

SEGA: Variance Reduction via Gradient Sketching [article]

Filip Hanzely and Konstantin Mishchenko and Peter Richtárik
2018 arXiv   pre-print
In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is up to a small constant factor identical  ...  to the best-known rate of coordinate descent.  ...  Accelerated coordinate descent with arbitrary sampling and best rates for minibatches. arXiv preprint arXiv:1809.09354, 2018. [21] Robert Hooke and Terry A Jeeves.  ... 
arXiv:1809.03054v2 fatcat:ah4lrmcvtrchbmfbfsnq74xefy

Improving SAGA via a Probabilistic Interpolation with Gradient Descent [article]

Adel Bibi, Alibek Sailanbayev, Bernard Ghanem, Robert Mansel Gower and Peter Richtárik
2020 arXiv   pre-print
For example, for a well-conditioned problem the choice q=1/(n-1)^2, where n is the number of data samples, gives a method with an overall complexity which is better than both the complexity of GD and SAGA  ...  Our method, SAGD, is based on a probabilistic interpolation of SAGA and gradient descent (GD).  ...  . , n}, |C| = τ }, (24) and consider the sampling given in (26). That is, we either sample a coordinate j with probability (1 − q)/n or we sample a minibatch C ∈ G with probability q/(n choose τ).  ... 
arXiv:1806.05633v2 fatcat:5lulpv6fhvgbrecqoeyrwzl5pi

SGD with Arbitrary Sampling: General Analysis and Improved Rates

Xun Qian, Peter Richtárik, Robert M. Gower, Alibek Sailanbayev, Nicolas Loizou, Egor Shulgin
2019 International Conference on Machine Learning  
By specializing our theorem to different mini-batching strategies, such as sampling with replacement and independent sampling, we derive exact expressions for the stepsize as a function of the mini-batch  ...  We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm.  ...  optimization, operations research and their interactions with data sciences.  ... 
dblp:conf/icml/QianRGSLS19 fatcat:pnditriyybbsbiiqzgltwpy2me

99 Fix it [article]

Konstantin Mishchenko and Filip Hanzely and Peter Richtárik
2019 arXiv   pre-print
It is also well known that many such methods, including SGD, SAGA, and accelerated SGD for over-parameterized models, do not scale well with the number of parallel workers.  ...  Namely, we develop a new variant of parallel block coordinate descent based on independent sparsification of the local gradient estimates before communication.  ...  Hanzely, F. and Richtárik, P. Accelerated coordinate descent with arbitrary sampling and best rates for minibatches. arXiv preprint arXiv:1809.09354, 2018. A. B.  ... 
arXiv:1901.09437v2 fatcat:up2xhbyfojhgndvomvuhq3qxd4

SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation [article]

Robert M. Gower, Othmane Sebbouh, Nicolas Loizou
2021 arXiv   pre-print
In addition, all of our analysis holds for the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching and determine an optimal minibatch size.  ...  Stochastic Gradient Descent (SGD) is being used routinely for optimizing non-convex functions.  ...  Thus we recover the best known rate on either end (b = n and b = 1), and give the first rates for everything in between 1 < b < n.  ... 
arXiv:2006.10311v3 fatcat:g7dqyu7775hwtbrnopgcqjg6te

Fastest Rates for Stochastic Mirror Descent Methods [article]

Filip Hanzely, Peter Richtárik
2018 arXiv   pre-print
One of them, relRCD corresponds to the first stochastic variant of mirror descent algorithm with linear convergence rate.  ...  We propose and analyze two new algorithms: Relative Randomized Coordinate Descent (relRCD) and Relative Stochastic Gradient Descent (relSGD), both generalizing famous algorithms in the standard smooth  ...  Relative Randomized Coordinate Descent with Short Stepsizes In this section, we propose and analyze a naive coordinate descent algorithm for minimizing relative smooth functions.  ... 
arXiv:1803.07374v1 fatcat:i3cqewywhvbhfjmn5paaajn66q

An argument in favor of strong scaling for deep neural networks with small datasets [article]

Renato L. de F. Cunha, Eduardo R. Rodrigues, Matheus Palhares Viana, Dario Augusto Borges Oliveira
2018 arXiv   pre-print
The minibatches increase in size as new GPUs are added to the system. In addition, new learning rates schedules have been proposed to fix optimization issues that occur with large minibatch sizes.  ...  This is crucially important, because they typically explore many hyperparameters in order to find the best ones for their applications.  ...  We start with a general description of the Stochastic Gradient Descent (SGD) method, and the two possible ways to deal with the minibatch sizes as one increases the number of processing units.  ... 
arXiv:1807.09161v1 fatcat:ho2ssptwa5ennf5ncym44fnofu

Adaptive Catalyst for Smooth Convex Optimization [article]

Anastasiya Ivanova, Dmitry Pasechnyuk, Dmitry Grishchenko, Egor Shulgin, Alexander Gasnikov, Vladislav Matyukhin
2021 arXiv   pre-print
In this paper, we present a generic framework that allows accelerating almost arbitrary non-accelerated deterministic and randomized algorithms for smooth convex optimization problems.  ...  As a result, the main contribution of our work is a new framework that applies to adaptive inner algorithms: Steepest Descent, Adaptive Coordinate Descent, Alternating Minimization.  ...  Gasnikov We also would like to thank anonymous reviewers for their fruitful comments.  ... 
arXiv:1911.11271v6 fatcat:awbnigrwwfecdkz3atllbf736i