Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates [article]

Alp Yurtsever, Alex Gu, Suvrit Sra
2022 arXiv   pre-print
The first two permit using subgradients and stochastic gradients, and are shown to ensure a 𝒪(1/√(t)) convergence rate. The third extension AdapTOS endows TOS with adaptive step-sizes.  ...  This requirement often fails in machine learning applications: (i) instead of full gradients only stochastic gradients may be available; and (ii) instead of proximal operators, using subgradients to handle  ...  (WASP) funded by the Knut and Alice Wallenberg Foundation, and partial postdoctoral support from the NSF-CAREER grant IIS-1846088.  ... 
arXiv:2110.03274v2 fatcat:xn4i5owx2zfnngwyyq6l4kxj7e

Max-margin Deep Generative Models [article]

Chongxuan Li and Jun Zhu and Tianlin Shi and Bo Zhang
2015 arXiv   pre-print
We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objective.  ...  Empirical results on MNIST and SVHN datasets demonstrate that (1) max-margin learning can significantly improve the prediction performance of DGMs and meanwhile retain the generative ability; and (2) mmDGMs  ...  Although it is an adaptive gradient-based optimization method, we decay the global learning rate by factor three periodically after sufficient number of epochs to ensure a stable convergence.  ... 
arXiv:1504.06787v4 fatcat:3snwaskhsvdkljdah4b2yhtkmy

Incorporating the Barzilai-Borwein Adaptive Step Size into Sugradient Methods for Deep Network Training

Antonio Robles-Kelly, Asef Nazari
2019 2019 Digital Image Computing: Techniques and Applications (DICTA)  
Moreover, the adaptive learning rate method presented here is quite general in nature and can be applied to widely used gradient descent approaches such as Adagrad and RMSprop.  ...  In our experiments, our adaptive learning rate shows a smoother and faster convergence than that exhibited by the alternatives, with better or comparable performance.  ...  The table shows the percentage classification error rates for the three datasets and the four learning rate methods under study, with the best performing method indicated in bold fonts. IV.  ... 
doi:10.1109/dicta47822.2019.8945980 dblp:conf/dicta/Robles-KellyN19 fatcat:aase42bjybgqhi7ofiu6elnhdi

Ergodic capacity and average rate-guaranteed scheduling for wireless multiuser OFDM systems

Xin Wang, Georgios B. Giannakis
2008 2008 IEEE International Symposium on Information Theory  
To accommodate channel uncertainties, stochastic subgradient iterations provide dual variables on-line with guaranteed convergence to their off-line counterparts.  ...  The result extends to accommodate fairness through general utility functions and constraints on the minimum average user rates.  ...  Ergodic capacity region (benchmark) and achieved rates with stochastic subgradient scheme for BE and nRT traffic.  ... 
doi:10.1109/isit.2008.4595276 dblp:conf/isit/0003G08 fatcat:e3pgkypl6nhg5czbhwbuaspzq4

GPU-based deep convolutional neural network for tomographic phase microscopy with ℓ1 fitting and regularization

Hui Qiao, Jiamin Wu
2018 Journal of Biomedical Optics  
Tomographic phase microscopy (TPM) is a unique imaging modality to measure the three-dimensional refractive index distribution of transparent and semitransparent samples.  ...  We compare our method with several state-of-the-art algorithms and obtain at least 14 dB improvement in signal-to-noise ratio.  ...  Kamilov, Alexandre Goy, and Demetri Psaltis for providing the code and their helpful suggestions.  ... 
doi:10.1117/1.jbo.23.6.066003 pmid:29905037 fatcat:ovrexrnmofbytmysmacudegzlu

Stabilized Sparse Online Learning for Sparse Data [article]

Yuting Ma, Tian Zheng
2017 arXiv   pre-print
Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems.  ...  To facilitate better convergence, we adopt an annealing strategy on the truncation rate, which leads to a balanced trade-off between exploration and exploitation in learning a sparse weight vector.  ...  It is possible to obtain a lower regret bound in expectation with adaptive learning rate η_t decaying with t, such as η_t = 1/√t, which is commonly used in the literature of online learning and stochastic  ... 
arXiv:1604.06498v3 fatcat:ppuswbpn2jb25nkbvb6wvnsxam

Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces [article]

Sridhar Mahadevan, Bo Liu, Philip Thomas, Will Dabney, Steve Giguere, Nicholas Jacek, Ian Gemp, Ji Liu
2014 arXiv   pre-print
Equally importantly, proximal operator theory enables the systematic development of operator splitting methods that show how to safely and reliably decompose complex products of gradients that occur in  ...  This key technical innovation makes it possible to finally design "true" stochastic gradient methods for reinforcement learning.  ...  Acknowledgements We like to acknowledge the useful feedback of past and present members of the Autonomous Learning Laboratory at the University of Massachusetts, Amherst.  ... 
arXiv:1405.6757v1 fatcat:u77kqc6iyncy7fixlnrfcnqrmy

Stochastic Coordinate Descent Methods for Regularized Smooth and Nonsmooth Losses [chapter]

Qing Tao, Kang Kong, Dejun Chu, Gaowei Wu
2012 Lecture Notes in Computer Science  
In particular, we first present a principled and practical SCD algorithm for regularized smooth losses, in which the one-variable subproblem is solved using the proximal gradient method and the adaptive  ...  However, until now, there exists a gap between the convergence rate analysis and practical SCD algorithms for general smooth losses and there is no primal SCD algorithm for nonsmooth losses.  ...  Acknowledgments The work was supported in part by the NSFC (Grant No. 60835002, 60975040 and 61175050) and the first author is also supported by the Open Project Program of the NLPR.  ... 
doi:10.1007/978-3-642-33460-3_40 fatcat:oj35t52xtvh6pj4do77d7h7ltq

Max-Margin Deep Generative Models for (Semi-)Supervised Learning [article]

Chongxuan Li and Jun Zhu and Bo Zhang
2016 arXiv   pre-print
We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear objectives in different settings.  ...  learning, mmDGMs are competitive to the best fully discriminative networks when employing convolutional neural networks as the generative and recognition models; and (3) in semi-supervised learning, mmDCGMs  ...  Although it is an adaptive gradient-based optimization method, we decay the global learning rate by a factor after sufficient number of epochs to ensure a stable convergence.  ... 
arXiv:1611.07119v1 fatcat:ktixkfzswjgbnkhpal5sfsif2u

Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems [article]

Damek Davis, Benjamin Grimmer
2018 arXiv   pre-print
The primary contribution of this paper is a simple proof that the proposed algorithm converges at the same rate as the stochastic gradient method for smooth nonconvex problems.  ...  At a high-level, the method is an inexact proximal point iteration in which the strongly convex proximal subproblems are quickly solved with a specialized stochastic projected subgradient method.  ...  We thank Dmitriy Drusvyatskiy, George Lan, and the two anonymous reviewers for helpful comments.  ... 
arXiv:1707.03505v5 fatcat:2ehuflarxjguphqmora67vzine

Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [article]

Andrei Patrascu, Ciprian Paduraru, Paul Irofti
2020 arXiv   pre-print
In this chapter we develop and analyze minibatch variants of stochastic proximal gradient algorithm for general composite objective functions with stochastic nonsmooth components.  ...  Stochastic optimization lies at the core of most statistical learning models.  ...  Only recently the full stochastic composite models with stochastic regularizers have been properly tackled [28], where almost sure asymptotic convergence is established for a stochastic splitting scheme  ... 
arXiv:2003.13332v1 fatcat:6gotcln22za4fly3jfuptg4liy

A Bregman Learning Framework for Sparse Neural Networks [article]

Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger
2022 arXiv   pre-print
We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach.  ...  Our Bregman learning framework starts the training with very few initial parameters, successively adding only significant ones to obtain a sparse and expressive network.  ...  of Science and Technology (BMBF) under grant agreement No. 05M2020 (DELETO).  ... 
arXiv:2105.04319v3 fatcat:tyiiilombrdybi7rkfjicgdscm

Error Feedback Fixes SignSGD and other Gradient Compression Schemes [article]

Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, and Martin Jaggi
2019 arXiv   pre-print
We prove that our algorithm EF-SGD with arbitrary compression operator achieves the same rate of convergence as SGD without any additional assumptions.  ...  Thus EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory and show that error-feedback improves both convergence and generalization. Code can be found at .  ...  ) where γ ∈ ℝ is the step-size (or learning-rate) and g_t is the stochastic gradient such that E[g_t] = ∇f(x_t).  ... 
arXiv:1901.09847v2 fatcat:2crhupyoizbnzkw4z5xewac4lm

Stochastic Composite Mirror Descent: Optimal Bounds with High Probabilities

Yunwen Lei, Ke Tang
2018 Neural Information Processing Systems  
We consider both convex and strongly convex objectives with non-smooth loss functions, for each of which we establish high-probability convergence rates optimal up to a logarithmic factor.  ...  We apply the derived computational error bounds to study the generalization performance of multi-pass stochastic gradient descent (SGD) in a non-parametric setting.  ...  Introduction Stochastic gradient descent (SGD) has found wide applications in machine learning problems due to its simplicity in implementation, low memory requirement and low computational complexity  ... 
dblp:conf/nips/LeiT18 fatcat:kg6eymbb3vayflicxahb62brua

Learning Structured Models for Segmentation of 2-D and 3-D Imagery

Aurelien Lucchi, Pablo Marquez-Neila, Carlos Becker, Yunpeng Li, Kevin Smith, Graham Knott, Pascal Fua
2015 IEEE Transactions on Medical Imaging  
This requires approximation that can lead to reduced quality of learning. In this article, we propose three novel techniques to overcome these limitations.  ...  Moreover, we employ a working set of constraints to increase the reliability of approximate subgradient methods and introduce a new way to select a suitable step size at each iteration.  ...  WORKING SETS We begin with a review of the stochastic subgradient method in the context of structured learning.  ... 
doi:10.1109/tmi.2014.2376274 pmid:25438309 fatcat:cuom2xgkvbexzowiaclyhl4b7m