1,540 Hits in 4.2 sec

Accelerating Distributed SGD for Linear Regression using Iterative Pre-Conditioning [article]

Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra
2020 arXiv   pre-print
Here, we extend the idea of iterative pre-conditioning to the stochastic settings, where the server updates the estimate and the iterative pre-conditioning matrix based on a single randomly selected data  ...  The recently proposed Iteratively Pre-conditioned Gradient-descent (IPG) method has been shown to converge faster than other existing distributed algorithms that solve this problem.  ...  In this paper, we propose a stochastic iterative pre-conditioning technique for improving the rate of convergence of the distributed stochastic gradient descent method when solving the linear least-squares  ... 
arXiv:2011.07595v2 fatcat:df4ib2dipjddvduqclnmsrj52i
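
The paper's exact IPG update is not reproduced in the snippet, so the following is only a generic illustration of server-side pre-conditioned stochastic gradient descent for linear least squares: a single data row is sampled per iteration, a running curvature estimate is refreshed, and its regularized inverse is applied to the stochastic gradient. The running-average pre-conditioner, step size, and regularization are assumptions for exposition, not the authors' update rule.

    import numpy as np

    def preconditioned_sgd_least_squares(A, b, steps=1000, alpha=0.01, reg=1e-3, seed=0):
        """Schematic pre-conditioned stochastic gradient loop for least squares.

        Generic illustration only, not the IPG update from the paper: the
        pre-conditioner below is the inverse of a running Gram-matrix estimate.
        """
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = np.zeros(d)
        H = reg * np.eye(d)                      # running curvature estimate
        for t in range(1, steps + 1):
            i = rng.integers(n)                  # single randomly selected data point
            g = (A[i] @ x - b[i]) * A[i]         # stochastic gradient of 0.5*(a_i.x - b_i)^2
            H += (np.outer(A[i], A[i]) - H) / t  # update the Gram estimate
            K = np.linalg.inv(H + reg * np.eye(d))   # iterative pre-conditioning matrix
            x -= alpha * (K @ g)                 # pre-conditioned stochastic step
        return x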

Online Regularized Nonlinear Acceleration [article]

Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach
2019 arXiv   pre-print
Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method.  ...  extrapolated solution estimates can be reinjected at each iteration, significantly improving numerical performance over classical accelerated methods.  ...  Edouard Oyallon was partially supported by a postdoctoral grant from DPEI of Inria (AAR 2017POD057) for the collaboration with CWI.  ...
arXiv:1805.09639v2 fatcat:xzysmjgsrjafvhovqwuymdk7ga
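
As a concrete picture of the RNA post-processing step described above, here is a minimal numpy sketch: collect a window of iterates from the base method, solve a small regularized least-squares problem over their successive residuals, and return the extrapolated combination. The window handling and the regularization value are illustrative assumptions.

    import numpy as np

    def rna_extrapolate(iterates, lam=1e-8):
        """Regularized nonlinear acceleration on iterates x_0, ..., x_k.

        Solves min_c ||R c||^2 + lam ||c||^2 subject to sum(c) = 1, where the
        columns of R are the residuals x_{i+1} - x_i, then returns the
        extrapolated point sum_i c_i x_i.
        """
        X = np.column_stack(iterates)          # d x (k+1)
        R = X[:, 1:] - X[:, :-1]               # residual matrix, d x k
        k = R.shape[1]
        RR = R.T @ R
        RR /= np.linalg.norm(RR)               # normalize for numerical stability
        c = np.linalg.solve(RR + lam * np.eye(k), np.ones(k))
        c /= c.sum()                           # enforce the affine constraint
        return X[:, :k] @ c                    # extrapolated estimate

In the online variant described in the abstract, the returned point would be fed back into the base algorithm as its next iterate.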

Statistical inference for the population landscape via moment-adjusted stochastic gradients

Tengyuan Liang, Weijie J. Su
2019 Journal of The Royal Statistical Society Series B-statistical Methodology  
We establish non-asymptotic theory that characterizes the statistical distribution for certain iterative methods with optimization guarantees.  ...  Remarkably, the moment-adjusting idea motivated from "error standardization" in statistics achieves a similar effect as acceleration in first-order optimization methods used to fit generalized linear models  ...  Fig. 2 illustrates the acceleration for inference in logistic regression. The figure should be read the same way as in the linear case.  ...
doi:10.1111/rssb.12313 fatcat:6xuoplsqerd33nv54zgimnmm54

Stochastic gradient descent methods for estimation with large data sets [article]

Dustin Tran, Panos Toulis, Edoardo M. Airoldi
2015 arXiv   pre-print
Our applications include the wide class of generalized linear models as well as M-estimation for robust regression.  ...  We demonstrate that sgd dominates alternative software in runtime for several estimation problems with massive data sets.  ...  Generalized linear models: In the family of generalized linear models (GLMs), the outcome y_n ∈ ℝ follows an exponential family distribution conditional on x_n, namely y_n | x_n ∼ exp{ (1/ψ) (η_n y_n − b(η_n)) }  ...
arXiv:1509.06459v1 fatcat:6mbwz7qi3beovn5bppoc3j4jaa
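
To make the exponential-family form quoted above concrete: for logistic regression the natural parameter is η_n = x_nᵀw with b(η) = log(1 + e^η), so the per-sample gradient of the negative log-likelihood is (σ(η_n) − y_n) x_n. The sketch below runs plain SGD with that gradient; it illustrates the model family only and is not the estimator implemented by the sgd software.

    import numpy as np

    def sgd_logistic_regression(X, y, epochs=5, lr=0.1, seed=0):
        """Plain SGD for logistic regression viewed as a GLM.

        With eta_n = X[i] @ w and b(eta) = log(1 + exp(eta)), the per-sample
        negative log-likelihood gradient is (sigmoid(eta_n) - y[i]) * X[i].
        Illustrative sketch; step size and epoch count are assumptions.
        """
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):
                eta = X[i] @ w
                w -= lr * (1.0 / (1.0 + np.exp(-eta)) - y[i]) * X[i]
        return w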

Distributed stochastic optimization and learning

Ohad Shamir, Nathan Srebro
2014 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton)  
We show how the best known guarantees are obtained by an accelerated mini-batched SGD approach, and contrast the runtime and sample costs of the approach with those of other distributed optimization algorithms  ...  objective w.r.t. the source distribution, minimizing: (1) overall runtime; (2) communication costs; (3) number of samples used.  ...  For example, we can perform squared-loss regression using ℓ(w; x, y) = (⟨x, w⟩ − y)², perform logistic regression using ℓ(w; x, y) = log(1 + exp(−y ⟨x, w⟩)), solve linear support vector machines using ℓ(w  ...
doi:10.1109/allerton.2014.7028543 dblp:conf/allerton/ShamirS14 fatcat:3bizw6pnj5gjdcttmmrvv5653e
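
The loss functions quoted in the snippet (their inner products were garbled in extraction) translate directly into code. A minimal sketch of the per-example losses, writing ⟨x, w⟩ as x @ w; the hinge loss shown for the SVM case is the standard choice and is assumed here because the snippet's formula is cut off.

    import numpy as np

    def squared_loss(w, x, y):
        """l(w; x, y) = (<x, w> - y)^2 for squared-loss regression."""
        return (x @ w - y) ** 2

    def logistic_loss(w, x, y):
        """l(w; x, y) = log(1 + exp(-y <x, w>)) for labels y in {-1, +1}."""
        return np.log1p(np.exp(-y * (x @ w)))

    def hinge_loss(w, x, y):
        """l(w; x, y) = max(0, 1 - y <x, w>), the usual linear SVM loss."""
        return max(0.0, 1.0 - y * (x @ w))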

Optimization for deep learning: theory and algorithms [article]

Ruoyu Sun
2019 arXiv   pre-print
Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms.  ...  This article provides an overview of optimization algorithms and theory for training neural networks.  ...  Srikant, Tian Ding and Dawei Li for discussions on various results reviewed in this article.  ... 
arXiv:1912.08957v1 fatcat:bdtx2o3qhfhthh2vyohkuwnxxa

Accelerate RNN-based Training with Importance Sampling [article]

Fei Wang, Xiaofeng Gao, Guihai Chen, Jun Ye
2017 arXiv   pre-print
Unlike commonly adopted stochastic uniform sampling in stochastic optimization, IS-integrated algorithms sample training data at each iteration with respect to a weighted sampling probability distribution  ...  Importance sampling (IS), an elegant and efficient variance reduction (VR) technique for the acceleration of stochastic optimization problems, has attracted much research recently.  ...  Acknowledgements The authors thank Jason Ye, Professor Guihai Chen and Professor Xiaofeng Gao for their important help.  ...
arXiv:1711.00004v1 fatcat:msrstj25ybgajcolfn25zmvvom
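
A minimal sketch of the importance-sampling mechanism described above, written for a least-squares objective rather than the paper's RNN setting: indices are drawn from a weighted sampling distribution and each stochastic gradient is rescaled by 1/(n·p_i) so the update remains unbiased. The row-norm weights and step size are illustrative assumptions.

    import numpy as np

    def importance_sampled_sgd(A, b, steps=1000, lr=0.01, seed=0):
        """SGD with non-uniform (importance) sampling on a least-squares problem.

        Index i is drawn with probability p_i proportional to ||A[i]||^2, a
        common variance-reducing choice assumed here for illustration, and the
        gradient is rescaled by 1 / (n * p_i) to keep the estimator unbiased.
        """
        rng = np.random.default_rng(seed)
        n, d = A.shape
        p = np.sum(A * A, axis=1)
        p /= p.sum()                              # weighted sampling distribution
        x = np.zeros(d)
        for _ in range(steps):
            i = rng.choice(n, p=p)
            g = (A[i] @ x - b[i]) * A[i]          # per-sample gradient
            x -= lr * g / (n * p[i])              # unbiased, reweighted step
        return x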

ADMM-Softmax: an ADMM approach for multinomial logistic regression

Samy Wu Fung, Sanna Tyrväinen, Lars Ruthotto, Eldad Haber
2020 Electronic Transactions on Numerical Analysis  
We present ADMM-Softmax, an alternating direction method of multipliers (ADMM) for solving multinomial logistic regression (MLR) problems.  ...  In particular, each iteration of ADMM-Softmax consists of a linear least-squares problem, a set of independent small-scale smooth, convex problems, and a trivial dual variable update.  ...  We also thank the Isaac Newton Institute (INI) for Mathematical Sciences for the support and hospitality during the programme on generative models, parameter learning, and sparsity.  ... 
doi:10.1553/etna_vol52s214 fatcat:uuizj5tdrrht7apzppmldrue5e
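
The three ingredients named in the snippet map onto a short ADMM loop for the splitting Z = XW. The sketch below is a schematic, not the authors' ADMM-Softmax code: the ridge term, the penalty parameter, and the use of a few gradient steps for the separable Z-subproblems are assumptions.

    import numpy as np
    from scipy.special import softmax

    def admm_softmax_sketch(X, Y, rho=1.0, lam=1e-3, iters=20):
        """Schematic ADMM for multinomial logistic regression, splitting Z = X W.

        Per outer iteration: (i) a linear least-squares update of W, (ii) a set
        of independent small smooth convex problems for the rows of Z (solved
        here by a few gradient steps), and (iii) a trivial dual update.
        """
        n, d = X.shape
        c = Y.shape[1]                       # Y is one-hot, shape n x c
        W = np.zeros((d, c))
        Z = np.zeros((n, c))
        U = np.zeros((n, c))                 # scaled dual variable
        for _ in range(iters):
            # (i) W-step: ridge-regularized linear least squares.
            W = np.linalg.solve(X.T @ X + (lam / rho) * np.eye(d), X.T @ (Z - U))
            V = X @ W + U
            # (ii) Z-step: rows decouple; min_z CE(z, y) + (rho/2) * ||z - v||^2.
            for _ in range(10):
                grad = softmax(Z, axis=1) - Y + rho * (Z - V)
                Z -= grad / (1.0 + rho)      # valid step size: objective is at most (1+rho)-smooth
            # (iii) trivial dual variable update.
            U += X @ W - Z
        return W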

Local SGD With a Communication Overhead Depending Only on the Number of Workers [article]

Artin Spiridonoff, Alex Olshevsky, Ioannis Ch. Paschalidis
2020 arXiv   pre-print
While the initial analysis of Local SGD showed it needs Ω(√T) communications for T local gradient steps in order for the error to scale proportionately to 1/(nT), this has been successively improved  ...  In this paper, we give a new analysis of Local SGD.  ...  Distributed training strategies for the structured perceptron.  ...
arXiv:2006.02582v1 fatcat:zwknwwiepvgznlljlgebm7xnmu
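
A minimal sketch of the Local SGD scheme the paper analyzes: each worker takes several local stochastic gradient steps between communications, and only the averaged iterate is exchanged. The least-squares objective, the data partition, and the fixed communication schedule are illustrative assumptions.

    import numpy as np

    def local_sgd(A, b, n_workers=4, rounds=50, local_steps=10, lr=0.01, seed=0):
        """Local SGD on a shared least-squares objective (illustrative sketch).

        Workers take `local_steps` independent SGD steps on their own shards,
        then the server averages the n_workers iterates (one communication).
        """
        rng = np.random.default_rng(seed)
        n, d = A.shape
        shards = np.array_split(rng.permutation(n), n_workers)   # one shard per worker
        x = np.zeros(d)
        for _ in range(rounds):
            worker_iterates = []
            for shard in shards:
                w = x.copy()
                for _ in range(local_steps):
                    i = rng.choice(shard)
                    w -= lr * (A[i] @ w - b[i]) * A[i]           # local SGD step
                worker_iterates.append(w)
            x = np.mean(worker_iterates, axis=0)                 # communication: average
        return x

The communication question studied in the paper is how many of these averaging rounds are needed, relative to the total number T of local gradient steps, for the error to keep scaling proportionately to 1/(nT).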

Training Neural Networks with Stochastic Hessian-Free Optimization [article]

Ryan Kiros
2013 arXiv   pre-print
Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks.  ...  We modify Martens' HF for these settings and integrate dropout, a method for preventing co-adaptation of feature detectors, to guard against overfitting.  ...  Acknowledgments The author would like to thank Csaba Szepesvári for helpful discussion as well as David Sussillo for his guidance when first learning about and implementing HF.  ... 
arXiv:1301.3641v3 fatcat:yndfjyterneklcdlgqzekogzxy
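
The core trick in Hessian-free (truncated Newton) optimization is solving the Newton system with conjugate gradients using only Hessian-vector products, never the Hessian itself. A generic sketch, not Martens' method or the paper's stochastic variant: here Hv is approximated by a finite difference of gradients, and the damping constant is an assumption.

    import numpy as np

    def hessian_free_step(grad_fn, w, cg_iters=10, eps=1e-4, damping=1e-2):
        """One truncated-Newton step using conjugate gradients.

        The Hessian-vector product Hv is approximated by the finite difference
        (grad_fn(w + eps * v) - grad_fn(w)) / eps, so the Hessian is never
        formed explicitly.  grad_fn is assumed to return the gradient at w.
        """
        g = grad_fn(w)

        def hvp(v):
            return (grad_fn(w + eps * v) - g) / eps + damping * v

        # Conjugate gradient on (H + damping * I) d = -g, starting from d = 0.
        d = np.zeros_like(w)
        r = -g.copy()
        p = r.copy()
        rs = r @ r
        for _ in range(cg_iters):
            Hp = hvp(p)
            alpha = rs / (p @ Hp)
            d += alpha * p
            r -= alpha * Hp
            rs_new = r @ r
            p = r + (rs_new / rs) * p
            rs = rs_new
        return w + d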

Scaling-up Distributed Processing of Data Streams for Machine Learning [article]

Matthew Nokleby, Haroon Raja, Waheed U. Bajwa
2020 arXiv   pre-print
Further, these applications often involve data that are either inherently gathered at geographically distributed entities or that are intentionally distributed across multiple machines for memory, computational  ...  For such methods, the paper discusses recent advances in terms of distributed algorithmic designs when faced with high-rate streaming data.  ...  to binary linear classification using logistic regression.  ... 
arXiv:2005.08854v2 fatcat:y6fvajvq2naajeqs6lo3trrgwy

DeepSpark: A Spark-Based Distributed Deep Learning Framework for Commodity Clusters [article]

Hanjoo Kim, Jaehong Park, Jaehee Jang, Sungroh Yoon
2016 arXiv   pre-print
To support parallel operations, DeepSpark automatically distributes workloads and parameters to Caffe/Tensorflow-running nodes using Spark, and iteratively aggregates training results by a novel lock-free  ...  Distributed computing platforms and GPGPU-based acceleration provide a mainstream solution to this computational challenge.  ...  We can select large τ to achieve speed-up for that condition.  ... 
arXiv:1602.08191v3 fatcat:jsqz7zgsq5ep3m7cfz5rcz5ntq

RidgeSketch: A Fast sketching based solver for large scale ridge regression [article]

Nidham Gazagnadou, Mark Ibrahim, Robert M. Gower
2021 arXiv   pre-print
We propose new variants of the sketch-and-project method for solving large scale ridge regression problems.  ...  On the contrary, we show the subsampled Hadamard transform does not perform well in this setting, despite the use of fast Hadamard transforms, nor do recently proposed acceleration schemes work well  ...  We thus recommend that, to advance the use of acceleration for linear systems, one would need to study a smaller class of problems and certain spectral bounds to derive a working rule for setting the parameters  ...
arXiv:2105.05565v2 fatcat:kw47fxu66zc4jd6tw7nkmw5lpu
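
A generic sketch-and-project iteration for the ridge normal equations (AᵀA + λI)w = Aᵀb, using a plain row-subsampling sketch; the paper compares several sketches (including the subsampled Hadamard transform) and acceleration schemes, so treat this as an illustration of the method family rather than the proposed solver.

    import numpy as np

    def sketch_and_project_ridge(A, b, lam=1.0, sketch_size=32, iters=500, seed=0):
        """Sketch-and-project for M w = c with M = A^T A + lam * I and c = A^T b.

        Each iteration samples a random block of rows of M (a subsampling
        sketch S) and projects the iterate onto {w : S^T M w = S^T c}.
        Sketch size and iteration count are illustrative assumptions.
        """
        rng = np.random.default_rng(seed)
        d = A.shape[1]
        M = A.T @ A + lam * np.eye(d)        # ridge normal-equation matrix
        c = A.T @ b
        w = np.zeros(d)
        for _ in range(iters):
            idx = rng.choice(d, size=min(sketch_size, d), replace=False)
            Ms = M[idx, :]                   # S^T M
            r = Ms @ w - c[idx]              # sketched residual
            y, *_ = np.linalg.lstsq(Ms @ Ms.T, r, rcond=None)
            w -= Ms.T @ y                    # projection step
        return w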

Batched Stochastic Gradient Descent with Weighted Sampling [article]

Deanna Needell, Rachel Ward
2017 arXiv   pre-print
We analyze a batched variant of Stochastic Gradient Descent (SGD) with weighted sampling distribution for smooth and non-smooth objective functions.  ...  We propose several computationally efficient schemes to approximate the optimal weights, and compute proposed sampling distributions explicitly for the least squares and hinge loss problems.  ...  ACKNOWLEDGEMENTS The authors would like to thank Anna Ma for helpful discussions about this paper, and the reviewers for their thoughtful feedback.  ... 
arXiv:1608.07641v2 fatcat:dtt2g3vr25fqth55xosa6lrugq

Batched Stochastic Gradient Descent with Weighted Sampling [chapter]

Deanna Needell, Rachel Ward
2017 Approximation Theory XV: San Antonio 2016  
We analyze a batched variant of Stochastic Gradient Descent (SGD) with weighted sampling distribution for smooth and non-smooth objective functions.  ...  We propose several computationally efficient schemes to approximate the optimal weights, and compute proposed sampling distributions explicitly for the least squares and hinge loss problems.  ...  ACKNOWLEDGEMENTS The authors would like to thank Anna Ma for helpful discussions about this paper. Needell was partially supported by NSF CAREER grant #1348721 and the Alfred P. Sloan Foundation.  ... 
doi:10.1007/978-3-319-59912-0_14 fatcat:jn6577t6bza6poiika2zswuddu
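
A minimal sketch of batched SGD with a weighted sampling distribution, specialized to least squares: batch indices are drawn with probability proportional to the per-example smoothness constants ||a_i||², and each sampled gradient is rescaled by 1/(n·p_i) so the averaged batch gradient stays unbiased. The weights, batch size, and step size are illustrative; the paper derives more refined batch-level distributions.

    import numpy as np

    def batched_weighted_sgd(A, b, batch_size=8, steps=500, lr=0.01, seed=0):
        """Batched SGD with weighted sampling for the least-squares problem.

        Indices are sampled with p_i proportional to ||A[i]||^2 and each term
        is rescaled by 1 / (n * p_i), so the batch-averaged gradient is an
        unbiased estimate of the full gradient.
        """
        rng = np.random.default_rng(seed)
        n, d = A.shape
        p = np.sum(A * A, axis=1)
        p /= p.sum()                                     # weighted sampling distribution
        x = np.zeros(d)
        for _ in range(steps):
            idx = rng.choice(n, size=batch_size, p=p)    # draw a weighted batch
            g = np.zeros(d)
            for i in idx:
                g += (A[i] @ x - b[i]) * A[i] / (n * p[i])   # reweighted per-sample gradient
            x -= lr * g / batch_size                     # unbiased batch step
        return x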
Showing results 1 — 15 out of 1,540 results