
Variance-Reduced Stochastic Gradient Descent on Streaming Data

Ellango Jothimurugesan, Ashraf Tahmasbi, Phillip B. Gibbons, Srikanta Tirthapura
2018 Neural Information Processing Systems  
Our theoretical and experimental results show that the risk of STRSAGA is comparable to that of an offline algorithm on a variety of input arrival patterns, and its experimental performance is significantly better than prior algorithms suited for streaming data, such as SGD and SSVRG.  ...  Stochastic Variance-Reduced Gradient (SVRG) [JZ13] is another variance-reduction method that does not store the computed gradients, but periodically computes a full-data gradient, requiring more computation  ... 
dblp:conf/nips/JothimurugesanT18
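
The SVRG scheme the snippet contrasts with STRSAGA (periodically compute a full-data gradient instead of storing per-sample gradients) can be sketched on a toy least-squares objective. This is an illustrative sketch, not code from the paper; the function name, objective, and hyperparameters are my own choices.

```python
import numpy as np

def svrg(X, y, lr=0.02, epochs=100, seed=0):
    """Minimal SVRG sketch for least squares: 0.5 * mean((X w - y)^2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Periodic full-data gradient at the snapshot point w_tilde.
        w_tilde = w.copy()
        full_grad = X.T @ (X @ w_tilde - y) / n
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced gradient: stochastic term, minus the same
            # term evaluated at the snapshot, plus the stored full gradient.
            g_i = X[i] * (X[i] @ w - y[i])
            g_i_tilde = X[i] * (X[i] @ w_tilde - y[i])
            w -= lr * (g_i - g_i_tilde + full_grad)
    return w
```

Note the trade-off the snippet mentions: no per-sample gradient table is stored, at the cost of one full-data gradient pass per epoch.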

Stochastic Optimization from Distributed, Streaming Data in Rate-limited Networks [article]

Matthew Nokleby, Waheed U. Bajwa
2018 arXiv   pre-print
The setup involves a network of nodes---each of which has a stream of data arriving at a constant rate---that solve a stochastic convex optimization problem by collaborating with each other over rate-limited networks.  ...  As a baseline, we consider local (accelerated) stochastic mirror descent, in which nodes simply perform mirror descent on their own data streams without collaboration.  ... 
arXiv:1704.07888v4
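
The local stochastic mirror descent baseline the snippet describes can be illustrated with the entropy mirror map, one common choice that keeps iterates on the probability simplex. A hedged sketch only: the paper's networked, rate-limited algorithm is more involved, and the function name and parameters here are assumptions of mine.

```python
import numpy as np

def entropic_mirror_descent(grad_fn, d, steps=200, lr=0.1):
    """Mirror descent with the entropy mirror map: the update is
    multiplicative (exponentiated gradient), so iterates stay on the
    probability simplex without an explicit Euclidean projection."""
    w = np.full(d, 1.0 / d)           # uniform start on the simplex
    for _ in range(steps):
        g = grad_fn(w)
        w = w * np.exp(-lr * g)       # exponentiated-gradient step
        w /= w.sum()                  # renormalize onto the simplex
    return w
```

In the paper's setting, each node would run such a loop on gradients from its own data stream; collaboration would then average or exchange iterates over the rate-limited links.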

Scaling-up Distributed Processing of Data Streams for Machine Learning [article]

Matthew Nokleby, Haroon Raja, Waheed U. Bajwa
2020 arXiv   pre-print
When the streaming data rate is high compared to the processing capabilities of compute nodes and/or the rate of the communication links, this poses a challenging question: how can one best leverage the  ...  This paper reviews recently developed methods that focus on large-scale distributed stochastic optimization in the compute- and bandwidth-limited regime, with an emphasis on convergence analysis that explicitly  ...  A family of so-called variance-reduction methods [4], [69], [80]–[82], such as stochastic variance reduced gradient (SVRG), stochastically controlled stochastic gradient (SCSG), and NATASHA, have  ... 
arXiv:2005.08854v2

Streaming Principal Component Analysis in Noisy Settings

Teodor Vanislavov Marinov, Poorya Mianjy, Raman Arora
2018 International Conference on Machine Learning  
We study streaming algorithms for principal component analysis (PCA) in noisy settings.  ...  We present computationally efficient algorithms with sub-linear regret bounds for PCA in the presence of noise, missing data, and gross outliers.  ... 
dblp:conf/icml/MarinovMA18

AdaOja: Adaptive Learning Rates for Streaming PCA [article]

Amelia Henriksen, Rachel Ward
2019 arXiv   pre-print
This new algorithm requires only a single pass over the data and does not depend on knowing properties of the data set a priori.  ...  We demonstrate for dense synthetic data, sparse real-world data, and dense real-world data that AdaOja outperforms common learning rate choices for Oja's method.  ...  This led us to consider common variants of stochastic gradient descent. In 2010, [20] and [9] introduced the AdaGrad update step for stochastic gradient descent for a single vector.  ... 
arXiv:1905.12115v2
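
The AdaOja idea in this snippet — Oja's streaming PCA step with an AdaGrad-style accumulated learning rate, single pass, no tuned schedule — can be sketched as follows. This is my own minimal reading of the idea, not the authors' reference implementation; the initialization and variable names are assumptions.

```python
import numpy as np

def adaoja(stream, d, k=1, b0=1e-5):
    """Oja's method for streaming PCA with an AdaGrad-style step size:
    the learning rate 1/b shrinks as squared gradient norms accumulate."""
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.normal(size=(d, k)))  # random orthonormal start
    b = b0
    for x in stream:                      # single pass over the stream
        g = np.outer(x, x @ Q)            # stochastic gradient x x^T Q
        b = np.sqrt(b * b + np.sum(g * g))  # accumulate squared grad norms
        Q, _ = np.linalg.qr(Q + g / b)    # Oja step + re-orthonormalization
    return Q
```

Because `b` is data-driven, no step-size schedule needs to be known a priori, which matches the abstract's claim.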

Stochastic algorithms with descent guarantees for ICA [article]

Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach
2019 arXiv   pre-print
We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits.  ...  Second, the algorithm for the finite sum setting, although stochastic, guarantees a decrease of the loss function at each iteration.  ...  a stochastic gradient descent step.  ... 
arXiv:1805.10054v2

Variance Reduced Stochastic Proximal Algorithm for AUC Maximization [article]

Soham Dan, Dushyant Sahoo
2020 arXiv   pre-print
Stochastic Gradient Descent has been widely studied with classification accuracy as a performance measure.  ...  To combat this issue, several variance-reduced methods have been proposed with faster convergence guarantees than vanilla stochastic gradient descent.  ...  In the case of the SPAM algorithm, by contrast, the variance of the gradient does not go to zero, as it is a stochastic gradient descent-based algorithm. We now present the proof of Theorem 1.  ... 
arXiv:1911.03548v2
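
The "proximal" part of a variance-reduced stochastic proximal algorithm can be illustrated with the simplest case: a gradient step on the smooth loss followed by a proximal step on a nonsmooth regularizer. The sketch below uses an ℓ1-regularized least-squares toy (the paper's AUC objective is different); all names and hyperparameters are my own.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_lasso(X, y, lam=0.1, lr=0.02, epochs=40, seed=0):
    """Proximal stochastic gradient sketch on
    0.5 * mean((X w - y)^2) + lam * ||w||_1:
    a stochastic gradient step on the smooth part, then a prox step
    on the nonsmooth l1 term."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = X[i] * (X[i] @ w - y[i])              # stochastic gradient
            w = soft_threshold(w - lr * g, lr * lam)  # proximal step
    return w
```

A variance-reduced variant, as in the paper, would replace `g` with an SVRG-style corrected estimate so the gradient noise vanishes near the optimum.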

A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization

Gert Cauwenberghs
1992 Neural Information Processing Systems  
A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology.  ...  The method is based on the model-free distributed learning mechanism of Dembo and Kailath.  ... 
dblp:conf/nips/Cauwenberghs92
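
The model-free error-descent mechanism referenced here can be sketched as parallel weight perturbation: perturb all parameters at once with a random vector, measure the resulting change in error, and step against the estimated gradient — no backpropagation through the network is needed. This is a generic sketch of the idea under my own parameter choices, not the paper's hardware-oriented algorithm.

```python
import numpy as np

def perturbation_descent(loss, w0, sigma=1e-3, lr=0.05, steps=1000, seed=0):
    """Model-free stochastic error descent: (loss(w + sigma*pi) - loss(w))
    / sigma is approximately grad . pi, and multiplying by pi again gives
    an (approximately) unbiased gradient estimate, since E[pi pi^T] = I."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        pi = rng.choice([-1.0, 1.0], size=w.shape)  # random +/-1 perturbation
        delta = loss(w + sigma * pi) - loss(w)      # measured error change
        w -= lr * (delta / sigma) * pi              # descend along estimate
    return w
```

Only two error evaluations per step are required, regardless of the number of parameters, which is what makes the scheme attractive for parallel analog implementations.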

A Linearly-Convergent Stochastic L-BFGS Algorithm [article]

Philipp Moritz, Robert Nishihara, Michael I. Jordan
2016 arXiv   pre-print
Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and  ...  We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to  ...  Related Work There is a large body of work that attempts to improve on stochastic gradient descent by reducing variance.  ... 
arXiv:1508.02087v2
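
The L-BFGS machinery underlying this entry is the classic two-loop recursion, which applies an implicit inverse-Hessian approximation built from recent curvature pairs (s, y). The sketch below is the standard deterministic recursion, not the paper's stochastic algorithm; in the stochastic variant, the gradient fed in would be a variance-reduced (SVRG-style) estimate.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """L-BFGS two-loop recursion: returns (approximately) H^{-1} @ grad,
    where H^{-1} is the quasi-Newton inverse-Hessian approximation
    implied by the stored (s, y) = (step, gradient-change) pairs."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if s_list:  # standard initial scaling gamma = s'y / y'y
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest first
        rho = 1.0 / (y @ s)
        b = rho * (y @ q)
        q += (a - b) * s
    return q
```

On a quadratic with Hessian A and A-conjugate steps, the recursion reproduces A^{-1} grad exactly, which is a convenient sanity check.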

Unbiased Online Recurrent Optimization [article]

Corentin Tallec, Yann Ollivier
2017 arXiv   pre-print
Like NoBackTrack, UORO provides unbiased gradient estimates; unbiasedness is the core hypothesis in stochastic gradient descent theory, without which convergence to a local optimum is not guaranteed.  ...  It works in a streaming fashion and avoids backtracking through past activations and inputs.  ...  Vanilla stochastic gradient descent (SGD) and Adam are used hereafter.  ... 
arXiv:1702.05043v3
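
The "unbiasedness is the core hypothesis in stochastic gradient descent theory" point can be made concrete with a small check: for a finite-sum objective, the expectation of the per-sample gradient (uniform over samples) equals the full-batch gradient exactly. This is a generic illustration on least squares, with names of my own choosing.

```python
import numpy as np

def full_and_expected_stochastic_grad(X, y, w):
    """For f(w) = 0.5 * mean((X w - y)^2), compare the full-batch gradient
    with the average of the per-sample gradients g_i = x_i (x_i . w - y_i):
    E_i[g_i] equals the full gradient, i.e. SGD's estimate is unbiased."""
    n = X.shape[0]
    full = X.T @ (X @ w - y) / n
    per_sample = [X[i] * (X[i] @ w - y[i]) for i in range(n)]
    expected = np.mean(per_sample, axis=0)   # E_i[g_i] under uniform sampling
    return full, expected
```

UORO's contribution is constructing an estimate with this property for recurrent networks *online*, without backtracking through past activations.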

Efficient Convex Relaxations for Streaming PCA

Raman Arora, Teodor Vanislavov Marinov
2019 Neural Information Processing Systems  
We revisit two algorithms, matrix stochastic gradient (MSG) and ℓ2-regularized MSG (RMSG), that are instances of stochastic gradient descent (SGD) on a convex relaxation to principal component analysis  ...  In this work, we give improved bounds on the per-iteration cost of mini-batched variants of both MSG and ℓ2-RMSG and arrive at an algorithm with total computational complexity matching that of Oja's algorithm  ...  One is based on the stochastic power method, also known as Oja's algorithm, and is essentially stochastic gradient descent (SGD) on Problem 1 (De Sa et al., 2014; Hardt & Price, 2014; Balcan et al., 2016  ... 
dblp:conf/nips/AroraM19

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning [article]

Aaron Defazio, Léon Bottou
2019 arXiv   pre-print
The application of stochastic variance reduction to optimization has shown remarkable recent theoretical and practical success.  ...  SVR methods use control variates to reduce the variance of the traditional stochastic gradient descent (SGD) estimate ∇f_i(w) of the full gradient ∇f(w).  ...  variance of the stochastic gradient used by SGD.  ... 
arXiv:1812.04529v2
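
The control-variate idea the snippet describes can be demonstrated numerically: the corrected estimate ∇f_i(w) − ∇f_i(w̃) + ∇f(w̃) has the same mean as ∇f_i(w) but far lower variance when w is close to the snapshot w̃. A least-squares toy of my own construction:

```python
import numpy as np

def gradient_variances(X, y, w, w_snap):
    """Compare the variance (over samples i) of the plain per-sample
    gradient g_i(w) with the control-variate-corrected estimate
    g_i(w) - g_i(w_snap) + full_grad(w_snap), which shares its mean."""
    n = X.shape[0]
    full_snap = X.T @ (X @ w_snap - y) / n
    g = np.array([X[i] * (X[i] @ w - y[i]) for i in range(n)])
    g_snap = np.array([X[i] * (X[i] @ w_snap - y[i]) for i in range(n)])
    corrected = g - g_snap + full_snap
    var_plain = np.mean(np.sum((g - g.mean(axis=0)) ** 2, axis=1))
    var_corr = np.mean(np.sum((corrected - corrected.mean(axis=0)) ** 2, axis=1))
    return var_plain, var_corr
```

The paper's point is that in deep learning this gap closes much more slowly than the convex theory suggests, so the variance reduction is far less effective in practice.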

Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [article]

Haonan Jia, Xiao Zhang, Jun Xu, Wei Zeng, Hao Jiang, Xiaohui Yan, Ji-Rong Wen
2020 arXiv   pre-print
Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance (Zhao et al. 2019).  ...  To address this issue and inspired by the recursive gradient variance-reduction algorithm SARAH (Nguyen et al. 2017), this paper proposes to introduce the recursive framework for updating the stochastic  ...  Variance-Reduced Deep Q-Learning: The original stochastic gradient descent based on a single transition often suffers from high gradient estimation variance.  ... 
arXiv:2007.12817v1
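
The SARAH recursive gradient framework cited here differs from SVRG in that the correction is taken at the *previous iterate*, so the estimate v is updated recursively within the inner loop. A sketch on a least-squares toy (not the paper's Q-learning setting; names and hyperparameters are mine):

```python
import numpy as np

def sarah(X, y, lr=0.02, epochs=100, seed=0):
    """SARAH sketch on 0.5 * mean((X w - y)^2). The inner loop maintains
    v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}, seeded each epoch with a
    full-data gradient."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        v = X.T @ (X @ w - y) / n        # full gradient starts the loop
        w_prev, w = w.copy(), w - lr * v
        for _ in range(n):
            i = rng.integers(n)
            # Recursive update relative to the previous iterate, not a
            # fixed snapshot as in SVRG.
            v += X[i] * (X[i] @ w - y[i]) - X[i] * (X[i] @ w_prev - y[i])
            w_prev, w = w.copy(), w - lr * v
    return w
```

In the paper, this recursive estimator replaces the single-transition gradient in the deep Q-learning update.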

Rapid Aerodynamic Shape Optimization Under Parametric and Turbulence Model Uncertainty: A Stochastic Gradient Approach [article]

Lluís Jofre, Alireza Doostan
2021 arXiv   pre-print
To tackle this difficulty, we consider a variant of the stochastic gradient descent method where, in each optimization iteration, a stochastic approximation of the objective, constraints, and their gradients  ...  With a cost that is a small factor larger than that of the deterministic approach, the stochastic gradient approach significantly improves the performance (mean and variance) of the aerodynamic design  ...  Average Gradient (SAG) [37], and Stochastic Variance Reduced Gradient (SVRG) [38].  ... 
arXiv:2105.01048v1

Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification [article]

Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
2018 arXiv   pre-print
This work characterizes the benefits of averaging schemes widely used in conjunction with stochastic gradient descent (SGD).  ...  In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and  ... 
arXiv:1610.03774v4
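
The two averaging schemes this entry analyzes — mini-batching the gradient and averaging the iterates — can be combined in a few lines. This is a generic sketch on least squares under my own parameter choices, not the paper's parallelized algorithm.

```python
import numpy as np

def minibatch_avg_sgd(X, y, batch=8, lr=0.05, steps=1000, seed=0):
    """SGD for least squares with (1) mini-batching: the gradient is
    averaged over `batch` random samples to cut its variance, and
    (2) tail-averaging: the returned iterate is the mean of the second
    half of the trajectory, which suppresses the residual noise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    tail = []
    for t in range(steps):
        idx = rng.integers(n, size=batch)
        Xb, yb = X[idx], y[idx]
        g = Xb.T @ (Xb @ w - yb) / batch   # mini-batch gradient estimate
        w -= lr * g
        if t >= steps // 2:                # collect tail iterates
            tail.append(w.copy())
    return np.mean(tail, axis=0)           # tail-averaged iterate
```

The paper's analysis makes precise how large `batch` can be before parallelism stops paying off, and how tail-averaging interacts with model misspecification.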
Showing results 1 — 15 out of 7,019 results