Variance-Reduced Stochastic Gradient Descent on Streaming Data
2018
Neural Information Processing Systems
better than prior algorithms suited for streaming data, such as SGD and SSVRG. ...
Our theoretical and experimental results show that the risk of STRSAGA is comparable to that of an offline algorithm on a variety of input arrival patterns, and its experimental performance is significantly ...
Stochastic Variance-Reduced Gradient (SVRG) [JZ13] is another variance reduction method that does not store the computed gradients, but periodically computes a full-data gradient, requiring more computation ...
dblp:conf/nips/JothimurugesanT18
fatcat:rg7nmveg6ndrhaauzmx7vxv7gy
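The SVRG scheme summarized in the snippet above can be written down compactly: each epoch computes one full-data gradient at a snapshot point, then corrects per-sample gradients against that snapshot instead of storing a per-sample gradient table. A minimal sketch, assuming a least-squares loss, synthetic data, and an untuned step size (all illustrative, not taken from the cited paper):

```python
import numpy as np

def svrg(data, labels, loss_grad, w0, step=0.1, epochs=10, inner_steps=None):
    """SVRG-style loop: one full-data gradient per epoch at a snapshot,
    then inner stochastic steps corrected against that snapshot, so no
    table of per-sample gradients needs to be stored."""
    n = len(data)
    inner_steps = inner_steps or n
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = np.mean(
            [loss_grad(w_snap, data[i], labels[i]) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = np.random.randint(n)
            # variance-reduced estimate: grad_i(w) - grad_i(w_snap) + full_grad
            g = (loss_grad(w, data[i], labels[i])
                 - loss_grad(w_snap, data[i], labels[i]) + full_grad)
            w -= step * g
    return w

# Illustrative least-squares example (hypothetical data, not from the paper).
def lsq_grad(w, x, y):
    return (x @ w - y) * x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5)
w_hat = svrg(X, y, lsq_grad, np.zeros(5))
```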
Stochastic Optimization from Distributed, Streaming Data in Rate-limited Networks
[article]
2018
arXiv
pre-print
data streams. ...
The setup involves a network of nodes, each of which has a stream of data arriving at a constant rate, that solve a stochastic convex optimization problem by collaborating with each other over rate-limited ...
As a baseline, we consider local (accelerated) stochastic mirror descent, in which nodes simply perform mirror descent on their own data streams without collaboration. ...
arXiv:1704.07888v4
fatcat:rt37gfzluzbbpnlvr447u4wb7a
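As background for the local stochastic mirror descent baseline described above, the sketch below shows a single node running stochastic mirror descent with the negative-entropy mirror map on the probability simplex, where the mirror step becomes an exponentiated-gradient update; the loss, data stream, and step size are assumptions for illustration only.

```python
import numpy as np

def stochastic_mirror_descent(grad_fn, stream, d, step=0.1):
    """Stochastic mirror descent with the negative-entropy mirror map.
    On the probability simplex the mirror step reduces to a multiplicative
    (exponentiated-gradient) update followed by renormalization."""
    x = np.full(d, 1.0 / d)            # start at the uniform distribution
    for sample in stream:
        g = grad_fn(x, sample)          # stochastic gradient at the current iterate
        x = x * np.exp(-step * g)       # mirror (multiplicative) step
        x /= x.sum()                    # project back onto the simplex
    return x

# Illustrative stream: minimize E[(a^T x - b)^2] over the simplex.
rng = np.random.default_rng(1)
stream = [(rng.normal(size=10), 0.3) for _ in range(1000)]
grad = lambda x, s: 2.0 * (s[0] @ x - s[1]) * s[0]
x_hat = stochastic_mirror_descent(grad, stream, d=10)
```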
Scaling-up Distributed Processing of Data Streams for Machine Learning
[article]
2020
arXiv
pre-print
When the streaming data rate is high compared to the processing capabilities of compute nodes and/or the rate of the communications links, this poses a challenging question: how can one best leverage the ...
This paper reviews recently developed methods that focus on large-scale distributed stochastic optimization in the compute- and bandwidth-limited regime, with an emphasis on convergence analysis that explicitly ...
A family of so-called variance-reduction methods [4], [69], [80]-[82], such as stochastic variance reduced gradient (SVRG), stochastically controlled stochastic gradient (SCSG), and NATASHA, have ...
arXiv:2005.08854v2
fatcat:y6fvajvq2naajeqs6lo3trrgwy
Streaming Principal Component Analysis in Noisy Settings
2018
International Conference on Machine Learning
We study streaming algorithms for principal component analysis (PCA) in noisy settings. ...
We present computationally efficient algorithms with sub-linear regret bounds for PCA in the presence of noise, missing data, and gross outliers. ...
dblp:conf/icml/MarinovMA18
fatcat:bsvlozt6sjfdvojy2r4gcuy4d4
AdaOja: Adaptive Learning Rates for Streaming PCA
[article]
2019
arXiv
pre-print
This new algorithm requires only a single pass over the data and does not depend on knowing properties of the data set a priori. ...
We demonstrate for dense synthetic data, sparse real-world data and dense real-world data that AdaOja outperforms common learning rate choices for Oja's method. ...
This led us to consider common variants of stochastic gradient descent. In 2010, [20] and [9] introduced the AdaGrad update step for stochastic gradient descent for a single vector. ...
arXiv:1905.12115v2
fatcat:rgguexokqfd7pa3bb2e6bcmmpi
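The abstract above describes a single-pass Oja update driven by an adaptive, AdaGrad-style learning rate. The following sketch captures that idea for the leading component only; the accumulator initialization and constants are assumptions and are not taken from the AdaOja paper.

```python
import numpy as np

def oja_adagrad(stream, d, b0=1e-5):
    """Oja's method for the leading principal component with an
    AdaGrad-style adaptive step size (illustrative, single component)."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    b = b0                               # running sum of squared gradient norms
    for x in stream:
        g = (x @ w) * x                  # stochastic ascent direction for the Rayleigh quotient
        b += float(g @ g)
        w = w + (1.0 / np.sqrt(b)) * g   # adaptive step, no hand-tuned learning rate
        w /= np.linalg.norm(w)           # project back to the unit sphere
    return w

# Example: recover the dominant direction of a synthetic covariance.
rng = np.random.default_rng(2)
cov = np.diag([5.0, 1.0, 0.5, 0.1])
data = rng.multivariate_normal(np.zeros(4), cov, size=5000)
w_hat = oja_adagrad(data, d=4)
```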
Stochastic algorithms with descent guarantees for ICA
[article]
2019
arXiv
pre-print
Second, the algorithm for the finite sum setting, although stochastic, guarantees a decrease of the loss function at each iteration. ...
We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits. ...
a stochastic gradient descent step. ...
arXiv:1805.10054v2
fatcat:bcd5grcpt5eefatok3rsshc3cu
Variance Reduced Stochastic Proximal Algorithm for AUC Maximization
[article]
2020
arXiv
pre-print
To combat this issue, several variance reduced methods have been proposed with faster convergence guarantees than vanilla stochastic gradient descent. ...
Stochastic Gradient Descent has been widely studied with classification accuracy as a performance measure. ...
Whereas in the case of the SPAM algorithm, the variance of the gradient does not go to zero, as it is a stochastic gradient descent based algorithm. We now present the proof of Theorem 1. ...
arXiv:1911.03548v2
fatcat:vsujhihsxnffvhfsspojb5ugzm
A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization
1992
Neural Information Processing Systems
A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. ...
The method is based on the model-free distributed learning mechanism of Dembo and Kailath. ...
Yariv, and many other individuals for valuable suggestions and comments on the work presented here. ...
dblp:conf/nips/Cauwenberghs92
fatcat:cqe4uhdy4bfx3otvvxwcbkjbqy
A Linearly-Convergent Stochastic L-BFGS Algorithm
[article]
2016
arXiv
pre-print
Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and ...
We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to ...
Related Work There is a large body of work that attempts to improve on stochastic gradient descent by reducing variance. ...
arXiv:1508.02087v2
fatcat:qffeae2oufgzzhxcmfa6jgknze
Unbiased Online Recurrent Optimization
[article]
2017
arXiv
pre-print
Like NoBackTrack, UORO provides unbiased gradient estimates; unbiasedness is the core hypothesis in stochastic gradient descent theory, without which convergence to a local optimum is not guaranteed. ...
It works in a streaming fashion and avoids backtracking through past activations and inputs. ...
Vanilla stochastic gradient descent (SGD) and Adam are used hereafter. ...
arXiv:1702.05043v3
fatcat:t4v2jtynhvfyrjk2rfxhhc6elm
Efficient Convex Relaxations for Streaming PCA
2019
Neural Information Processing Systems
We revisit two algorithms, matrix stochastic gradient (MSG) and ℓ2-regularized MSG (RMSG), that are instances of stochastic gradient descent (SGD) on a convex relaxation to principal component analysis ...
In this work, we give improved bounds on the per-iteration cost of mini-batched variants of both MSG and ℓ2-RMSG and arrive at an algorithm with total computational complexity matching that of Oja's algorithm ...
One is based on the stochastic power method, also known as Oja's algorithm and is essentially stochastic gradient descent (SGD) on Problem 1 (De Sa et al., 2014; Hardt & Price, 2014; Balcan et al., 2016 ...
dblp:conf/nips/AroraM19
fatcat:k5tkqlnv7ng2diqyucqfypiade
On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
[article]
2019
arXiv
pre-print
The application of stochastic variance reduction to optimization has shown remarkable recent theoretical and practical success. ...
SVR methods use control variates to reduce the variance of the traditional stochastic gradient descent (SGD) estimate ∇f_i(w) of the full gradient ∇f(w). ...
variance of the stochastic gradient used by SGD. ...
arXiv:1812.04529v2
fatcat:3oa766rrf5agzjbt7xwfmu2jnm
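The control-variate estimate referred to in the snippet above has a standard explicit form: given a snapshot point \tilde{w} with a stored full gradient, the variance-reduced update direction is

\[
  g_i(w) \;=\; \nabla f_i(w) \;-\; \nabla f_i(\tilde{w}) \;+\; \nabla f(\tilde{w}),
  \qquad
  \mathbb{E}_i\big[g_i(w)\big] \;=\; \nabla f(w),
\]

so the estimate remains unbiased while its variance shrinks as both w and \tilde{w} approach the optimum.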
Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient
[article]
2020
arXiv
pre-print
Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance (Zhao et al. 2019). ...
To address this issue and inspired by the recursive gradient variance reduction algorithm SARAH (Nguyen et al. 2017), this paper proposes to introduce the recursive framework for updating the stochastic ...
Variance Reduced Deep Q-Learning: The original stochastic gradient descent based on a single transition often suffers from the problem of high gradient estimation variance. ...
arXiv:2007.12817v1
fatcat:n4ta5mxeyjeulbilojziu7kvnq
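The recursive (SARAH-style) estimator mentioned above differs from SVRG in that the correction is taken at the previous iterate rather than at a fixed snapshot, so the gradient estimate is updated recursively:

\[
  v_t \;=\; \nabla f_{i_t}(w_t) \;-\; \nabla f_{i_t}(w_{t-1}) \;+\; v_{t-1},
  \qquad
  w_{t+1} \;=\; w_t \;-\; \eta\, v_t,
\]

with v_0 initialized to a full (or large-batch) gradient; the step size \eta and how this is wired into the deep Q-learning target updates are details of the cited paper not reproduced here.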
Rapid Aerodynamic Shape Optimization Under Parametric and Turbulence Model Uncertainty: A Stochastic Gradient Approach
[article]
2021
arXiv
pre-print
To tackle this difficulty, we consider a variant of the stochastic gradient descent method where, in each optimization iteration, a stochastic approximation of the objective, constraints, and their gradients ...
With a cost that is a small factor larger than that of the deterministic approach, the stochastic gradient approach significantly improves the performance (mean and variance) of the aerodynamic design ...
Average Gradient (SAG) [37], and Stochastic Variance Reduced Gradient (SVRG) [38]. ...
arXiv:2105.01048v1
fatcat:clgtzlt2lvbvfjv6o6vbh5m2ay
Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification
[article]
2018
arXiv
pre-print
In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and ...
This work characterizes the benefits of averaging schemes widely used in conjunction with stochastic gradient descent (SGD). ...
Rahul Kidambi thanks James Saunderson for useful discussions on matrix operator theory. ...
arXiv:1610.03774v4
fatcat:7gzhgqawanbpndi4ztzipfktni
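A minimal sketch of the two averaging schemes analyzed above, mini-batching of stochastic gradients plus (tail-)averaging of iterates, for a streaming least-squares problem; the batch size, step size, tail fraction, and data are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def minibatch_sgd_with_averaging(X, y, batch_size=16, step=0.05, tail_frac=0.5):
    """Mini-batch SGD for least squares with tail-averaged iterates.
    Mini-batching averages several per-sample gradients to cut the variance of
    each update; averaging the trailing iterates further reduces the variance
    of the final estimate."""
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    rng = np.random.default_rng(0)
    for _ in range(n // batch_size):
        idx = rng.integers(0, n, size=batch_size)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / batch_size   # averaged mini-batch gradient
        w = w - step * grad
        iterates.append(w)
    tail = iterates[int(len(iterates) * (1 - tail_frac)):]
    return np.mean(tail, axis=0)                    # tail-averaged estimate

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=2000)
w_bar = minibatch_sgd_with_averaging(X, y)
```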