27 Hits in 7.7 sec

Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction [article]

Bin Gu and Zhouyuan Huo and Heng Huang
2016 arXiv   pre-print
The convergence rate of existing asynchronous doubly stochastic zeroth-order algorithms is O(1/√T) (the same rate as for sequential stochastic zeroth-order optimization algorithms).  ...  To handle large-scale problems both in volume and dimension, asynchronous doubly stochastic zeroth-order algorithms were recently proposed.  ...  Algorithm 2 Asynchronous Doubly Stochastic Zeroth-order Optimization with Variance Reduction (AsyDSZOVR) Input: γ, S, and m.  ...
arXiv:1612.01425v1 fatcat:2d5byqvsyrevxizofshwvdu67e
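
The snippet above names only the inputs of Algorithm 2 (γ, S, and m). Below is a minimal, single-threaded Python sketch of the general pattern behind such methods: a two-point zeroth-order gradient estimate combined with an SVRG-style control variate. It is not the authors' exact asynchronous doubly stochastic algorithm; the function names, step size, and toy objective are illustrative assumptions.

```python
import numpy as np

def zo_grad(f, x, u, mu=1e-4):
    """Two-point zeroth-order estimate of grad f(x) along direction u."""
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def zo_svrg(f_i, n, x0, gamma=0.05, S=20, m=50, seed=0):
    """Single-threaded SVRG-style loop driven by zeroth-order estimates.

    f_i(i, x) evaluates the i-th component function at x; gamma, S, m play
    the roles of the step size, number of epochs, and inner-loop length.
    """
    rng = np.random.default_rng(seed)
    x_tilde = np.asarray(x0, dtype=float).copy()
    for _ in range(S):                                   # outer epochs
        # zeroth-order surrogate of the full gradient at the snapshot point
        g_full = np.mean([zo_grad(lambda z: f_i(i, z), x_tilde,
                                  rng.standard_normal(x_tilde.shape))
                          for i in range(n)], axis=0)
        x = x_tilde.copy()
        for _ in range(m):                               # inner stochastic steps
            i = rng.integers(n)
            u = rng.standard_normal(x.shape)             # shared direction keeps the two estimates correlated
            fi = lambda z: f_i(i, z)
            v = zo_grad(fi, x, u) - zo_grad(fi, x_tilde, u) + g_full
            x -= gamma * v                               # variance-reduced update
        x_tilde = x
    return x_tilde

# toy usage: minimize the average of f_i(x) = 0.5 * ||x - a_i||^2
a = np.random.default_rng(1).standard_normal((10, 5))
x_hat = zo_svrg(lambda i, x: 0.5 * np.sum((x - a[i]) ** 2), n=10, x0=np.zeros(5))
```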

Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization [article]

Feihu Huang, Bin Gu, Zhouyuan Huo, Songcan Chen, Heng Huang
2019 arXiv   pre-print
To fill this gap, in the paper, we propose a class of faster zeroth-order proximal stochastic methods with the variance reduction techniques of SVRG and SAGA, which are denoted as ZO-ProxSVRG and ZO-ProxSAGA  ...  Recently, the first zeroth-order proximal stochastic algorithm was proposed to solve the nonconvex nonsmooth problems.  ...  Zeroth-order asynchronous doubly stochastic algorithm with variance reduction. arXiv preprint arXiv:1612.01425. Gu, B.; Huo, Z.; and Huang, H. 2018.  ... 
arXiv:1902.06158v1 fatcat:eotzjizo6jbdfcjrcpr56mdbpi
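
For the nonsmooth setting mentioned above, the proximal step is the key extra ingredient. The sketch below shows one variance-reduced zeroth-order proximal step under the assumption that the nonsmooth term is an ℓ1 penalty (so the proximal operator is soft-thresholding); it is a generic illustration of a ZO-ProxSVRG-style update, not the paper's exact method, and all names and constants are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def zo_prox_step(f, x, x_ref, g_ref, lam=0.1, eta=0.05, mu=1e-4, rng=None):
    """One variance-reduced zeroth-order proximal step for f(x) + lam * ||x||_1.

    x_ref is the SVRG snapshot point and g_ref an estimate of the full
    gradient of f at x_ref; the same random direction u is reused at both
    points so the control variate actually reduces variance.
    """
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    g_x = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    g_s = (f(x_ref + mu * u) - f(x_ref - mu * u)) / (2 * mu) * u
    v = g_x - g_s + g_ref                     # SVRG-style control variate
    return soft_threshold(x - eta * v, eta * lam)

# illustrative call on f(x) = 0.5 * ||x||^2, where the full gradient at x_ref is x_ref
x0 = np.ones(4)
x1 = zo_prox_step(lambda z: 0.5 * np.sum(z ** 2), x0, x_ref=x0, g_ref=x0)
```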

Stochastic Zeroth-order Optimization via Variance Reduction method [article]

Liu Liu, Minhao Cheng, Cho-Jui Hsieh, Dacheng Tao
2018 arXiv   pre-print
In this paper, we introduce a novel Stochastic Zeroth-order method with Variance Reduction under Gaussian smoothing (SZVR-G) and establish its complexity for optimizing non-convex problems.  ...  With variance reduction on both the sample space and the search space, the complexity of our algorithm is sublinear in d and is strictly better than current approaches, in both the smooth and non-smooth cases.  ...  [19] apply zeroth-order variance reduction to an asynchronous doubly stochastic algorithm, however, without a specific analysis of how the complexity depends on the dimension d.  ...
arXiv:1805.11811v3 fatcat:pry6ktdi6vdhzfve4sy6patb2e
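
The Gaussian smoothing mentioned in the abstract refers to estimating the gradient of the smoothed surrogate f_mu(x) = E_u[f(x + mu·u)] with u ~ N(0, I). A generic forward-difference estimator of this kind is sketched below; it illustrates the building block, not the SZVR-G algorithm itself, and the sample count and smoothing radius are arbitrary assumptions.

```python
import numpy as np

def gaussian_smoothing_grad(f, x, mu=1e-3, num_samples=20, rng=None):
    """Forward-difference estimate of the gradient of the Gaussian-smoothed
    surrogate f_mu(x) = E_u[f(x + mu * u)], u ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    fx = f(x)
    g = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_samples

# sanity check (illustration only): an unbiased, though noisy, estimate of the gradient 2 * x
g = gaussian_smoothing_grad(lambda x: np.sum(x ** 2), np.ones(3))
```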

Asynchronous Stochastic Block Coordinate Descent with Variance Reduction [article]

Bin Gu, Zhouyuan Huo, Heng Huang
2016 arXiv   pre-print
We propose an asynchronous stochastic block coordinate descent algorithm with the acceleration technique of variance reduction (AsySBCDVR), which is lock-free in both implementation and analysis.  ...  Asynchronous parallel implementations for stochastic optimization have achieved great success in theory and practice recently.  ...  Lian et al. (2016) proposed an asynchronous zeroth-order stochastic optimization algorithm and proved its convergence.  ...
arXiv:1610.09447v3 fatcat:2hqrb3rfgbdo7g5afyo6n62hjq
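
The abstract describes combining block coordinate updates with SVRG-style variance reduction. A sequential (non-asynchronous, non-lock-free) sketch of that combination is given below; it only illustrates the update rule, and every name and constant here is an assumption rather than the paper's specification.

```python
import numpy as np

def sbcd_vr(grad_i, n, x0, blocks, gamma=0.05, epochs=10, inner=100, seed=0):
    """Stochastic block coordinate descent with an SVRG control variate:
    sample a component i and a coordinate block b, then update only the
    coordinates in b using the variance-reduced partial gradient."""
    rng = np.random.default_rng(seed)
    x_tilde = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        g_full = np.mean([grad_i(i, x_tilde) for i in range(n)], axis=0)
        x = x_tilde.copy()
        for _ in range(inner):
            i = rng.integers(n)
            b = blocks[rng.integers(len(blocks))]
            v = grad_i(i, x) - grad_i(i, x_tilde) + g_full
            x[b] -= gamma * v[b]              # the real algorithm does this lock-free in parallel
        x_tilde = x
    return x_tilde

# toy usage: least squares split into two coordinate blocks
A = np.random.default_rng(2).standard_normal((8, 6))
grad_i = lambda i, x: A[i] * (A[i] @ x - 1.0)          # gradient of 0.5 * (a_i . x - 1)^2
x_hat = sbcd_vr(grad_i, n=8, x0=np.zeros(6), blocks=[np.arange(0, 3), np.arange(3, 6)])
```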

2020 Index IEEE Transactions on Signal Processing Vol. 68

2020 IEEE Transactions on Signal Processing  
One-Step Prediction for Discrete Time-Varying Nonlinear Systems With Unknown Inputs and Correlated Noises; TSP  ...  Fast Optimization With Zeroth-Order Feedback in Distributed, Multi-User MIMO Systems. Bai, Y., +, TSP 2020 2419-2434  ...
doi:10.1109/tsp.2021.3055469 fatcat:6uswtuxm5ba6zahdwh5atxhcsy

ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization [article]

Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, David Cox
2019 arXiv   pre-print
In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm that generalizes AdaMM to the gradient-free regime.  ...  We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of O(√d) worse than that of the first-order AdaMM algorithm, where d is the problem size.  ...  Huang, "Zeroth-order asynchronous doubly stochastic algorithm with variance reduction," arXiv preprint arXiv:1612.01425, 2016. [26] L. Liu, M. Cheng, C.-J. Hsieh, and D.  ...
arXiv:1910.06513v2 fatcat:52nisjhw2fbrlewl2dh25rsg6a
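
The abstract states the idea of ZO-AdaMM: run an AdaMM/AMSGrad-type adaptive momentum update, but feed it zeroth-order gradient estimates. A compact sketch of that pattern follows; the step sizes, smoothing radius, and toy objective are assumptions, and this is not claimed to reproduce the authors' exact algorithm or its projection step.

```python
import numpy as np

def zo_adamm(f, x0, steps=500, lr=0.01, mu=1e-3, b1=0.9, b2=0.99, eps=1e-8, seed=0):
    """AdaMM/AMSGrad-style adaptive momentum driven by two-point
    zeroth-order gradient estimates (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    for _ in range(steps):
        u = rng.standard_normal(x.shape)
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u   # gradient-free estimate
        m = b1 * m + (1 - b1) * g                            # first moment
        v = b2 * v + (1 - b2) * g * g                        # second moment
        v_hat = np.maximum(v_hat, v)                         # AMSGrad-style non-decreasing scaling
        x -= lr * m / (np.sqrt(v_hat) + eps)
    return x

# toy usage: black-box minimization of ||x - 3||^2
x_hat = zo_adamm(lambda x: np.sum((x - 3.0) ** 2), np.zeros(4))
```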

Stochastic Subspace Descent [article]

David Kozak, Stephen Becker, Alireza Doostan, Luis Tenorio
2019 arXiv   pre-print
We also note that our analysis gives a proof that the iterates of SVRG and several other popular first-order stochastic methods, in their original formulation, converge almost surely to the optimum; to  ...  We present two stochastic descent algorithms that apply to unconstrained optimization and are particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained  ...  random chance than variance reduction.  ... 
arXiv:1904.01145v2 fatcat:47lconmx3rbspho2uhdqaawc7e
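
The method in this entry evaluates the objective along a small number of random directions instead of forming a full gradient. The sketch below follows that idea: draw an orthonormal d × ℓ basis, approximate the projected gradient by finite differences, and step within the spanned subspace. The scaling, step size, and parameter names are assumptions; this is an illustration, not the paper's analyzed algorithm.

```python
import numpy as np

def stochastic_subspace_descent(f, x0, ell=5, lr=0.1, h=1e-5, steps=200, seed=0):
    """Descent along random ell-dimensional subspaces using only function
    evaluations: approximate P^T grad f(x) by forward differences and move
    along P @ (P^T grad f(x)), rescaled by d / ell (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for _ in range(steps):
        P, _ = np.linalg.qr(rng.standard_normal((d, ell)))   # random orthonormal basis (d x ell)
        fx = f(x)
        proj_grad = np.array([(f(x + h * P[:, j]) - fx) / h for j in range(ell)])
        x -= lr * (d / ell) * (P @ proj_grad)
    return x

# toy usage on a separable quadratic in 20 dimensions
x_hat = stochastic_subspace_descent(lambda x: np.sum((x - 1.0) ** 2), np.zeros(20))
```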

Efficiently avoiding saddle points with zero order methods: No gradients required [article]

Lampros Flokas, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Georgios Piliouras
2019 arXiv   pre-print
We consider the case of derivative-free algorithms for non-convex optimization, also known as zero-order algorithms, which use only function evaluations rather than gradients.  ...  Regarding efficiency, we introduce a noisy zero-order method that converges to second-order stationary points, i.e., avoids saddle points.  ...  Zeroth-order asynchronous doubly stochastic algorithm with variance reduction. arXiv preprint arXiv:1612.01425, 2016. Davood Hajinezhad and Michael M. Zavlanos.  ...
arXiv:1910.13021v1 fatcat:bbv2t3or3vapdclptz2qwwougi
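
The "noisy zero-order method" referred to above combines gradient-free estimates with injected randomness so that strict saddle points are escaped. The toy sketch below only illustrates that combination (a two-point estimate plus an isotropic perturbation); the noise scale and schedule are arbitrary assumptions and no second-order guarantee is implied for this simplified version.

```python
import numpy as np

def perturbed_zo_descent(f, x0, lr=0.05, mu=1e-4, noise=1e-2, steps=1000, seed=0):
    """Zero-order descent with small isotropic perturbations added at each
    step, the kind of injected noise used to move off strict saddle points."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        u = rng.standard_normal(x.shape)
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u   # two-point estimate
        x -= lr * g
        x += noise * rng.standard_normal(x.shape)            # perturbation step
    return x

# toy usage: starting exactly at the saddle of f(x, y) = x^2 - y^2, the noise pushes the iterate off it
x_hat = perturbed_zo_descent(lambda z: z[0] ** 2 - z[1] ** 2, np.zeros(2), steps=50)
```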

AsySQN: Faster Vertical Federated Learning Algorithms with Better Computation Resource Utilization [article]

Qingsong Zhang, Bin Gu, Cheng Deng, Songxiang Gu, Liefeng Bo, Jian Pei, Heng Huang
2021 arXiv   pre-print
To address the challenges of communication and computation resource utilization, we propose an asynchronous stochastic quasi-Newton (AsySQN) framework for VFL, under which three algorithms, i.e.  ...  Extensive numerical experiments on real-world datasets demonstrate the lower communication costs and better computation resource utilization of our algorithms compared with state-of-the-art VFL algorithms  ...  Different from AsySQN-SGD, which directly uses the stochastic gradient for updating, AsySQN-SVRG adopts the variance reduction technique to control the intrinsic variance of the stochastic gradient estimator  ...
arXiv:2109.12519v1 fatcat:q6m5h4aylvhmxg7pqq4gndiw54
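
The last snippet contrasts AsySQN-SGD with AsySQN-SVRG, where the stochastic gradient is replaced by a variance-reduced one before the quasi-Newton direction is formed. The fragment below sketches just that step, with the curvature information reduced to an assumed diagonal approximate inverse Hessian; the real framework is asynchronous and maintains proper quasi-Newton curvature pairs.

```python
import numpy as np

def svrg_quasi_newton_step(grad_i, i, x, x_snap, g_full, h_inv_diag, lr=1.0):
    """One sketch update: an SVRG-corrected stochastic gradient preconditioned
    by a diagonal approximation of the inverse Hessian."""
    v = grad_i(i, x) - grad_i(i, x_snap) + g_full     # variance-reduced gradient
    return x - lr * h_inv_diag * v

# toy usage with synthetic component gradients
grad_i = lambda i, x: x - float(i)                    # gradient of 0.5 * ||x - i||^2
x_snap = np.zeros(3)
g_full = np.mean([grad_i(i, x_snap) for i in range(4)], axis=0)
x_next = svrg_quasi_newton_step(grad_i, 2, np.zeros(3), x_snap, g_full, np.ones(3))
```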

On the Convergence of Quantized Parallel Restarted SGD for Central Server Free Distributed Training [article]

Feijie Wu, Shiqi He, Yutong Yang, Haozhao Wang, Zhihao Qu, Song Guo, Weihua Zhuang
2020 arXiv   pre-print
Under both aggregation paradigms, the algorithm achieves the linear speedup property with respect to the number of local updates as well as the number of workers.  ...  ), an algorithm that allows multiple local SGD updates before a global synchronization, in synergy with the quantization to significantly reduce the communication overhead.  ...  Ghadimi, S., Lan, G.: Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23(4), 2341-2368 (2013) 8.  ... 
arXiv:2004.09125v2 fatcat:ckwupmf64na7th263f6zjrrffu
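
The abstract describes workers taking multiple local SGD steps between global synchronizations, with quantization applied to what is communicated. The sequential simulation below mirrors that structure under simplifying assumptions (a naive uniform quantizer applied to the whole local model, simple averaging in place of a decentralized all-reduce); it is not the paper's quantization scheme or its central-server-free aggregation.

```python
import numpy as np

def quantize(v, levels=16):
    """Naive uniform quantizer used purely for illustration."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

def parallel_restarted_sgd(grad_w, x0, workers=4, rounds=10, local_steps=20, lr=0.05, seed=0):
    """Each worker runs local SGD from the shared model; the quantized local
    models are then averaged to form the next global model."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(rounds):
        local_models = []
        for w in range(workers):
            xw = x.copy()
            for _ in range(local_steps):
                xw -= lr * grad_w(w, xw, rng)          # worker-local stochastic gradient
            local_models.append(quantize(xw))
        x = np.mean(local_models, axis=0)              # global synchronization
    return x

# toy usage: workers hold shifted quadratics 0.5 * ||x - c_w||^2 with gradient noise
centers = np.random.default_rng(3).standard_normal((4, 5))
grad_w = lambda w, x, rng: (x - centers[w]) + 0.1 * rng.standard_normal(x.shape)
x_hat = parallel_restarted_sgd(grad_w, np.zeros(5))
```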

Detection of stochastic processes

T. Kailath, H.V. Poor
1998 IEEE Transactions on Information Theory  
This treatment deals exclusively with basic results developed for the situation in which the observations are modeled as continuous-time stochastic processes.  ...  Poisson process; when the rate is stochastic, the process is often called a "doubly stochastic" Poisson process.  ...  So, in order for optimal detection to be practical in such situations, some form of complexity reduction is necessary.  ...
doi:10.1109/18.720538 fatcat:mghnncnlyvgnbgoamhtag67e3i
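
The snippet quotes the definition of a doubly stochastic ("Cox") Poisson process: a Poisson process whose rate is itself random. As a small illustration of that definition (unrelated to the detection theory developed in the paper), the sketch below simulates such a process by thinning; the intensity model and bound are arbitrary assumptions.

```python
import numpy as np

def simulate_cox_process(sample_rate, T=10.0, lam_max=5.0, seed=0):
    """Simulate a doubly stochastic Poisson process on [0, T] by thinning:
    draw a random rate function, generate a rate-lam_max Poisson process,
    and keep each candidate point t with probability rate(t) / lam_max."""
    rng = np.random.default_rng(seed)
    rate = sample_rate(rng)                            # random intensity: a function t -> rate(t) <= lam_max
    candidates = np.sort(rng.uniform(0, T, rng.poisson(lam_max * T)))
    keep = rng.uniform(0, 1, candidates.size) < rate(candidates) / lam_max
    return candidates[keep]

def random_sinusoid(rng):
    """Example random intensity: a sinusoid with a random phase, bounded by 4.5."""
    phase = rng.uniform(0, 2 * np.pi)
    return lambda t: 2.5 + 2.0 * np.sin(t + phase)

events = simulate_cox_process(random_sinusoid)
```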

Detection of Stochastic Processes [chapter]

2009 Information Theory  
This treatment deals exclusively with basic results developed for the situation in which the observations are modeled as continuous-time stochastic processes.  ...  Poisson process; when the rate is stochastic, the process is often called a "doubly stochastic" Poisson process.  ...  So, in order for optimal detection to be practical in such situations, some form of complexity reduction is necessary.  ...
doi:10.1109/9780470544907.ch9 fatcat:kdz3v5okgrcjlosmard4ngq4xy

Statistical embedding: Beyond principal components [article]

Dag Tjøstheim and Martin Jullum and Anders Løland
2021 arXiv   pre-print
Another type of data experiencing tremendous growth is very high-dimensional network data.  ...  The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams.  ...  The zeroth-order homology of a set corresponds to its connected components.  ...
arXiv:2106.01858v1 fatcat:4vwj5epnkfapxhgeybhkzxv47a
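
The last sentence of the snippet gives the concrete reading of zeroth-order homology: the connected components of a set. For a finite graph this is just a union-find computation, sketched below as a small self-contained illustration (the function name and example are assumptions, not code from the survey).

```python
def connected_components(n, edges):
    """Number of connected components of a graph on vertices 0..n-1,
    i.e. the rank of its zeroth homology, via union-find."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(i) for i in range(n)})

# two components: {0, 1, 2} and {3, 4}
assert connected_components(5, [(0, 1), (1, 2), (3, 4)]) == 2
```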

A high-bias, low-variance introduction to Machine Learning for physicists [article]

Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre G.R. Day, Clint Richardson, Charles K. Fisher, David J. Schwab
2019 arXiv   pre-print
The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to  ...  We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists may be able to contribute  ...  Stochastic Gradient Descent (SGD) with mini-batches One of the most widely-applied variants of the gradient descent algorithm is stochastic gradient descent (SGD) (Bottou, 2012; Williams and Hinton, 1986  ... 
arXiv:1803.08823v2 fatcat:vmtp62jyvjfxhpidpdcozfnza4
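
The SGD snippet above describes mini-batch stochastic gradient descent in words. A minimal generic implementation is given below for concreteness; the interface (a per-example gradient callback) and the least-squares toy problem are assumptions, not material from the review.

```python
import numpy as np

def minibatch_sgd(grad_fn, data, x0, batch_size=32, lr=0.1, epochs=5, seed=0):
    """Plain mini-batch SGD: shuffle the data each epoch and step along the
    average per-example gradient of each mini-batch."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = len(data)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = data[idx[start:start + batch_size]]
            x -= lr * np.mean([grad_fn(x, z) for z in batch], axis=0)
    return x

# toy usage: least squares, each row z = (a, b), gradient of 0.5 * (a . x - b)^2
A = np.random.default_rng(1).standard_normal((200, 3))
b = A @ np.array([1.0, -2.0, 0.5])
data = np.hstack([A, b[:, None]])
w = minibatch_sgd(lambda x, z: (z[:3] @ x - z[3]) * z[:3], data, np.zeros(3))
```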

Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications

Julia Kreutzer
2020
Methods for variance reduction of the stochastic gradient will be presented in the following section.  ...  Variance Reduction by Control Variates One issue with vanilla policy gradient is that its stochastic gradient updates suffer from high variance, since they involve sampling from the model distribution, which  ...  B.2 NMT Hyperparameters (Ch. 4) The out-of-domain model in Chapter 4 is trained with mini-batches of size 100 and L2 regularization with weight 1 × 10⁻⁷, optimized with Adam (Kingma and Ba, 2015) with  ...
doi:10.11588/heidok.00028862 fatcat:jrsiseo4prf4pa3f7nnbp24wkq
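
The snippet explains that vanilla policy gradient has high-variance updates and that control variates (baselines) reduce this variance without introducing bias. The sketch below shows the standard baseline trick on a toy one-step problem; it is a generic illustration, not the thesis' NMT training setup, and the running-mean baseline and Gaussian bandit are assumptions.

```python
import numpy as np

def policy_gradient_with_baseline(logp_grad, sample_episode, theta0,
                                  episodes=200, lr=0.01, seed=0):
    """REINFORCE with a running-mean baseline as control variate: subtracting
    the baseline from the sampled return keeps the gradient estimate unbiased
    while lowering its variance (sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    baseline = 0.0
    for k in range(episodes):
        actions, reward = sample_episode(theta, rng)     # one sampled episode
        score = sum(logp_grad(theta, a) for a in actions)
        theta += lr * (reward - baseline) * score        # variance-reduced update
        baseline += (reward - baseline) / (k + 1)        # running mean of past returns
    return theta

# toy Gaussian bandit: action ~ N(theta, 1), reward = -(action - 2)^2
def sample_episode(theta, rng):
    a = rng.normal(theta[0], 1.0)
    return [a], -(a - 2.0) ** 2

logp_grad = lambda theta, a: np.array([a - theta[0]])    # grad of log N(a; theta, 1)
theta_hat = policy_gradient_with_baseline(logp_grad, sample_episode, np.zeros(1))
```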
Showing results 1 — 15 out of 27 results