8,905 Hits in 3.4 sec

Accelerated Variance Reduced Block Coordinate Descent [article]

Zebang Shen, Hui Qian, Chao Zhang, Tengfei Zhou
2016 arXiv   pre-print
In this paper, we propose a method enjoying all these merits with an accelerated convergence rate O(1/k^2).  ...  In this work, we propose a method called Accelerated Variance Reduced Block Coordinate Descent (AVRBCD) that tackles the two challenges in large-scale problems.  ...  Conclusion In this paper, we proposed an accelerated variance reduced block coordinate descent algorithm that can handle problems with a large number of samples in ultra-high-dimensional spaces.  ... 
arXiv:1611.04149v1 fatcat:s33ruopcffeszfvfs3uzsrzyma

Efficient Asynchronous Semi-stochastic Block Coordinate Descent Methods for Large-Scale SVD

Fanhua Shang, Zhihui Zhang, Yuanyuan Liu, Hongying Liu, Jing Xu
2021 IEEE Access  
By taking full advantage of both variance reduction and randomized coordinate descent techniques, this paper proposes a novel Semi-stochastic Block Coordinate Descent algorithm (SBCD-SVD), which is more  ...  Moreover, we propose a new Asynchronous parallel Semi-stochastic Block Coordinate Descent algorithm (ASBCD-SVD) and a new Asynchronous parallel Sparse approximated Variance Reduction algorithm (ASVR-SVD)  ...  Therefore, this paper proposes a new efficient accelerated semi-stochastic coordinate descent method based on the variance reduction technique [22], which has a faster convergence speed than existing  ... 
doi:10.1109/access.2021.3094282 fatcat:z6ptyfacpjer5lajuu4sx7doye

Accelerated Stochastic Block Coordinate Gradient Descent for Sparsity Constrained Nonconvex Optimization

Jinghui Chen, Quanquan Gu
2016 Conference on Uncertainty in Artificial Intelligence  
We propose an accelerated stochastic block coordinate descent algorithm for nonconvex optimization under sparsity constraint in the high dimensional regime.  ...  The core of our algorithm is leveraging both stochastic partial gradient and full partial gradient restricted to each coordinate block to accelerate the convergence.  ...  Li et al. (2016) proposed a stochastic variance reduced gradient hard thresholding algorithm. Nevertheless, it cannot leverage the coordinate block to accelerate the convergence.  ... 
dblp:conf/uai/ChenG16 fatcat:a4bf2izbeffgzcq4azxxtid6qu

99% of Worker-Master Communication in Distributed Optimization Is Not Needed

Konstantin Mishchenko, Filip Hanzely, Peter Richtárik
2020 Conference on Uncertainty in Artificial Intelligence  
Namely, we develop a new variant of parallel block coordinate descent based on independent sparsification of the local gradient estimates before communication.  ...  We demonstrate that with only m/n blocks sent by each of n workers, where m is the total number of parameter blocks, the theoretical iteration complexity of the underlying distributed methods is essentially  ...  If, however, the noise is already tiny, as in non-accelerated variance-reduced methods, there is no improvement.  ... 
dblp:conf/uai/MishchenkoHR20 fatcat:qcr5usi3ujbp7fnvrn2lqu5dni

Accelerated Mini-batch Randomized Block Coordinate Descent Method

Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu
2014 Advances in Neural Information Processing Systems  
We further accelerate the MRBCD method by exploiting the semi-stochastic optimization scheme, which effectively reduces the variance of the partial gradient estimators.  ...  When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner.  ...  reduced gradient method (SPVRG) [20] , and the "batch" randomized block coordinate descent (BRBCD) method [12] .  ... 
pmid:25620860 pmcid:PMC4303186 fatcat:cs4oc7pxmfduzlihb7o5pxhabi
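The semi-stochastic scheme the snippet above describes — an SVRG-style variance-reduced partial gradient restricted to one random coordinate block per step — can be sketched as follows for a least-squares objective. This is a minimal illustration under stated assumptions, not the authors' MRBCD implementation; all names, step sizes, and the block partition are illustrative.

```python
import numpy as np

def mrbcd_sketch(A, b, n_blocks=4, eta=0.01, epochs=30, inner=None, seed=0):
    """Minimal semi-stochastic block coordinate descent sketch for
    min_x (1/2n) ||A x - b||^2, using an SVRG-style variance-reduced
    partial gradient on one random coordinate block per inner step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    blocks = np.array_split(np.arange(d), n_blocks)
    x = np.zeros(d)
    inner = n if inner is None else inner
    for _ in range(epochs):
        x_snap = x.copy()
        full_grad = A.T @ (A @ x_snap - b) / n       # full gradient at snapshot
        for _ in range(inner):
            i = rng.integers(n)                      # random sample
            blk = blocks[rng.integers(n_blocks)]     # random coordinate block
            a_i = A[i]
            # variance-reduced partial gradient on the chosen block:
            # grad_blk f_i(x) - grad_blk f_i(x_snap) + grad_blk f(x_snap)
            g = ((a_i @ x - b[i]) - (a_i @ x_snap - b[i])) * a_i[blk] + full_grad[blk]
            x[blk] = x[blk] - eta * g
        # x at the end of the inner loop becomes the next snapshot
    return x
```

Each inner step costs only one sample row and one block of coordinates, yet the correction term anchored at the snapshot keeps the partial gradient estimate low-variance, which is what allows a constant step size.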

99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it [article]

Konstantin Mishchenko and Filip Hanzely and Peter Richtárik
2019 arXiv   pre-print
Namely, we develop a new variant of parallel block coordinate descent based on independent sparsification of the local gradient estimates before communication.  ...  It is also well known that many such methods, including SGD, SAGA, and accelerated SGD for over-parameterized models, do not scale well with the number of parallel workers.  ...  On the other hand, (uniform) block coordinate descent (CD) has variance proportional to 1/τ, where τ < 1 is the ratio of used blocks.  ... 
arXiv:1901.09437v2 fatcat:up2xhbyfojhgndvomvuhq3qxd4
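The mechanism these two abstracts describe — each worker independently keeping only a few blocks of its local gradient estimate before communicating — can be sketched as below. Function and variable names are illustrative, and the n_blocks/keep scaling that keeps the master's average unbiased is the standard importance-weighting trick, not necessarily the paper's exact protocol.

```python
import numpy as np

def sparsified_average(worker_grads, n_blocks, keep, rng):
    """Each worker independently sends only `keep` of `n_blocks` gradient
    blocks, scaled by n_blocks/keep so the master's average stays unbiased."""
    d = worker_grads[0].size
    blocks = np.array_split(np.arange(d), n_blocks)
    agg = np.zeros(d)
    for g in worker_grads:
        sent = rng.choice(n_blocks, size=keep, replace=False)   # independent per worker
        for j in sent:
            agg[blocks[j]] += g[blocks[j]] * (n_blocks / keep)  # importance scaling
    return agg / len(worker_grads)
```

With n workers and m blocks, setting keep = m/n matches the communication budget mentioned in the abstract: in expectation each block is still covered, while every worker transmits only a 1/n fraction of its gradient.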

Semi-stochastic coordinate descent

Jakub Konečný, Zheng Qu, Peter Richtárik
2017 Optimization Methods and Software  
References [1] Konečný J., Qu Z., Richtárik P.: Semi-Stochastic Coordinate Descent, OPT 2014 @ NIPS [2] Johnson R., Zhang T.: Accelerating Stochastic Gradient Descent using Predictive Variance Reduction  ...  (nonoverlapping) blocks of coordinates.  ... 
doi:10.1080/10556788.2017.1298596 fatcat:7qst4matjfd2dmufecshcsfaem

Accelerated Stochastic Block Coordinate Descent with Optimal Sampling

Aston Zhang, Quanquan Gu
2016 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16  
We propose an accelerated stochastic block coordinate descent (ASBCD) algorithm, which incorporates the incrementally averaged partial derivative into the stochastic partial derivative and exploits optimal  ...  This type of composite optimization is common in many data mining and machine learning problems, and can be solved by block coordinate descent algorithms.  ...  We propose an algorithm for stochastic block coordinate descent using optimal sampling, namely accelerated stochastic block coordinate descent with optimal sampling (ASBCD).  ... 
doi:10.1145/2939672.2939819 dblp:conf/kdd/ZhangG16 fatcat:sl3vfm4lsndsnhelknh54v6vwa

Randomized Block Coordinate Descent for Online and Stochastic Optimization [article]

Huahua Wang, Arindam Banerjee
2014 arXiv   pre-print
In this paper, we combine the two types of methods together and propose online randomized block coordinate descent (ORBCD).  ...  For strongly convex functions, by reducing the variance of stochastic gradients, we show that ORBCD can converge at a geometric rate in expectation, matching the convergence rate of SGD with variance reduction  ...  To accelerate the SGD by reducing the variance of stochastic gradient, stochastic variance reduced gradient (SVRG) was proposed by [13] .  ... 
arXiv:1407.0107v3 fatcat:mt7eearbobawlmabvkpjbv5emm
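The SVRG estimator that the snippet credits to [13] has two properties worth seeing concretely: it is unbiased, and its variance vanishes at the snapshot point. A small sketch for a least-squares f, with illustrative names only:

```python
import numpy as np

def svrg_gradient(A, b, x, x_snap, full_grad_snap, i):
    """SVRG estimator g_i = grad f_i(x) - grad f_i(x_snap) + grad f(x_snap)
    for f(x) = (1/2n) ||A x - b||^2 with f_i(x) = (1/2) (a_i x - b_i)^2."""
    a = A[i]
    return ((a @ x - b[i]) - (a @ x_snap - b[i])) * a + full_grad_snap
```

Averaging over i recovers the exact gradient at x (unbiasedness), and at x = x_snap every g_i equals the full snapshot gradient, so the variance is zero there — this shrinking variance is what lets ORBCD-style methods keep a constant step size and reach a geometric rate.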

Asynchronous Delay-Aware Accelerated Proximal Coordinate Descent for Nonconvex Nonsmooth Problems

Ehsan Kazemi, Liqiang Wang
On the other hand, asynchronous proximal coordinate descent (APCD) has recently received much attention for solving large-scale problems.  ...  Proximal coordinate descent (PCD) has been widely used for solving optimization problems, but knowledge of PCD methods in the nonconvex setting is very limited.  ...  Under the assumption that the regularization term is block separable, (Richtárik and Takáč 2014) developed a randomized block-coordinate descent method.  ... 
doi:10.1609/aaai.v33i01.33011528 fatcat:weexm7jahbapnne5kkieox3k44

An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [article]

Runxue Bao, Bin Gu, Heng Huang
2022 arXiv   pre-print
To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems, which can reduce the number of block iterations  ...  The proximal gradient descent method and the coordinate descent method are the most popular approaches to solving the minimization problem.  ...  Further, the accelerated mini-batch randomized block coordinate descent (MRBCD) method [46, 38] was proposed to achieve a linear convergence rate by reducing the gradient variance [20].  ... 
arXiv:2208.06058v1 fatcat:5befn2ax2zh5lhd43ankjrhney

Large-Scale Optimization Algorithms for Sparse Conditional Gaussian Graphical Models [article]

Calvin McCarter, Seyoung Kim
2015 arXiv   pre-print
We then extend our method to scale to large problems under memory constraints, using block coordinate descent to limit memory usage while achieving fast convergence.  ...  In order to reduce cache misses, we perform block coordinate descent, where within each block, the columns of Σ are cached and re-used.  ...  We propose a block coordinate descent approach for solving Eq. (7) that groups these computations to reduce cache misses.  ... 
arXiv:1509.04681v2 fatcat:dzrch7lfo5adhbtnmxk5ul3rey

A principled framework for the design and analysis of token algorithms [article]

Hadrien Hendrikx
2022 arXiv   pre-print
We frame the token algorithm as a randomized gossip algorithm on a conceptual graph, which allows us to prove a series of convergence results for variance-reduced and accelerated token algorithms for the  ...  Theorem 4 (Token Accelerated Variance-Reduced).  ...  ., 2017] , and in particular , and so the algorithmic core is similar, namely Bregman coordinate descent (with some adaptations) for the simple and variance-reduced algorithms, and Accelerated Proximal Coordinate  ... 
arXiv:2205.15015v1 fatcat:lgyy2yngkfd33le4xr3b4i5qbm

Oracle Complexity Separation in Convex Optimization [article]

Anastasiya Ivanova, Evgeniya Vorontsova, Dmitry Pasechnyuk, Alexander Gasnikov, Pavel Dvurechensky, Darina Dvinskikh, Alexander Tyurin
2022 arXiv   pre-print
In the latter two cases we obtain respectively accelerated random coordinate descent and accelerated variance reduction methods with oracle complexity separation.  ...  Many convex optimization problems have a structured objective function written as a sum of functions with different types of oracles (full gradient, coordinate derivative, stochastic gradient) and different  ...  as M_inn, we obtain the accelerated gradient method, accelerated random coordinate descent and accelerated stochastic variance reduced method with oracle complexity separation.  ... 
arXiv:2002.02706v4 fatcat:wjl4kv6jlfdd3jgx4x6nkoehey

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes [article]

Shuai Zheng and Haibin Lin and Sheng Zha and Mu Li
2020 arXiv   pre-print
In this paper, we propose an accelerated gradient method called LANS to improve the efficiency of using large mini-batches for training.  ...  Using stochastic gradient methods with large mini-batches has been advocated as an efficient tool to reduce the training time.  ...  For training deep neural networks, the momentum method accelerates early optimization and helps gradient descent escape from local minima.  ... 
arXiv:2006.13484v2 fatcat:zekvbyfgcrdk7g6aehj2h4zvni
Showing results 1 — 15 out of 8,905 results