24 Hits in 4.9 sec

Coordinate descent with arbitrary sampling II: expected separable overapproximation

Zheng Qu, Peter Richtárik
2016 Optimization Methods and Software  
The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion  ...  In this paper we develop a systematic technique for deriving these inequalities for a large class of functions and for arbitrary samplings.  ...  Traditional variants of coordinate descent rely on cyclic or greedy rules for the selection of the next coordinate to be updated.  ... 
doi:10.1080/10556788.2016.1190361 fatcat:bvueahrt4fa7rox4bf6d5ehdmi
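For context on the ESO notion this entry refers to, here is a minimal sketch in the notation commonly associated with this line of work (the symbols v, p and the sampling Ŝ are assumptions of this sketch, not quoted from the abstract): a smooth function f admits an expected separable overapproximation with respect to a sampling Ŝ and parameters v if

```latex
% Expected separable overapproximation (ESO), sketched:
% \hat{S} is a random subset (sampling) of coordinates, p_i = P(i \in \hat{S}),
% and h_{[\hat{S}]} zeroes out the coordinates of h outside \hat{S}.
\mathbf{E}\!\left[ f\!\left(x + h_{[\hat{S}]}\right) \right]
\;\le\;
f(x) \;+\; \sum_{i=1}^{n} p_i \,\nabla_i f(x)\, h_i
\;+\; \tfrac{1}{2} \sum_{i=1}^{n} p_i\, v_i\, h_i^{2}.
```

The separable quadratic on the right is what allows each sampled coordinate to be updated independently, with a step size governed by v_i.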

An Exact Solver for the Weston-Watkins SVM Subproblem [article]

Yutong Wang, Clayton D. Scott
2021 arXiv   pre-print
For linear WW-SVMs, our solver shows significant speed-up over the state-of-the-art solver when the number of classes is large.  ...  Recent empirical evidence suggests that the Weston-Watkins support vector machine is among the best performing multiclass extensions of the binary SVM.  ...  subproblem (S1) using a form of greedy coordinate descent.  ... 
arXiv:2102.05640v2 fatcat:dvdovfezkbfy3fsl3o23fjwfuq
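As a reminder of what greedy coordinate descent looks like in general, here is a generic Gauss-Southwell sketch on a toy least-squares objective; it is not the paper's exact WW-SVM subproblem solver, and all names below are illustrative:

```python
import numpy as np

def greedy_coordinate_descent(grad, step, x0, n_iters=500):
    """Generic greedy (Gauss-Southwell) coordinate descent: at each iteration,
    update only the coordinate whose partial derivative is largest in magnitude."""
    x = x0.copy()
    for _ in range(n_iters):
        g = grad(x)                    # full gradient of the smooth objective
        i = int(np.argmax(np.abs(g)))  # greedy coordinate choice
        x[i] -= step * g[i]            # single-coordinate update
    return x

# Toy usage: minimize 0.5 * ||A x - b||^2 (A, b, x_hat are illustrative names).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lipschitz = np.linalg.norm(A, 2) ** 2          # crude global step bound
x_hat = greedy_coordinate_descent(lambda x: A.T @ (A @ x - b),
                                  step=1.0 / lipschitz,
                                  x0=np.zeros(5))
```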

Adaptive Kernel Learning in Heterogeneous Networks [article]

Hrusikesha Pradhan, Amrit Singh Bedi, Alec Koppel, Ketan Rajawat
2021 arXiv   pre-print
To solve the constrained stochastic program, we propose applying a functional variant of the stochastic primal-dual (Arrow-Hurwicz) method, which yields a decentralized algorithm.  ...  To incentivize coordination while respecting network heterogeneity, we impose nonlinear proximity constraints.  ...  primal/dual descent/ascent steps on the Lagrangian.  ... 
arXiv:1908.00510v4 fatcat:2alqlh56vzbivabunuh2tkhi24
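The "descent/ascent steps on the Lagrangian" mentioned in the snippet follow the classical Arrow-Hurwicz pattern; a deterministic, finite-dimensional sketch with assumed notation (L is the Lagrangian, η a step size, [·]_+ projection onto the nonnegative orthant), not the paper's functional variant:

```latex
% One Arrow-Hurwicz (primal-dual gradient) iteration, sketched:
x_{t+1} = x_t - \eta \,\nabla_x L(x_t, \lambda_t),                                   % primal descent step
\qquad
\lambda_{t+1} = \left[\lambda_t + \eta \,\nabla_\lambda L(x_t, \lambda_t)\right]_+ . % dual ascent step
```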

Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization [article]

Gesualdo Scutari, Ying Sun
2018 arXiv   pre-print
Recent years have witnessed a surge of interest in parallel and distributed optimization methods for large-scale systems.  ...  The design and the analysis of such complex, large-scale systems pose several challenges and call for the development of new optimization models and algorithms.  ...  III.4 for more details and some numerical results. Example #1 (Sparse) empirical risk minimization: In Example #5 in Sec.  ... 
arXiv:1805.06963v1 fatcat:fbjziifyezdixoudqgi2sbfpum

Batch Policy Learning under Constraints [article]

Hoang M. Le, Cameron Voloshin, Yisong Yue
2019 arXiv   pre-print
Our algorithm achieves strong empirical results in different domains, including in a challenging problem of simulated car driving subject to multiple constraints such as lane keeping and smooth driving  ...  We then present a specific algorithmic instantiation and provide performance guarantees for the main objective and all constraints.  ...  The empirical primal-dual gap L_max − L_min in Figure 1 (left) quickly decreases toward the optimal gap of zero.  ... 
arXiv:1903.08738v1 fatcat:6rydrj3xcjgylmqvb5sbq72rey
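On the empirical primal-dual gap mentioned above: for a Lagrangian L(π, λ) of a constrained policy-learning problem, a standard certificate looks as follows (sketched with assumed notation, not the paper's exact definitions):

```latex
% Duality-gap certificate for a candidate primal/dual pair (\hat\pi, \hat\lambda):
L_{\max} = \max_{\lambda}\, L(\hat\pi, \lambda), \qquad
L_{\min} = \min_{\pi}\, L(\pi, \hat\lambda), \qquad
\mathrm{gap} = L_{\max} - L_{\min} \;\ge\; 0.
```

A gap shrinking toward zero certifies that the returned policy is close to optimal and feasible, which is what the excerpt's Figure 1 reports.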

14th International Symposium on Mathematical Programming

1990 Mathematical programming  
Finally I report numerical results of a comparison between the variant and a minimization approach using penalty functions.  ...  It is shown that for the success of the variant dom must fulfill a regularity property and that the choice of the normal vectors must meet some demands. Both requirements are fulfilled if dom is polyhedral  ...  This special model includes the models of, e.g., Edmonds, Fujishige, Queyranne, Spieksma and Tardella for primal or dual (i.e., Monge-type) greedy algorithms.  ... 
doi:10.1007/bf01580875 fatcat:3jtclwmntzgjxkqs5uecombdaa

Provably effective algorithms for min-max optimization

Qi Lei (0000-0003-0634-6435), The University of Texas at Austin; Inderjit S. Dhillon, Alexandros G. Dimakis
2020
Can we show that stochastic gradient descent-ascent, a method commonly used in practice for GAN training, actually finds global optima and can learn a target distribution?  ...  We also present extensive empirical studies to verify the effectiveness of our proposed methods.  ...  The idea is simple: for the primal variable x, we conduct block Frank-Wolfe or greedy coordinate descent, respectively, for the constrained and unconstrained cases.  ... 
doi:10.26153/tsw/10153 fatcat:ylryreuz35bt3db4naamj57f3m
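Since the abstract asks whether stochastic gradient descent-ascent finds global optima, here is what plain (deterministic) gradient descent-ascent looks like on a toy strongly-convex-strongly-concave saddle problem; this is a sketch only, not the dissertation's algorithm, and the function names are illustrative:

```python
import numpy as np

def gradient_descent_ascent(grad_x, grad_y, x0, y0, eta=0.05, n_iters=2000):
    """Simultaneous gradient descent on x and ascent on y for min_x max_y f(x, y)."""
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta * gx   # descent step on the primal variable
        y = y + eta * gy   # ascent step on the dual variable
    return x, y

# Toy saddle problem f(x, y) = 0.5*||x||^2 + x @ y - 0.5*||y||^2, saddle point at (0, 0).
x_star, y_star = gradient_descent_ascent(lambda x, y: x + y,   # df/dx
                                         lambda x, y: x - y,   # df/dy
                                         np.ones(3), np.ones(3))
```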

Implementation of Fog computing for reliable E-health applications

Razvan Craciunescu, Albena Mihovska, Mihail Mihaylov, Sofoklis Kyriazakos, Ramjee Prasad, Simona Halunga
2015 2015 49th Asilomar Conference on Signals, Systems and Computers  
Specifically, we introduce a concept for sparse joint activity, channel and data detection in the context of the Coded ALOHA (FDMA) protocol.  ...  We will mathematically analyze the system accordingly and provide expressions for the capture probabilities of the underlying sparse multiuser detector.  ...  Our algorithm relies on a primal-dual splitting strategy that avoids inverting any linear operator, thus making it suitable for processing high-dimensional datasets.  ... 
doi:10.1109/acssc.2015.7421170 dblp:conf/acssc/CraciunescuMMKP15 fatcat:qm6mki5z6bcvrfimkmqjyrxaxm
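The "primal-dual splitting strategy that avoids inverting any linear operator" is in the spirit of the Chambolle-Pock scheme for min_x g(x) + f(Kx); the snippet does not spell out its exact iteration, so the following is a sketch under that assumption:

```latex
% Chambolle-Pock style primal-dual splitting; K is only applied, never inverted.
y_{k+1} = \operatorname{prox}_{\sigma f^{*}}\!\left(y_k + \sigma K \bar{x}_k\right),   % dual step
\quad
x_{k+1} = \operatorname{prox}_{\tau g}\!\left(x_k - \tau K^{*} y_{k+1}\right),         % primal step
\quad
\bar{x}_{k+1} = x_{k+1} + \theta\,(x_{k+1} - x_k).                                     % extrapolation
```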

Deep Reinforcement Learning [article]

Yuxi Li
2018 arXiv   pre-print
Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn.  ...  The authors present an implementation with centralized training for decentralized execution, as discussed below. The authors experiment with grid world coordination, a partially observable game,  ...  policy-space response oracle (PSRO), and its approximation, deep cognitive hierarchies (DCH), to compute best responses to a mixture of policies using deep RL, and to compute new meta-strategy distributions using empirical  ... 
arXiv:1810.06339v1 fatcat:kp7atz5pdbeqta352e6b3nmuhy

Graph Spectral Image Processing [article]

Gene Cheung, Enrico Magli, Yuichi Tanaka, Michael Ng
2018 arXiv   pre-print
In this article, we overview recent graph spectral techniques in GSP specifically for image / video processing.  ...  can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for  ...  In [97] , for piecewise smooth (PWS) images, the authors minimize the total variation of edge weights in a dual graph.  ... 
arXiv:1801.04749v2 fatcat:emorqmvkinf2tnaccvup3ot4fi
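To make the "image as a signal on a graph" idea concrete, here is the standard GSP notation (assumed for this sketch rather than quoted from the article): with pixel-similarity weights w_{ij}, the combinatorial graph Laplacian and the graph total variation referred to in the excerpt are

```latex
% Graph signal x \in \mathbb{R}^N (one value per pixel), weights w_{ij} from image structure.
L = D - W, \qquad D_{ii} = \sum_{j} w_{ij},
\qquad
x^{\top} L x = \tfrac{1}{2}\sum_{i,j} w_{ij}\,(x_i - x_j)^{2},
\qquad
\mathrm{TV}_{G}(x) = \sum_{(i,j)\in\mathcal{E}} w_{ij}\,|x_i - x_j| .
```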

Ping-pong beam training for reciprocal channels with delay spread

Elisabeth de Carvalho, Jorgen Bach Andersen
2015 2015 49th Asilomar Conference on Signals, Systems and Computers  
Specifically, we introduce a concept for sparse joint activity, channel and data detection in the context of the Coded ALOHA (FDMA) protocol.  ...  We will mathematically analyze the system accordingly and provide expressions for the capture probabilities of the underlying sparse multiuser detector.  ...  Our algorithm relies on a primal-dual splitting strategy that avoids inverting any linear operator, thus making it suitable for processing high-dimensional datasets.  ... 
doi:10.1109/acssc.2015.7421451 dblp:conf/acssc/CarvalhoA15 fatcat:mqokuvnh3zg45licnfbgxyvxfu

Introduction to Online Convex Optimization [article]

Elad Hazan
2021 arXiv   pre-print
I am thankful to Shay Moran for explaining compression schemes and how they simplify generalization for boosting.  ...  I am very thankful to Wouter Koolen-Wijkstra for a helpful suggestion in the analysis of the Online Newton Step algorithm.  ...  The first part of this equivalence is shown by representing a zero-sum game as a primal-dual linear program instance, as we do now.  ... 
arXiv:1909.05207v2 fatcat:2fvtye3uo5dajjy4hfdwk33lbq
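The "zero-sum game as a primal-dual linear program" reduction mentioned in the excerpt is the classical one; a sketch with standard notation (payoff matrix A, mixed strategies on the simplices Δ_m and Δ_n), assumed rather than taken from the text:

```latex
% Row player (primal LP): guarantee at least v against any column strategy.
\max_{x \in \Delta_m,\; v} \; v
\quad \text{s.t.} \quad A^{\top} x \ge v\,\mathbf{1};
% Column player (dual LP): concede at most w against any row strategy.
\min_{y \in \Delta_n,\; w} \; w
\quad \text{s.t.} \quad A\, y \le w\,\mathbf{1}.
% LP duality: the optimal v equals the optimal w, i.e. the minimax value of the game.
```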

Effective and Efficient Learning at Scale

Wei Yu
2020
For example, 1) a new model that can enable parallel computation helps accelerate both training and inference; 2) a fast algorithm can save time for hyper-parameter tuning and/or make it affordable for  ...  It was also the deepest neural network model for NLP when invented. The second part pr [...]  ...  Randomized primal-dual coordinate methods: In Chapters 5 and 6, we propose two primal-dual algorithms for the classical empirical risk minimization problem.  ... 
doi:10.1184/r1/11898309 fatcat:3h2fz5jf6zc6jjli2vp5yalb7a
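Primal-dual algorithms for empirical risk minimization typically act on the saddle-point reformulation below; this is a sketch in standard notation (losses φ_i, data a_i, regularizer g, convex conjugates φ_i*), not the thesis's exact statement:

```latex
% ERM and its convex-concave saddle-point reformulation:
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} \phi_i\!\left(a_i^{\top} w\right) + g(w)
\;=\;
\min_{w}\,\max_{y}\; \frac{1}{n}\sum_{i=1}^{n}\left( y_i\, a_i^{\top} w - \phi_i^{*}(y_i) \right) + g(w).
% Randomized primal-dual coordinate methods update a sampled block of w and/or y per iteration.
```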

Statistical and Algorithm Aspects of Optimal Portfolios

Howard Howan Stephen Shek
2010 Social Science Research Network  
An empirical application with DJIA stocks and an exchange traded index fund shows that a simple Realized GARCH structure leads to substantial improvements in the empirical fit over standard GARCH models.  ...  A new framework is introduced, Realized GARCH, for the joint modeling of returns and realized measures of volatility.  ...  that minimize the MSE (Aït-Sahalia et al., 2005), which gives $n^{*}_{\mathrm{sparse}} = \left( \frac{T}{4\,\mathrm{E}[\epsilon^{2}]^{2}} \int_{0}^{T} \sigma_{t}^{4}\, dt \right)^{1/3}$.  ... 
doi:10.2139/ssrn.1684338 fatcat:guldnsavdrbsxo3sg3yuxrje6q
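For reference, the log-linear Realized GARCH specification this line of work builds on (Hansen, Huang and Shek) is sketched below; the exact variant used in the dissertation may differ, so treat the notation as an assumption of this sketch:

```latex
% Returns r_t, conditional variance h_t, realized measure x_t, leverage function \tau(\cdot):
r_t = \sqrt{h_t}\, z_t, \qquad z_t \sim \text{i.i.d.}(0,1),
\qquad
\log h_t = \omega + \beta \log h_{t-1} + \gamma \log x_{t-1},
\qquad
\log x_t = \xi + \varphi \log h_t + \tau(z_t) + u_t .
```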

Estimation for the Prediction of Point Processes with Many Covariates [article]

Alessio Sancetta
2017 arXiv   pre-print
As an application, the intensity of the buy and sell trades of the New Zealand dollar futures is estimated and a test for forecast evaluation is presented.  ...  Estimation of the primal or dual problem gives the same solution when we are able to map the constraint into the Lagrange multiplier π_B. In general, this is not straightforward.  ...  Connection to Lasso: Given the constraint on the coefficients b_θ's, minimization over L_B is just the primal of an ℓ1-penalized likelihood estimator, i.e., the Lasso.  ... 
arXiv:1702.05315v1 fatcat:nericnrw3zfc5j5r5mjj5nq3uq
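The "primal of an ℓ1-penalized likelihood estimator" remark is the usual constrained-versus-penalized equivalence; a sketch with assumed notation (log-likelihood L(b), constraint level B, multiplier π_B):

```latex
% Constrained (primal) form and its Lagrangian (penalized / Lasso) form:
\min_{b}\; -L(b) \quad \text{s.t.} \quad \|b\|_1 \le B
\qquad\Longleftrightarrow\qquad
\min_{b}\; -L(b) + \pi_B \|b\|_1
% for some multiplier \pi_B \ge 0 depending on B; the mapping B \mapsto \pi_B is
% generally not available in closed form, which is the difficulty the excerpt notes.
```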
Showing results 1–15 out of 24 results