226 Hits in 7.6 sec

Interior-Point Methods for Full-Information and Bandit Online Learning

Jacob D. Abernethy, Elad Hazan, Alexander Rakhlin
2012 IEEE Transactions on Information Theory  
In addition, for the full-information setting, we give a novel regret minimization algorithm.  ...  Our main contribution is the first efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal regret.  ...  CONVEX OPTIMIZATION: SELF-CONCORDANT BARRIERS AND THE DIKIN ELLIPSOID. An unconstrained convex optimization problem consists of finding the value that minimizes some given convex objective.  ... 
doi:10.1109/tit.2012.2192096 fatcat:rgn7l2nlsvdhhpgmwuggfdkdhy
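For context, the "Dikin ellipsoid" named in this snippet has a standard definition in interior-point theory; the sketch below is general convex-optimization background, not quoted from the paper itself:

```latex
% Local norm induced by a self-concordant barrier R at a point x:
\|h\|_x \;=\; \sqrt{h^{\top} \nabla^2 R(x)\, h},
\qquad
% Dikin ellipsoid of radius 1 centered at x (always contained in the domain):
W_1(x) \;=\; \bigl\{\, y : \|y - x\|_x \le 1 \,\bigr\}.
```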

Exploiting Smoothness in Statistical Learning, Sequential Prediction, and Stochastic Optimization [article]

Mehrdad Mahdavi
2014 arXiv   pre-print
In the last several years, the intimate connection between convex optimization and learning problems, in both statistical and sequential frameworks, has shifted the focus of algorithmic machine learning  ...  The overarching goal of this thesis is to reassess the smoothness of loss functions in statistical learning, sequential prediction/online learning, and stochastic optimization and explicate its consequences  ...  essential to go beyond this barrier to obtain optimal convergence rates in stochastic setting [71, 124] .  ... 
arXiv:1407.5908v1 fatcat:vlevdkb23bfibombrkqttvtlp4

Multi-Agent Learning for Iterative Dominance Elimination: Formal Barriers and New Algorithms [article]

Jibang Wu, Haifeng Xu, Fan Yao
2021 arXiv   pre-print
Moreover, algorithms with the stronger no swap regret also suffer similar exponential inefficiency.  ...  Our experimental results further demonstrate the efficiency of Exp3-DH, and that state-of-the-art bandit algorithms, even those developed specifically for learning in games, fail to eliminate all dominated  ...  For example, [DDK11, RS13, DISZ17, DP18, MLZ + 18] studied the last-iterate convergence behaviors in zero-sum games of some strongly-uncoupled no-regret algorithms.  ... 
arXiv:2111.05486v1 fatcat:pzavwlz2dzfutnk2o3jznj66ku

Lecture Notes: Optimization for Machine Learning [article]

Elad Hazan
2019 arXiv   pre-print
Lecture notes on optimization for machine learning, derived from a course at Princeton University and tutorials given in MLSS, Buenos Aires, as well as Simons Foundation, Berkeley.  ...  Minimizing Regret The setting we consider for the rest of this chapter is that of online (convex) optimization.  ...  We consider the following question: thus far we have thought of R as a strongly convex function. But which strongly convex function should we choose to minimize regret?  ... 
arXiv:1909.03550v1 fatcat:h4uotrzeunc3dav4epkqgen3ya
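The online convex optimization setting these lecture notes describe can be illustrated with a minimal projected online subgradient descent loop. The toy losses $f_t(x) = |x - z_t|$ on $K = [-1, 1]$ and the step-size schedule below are illustrative choices, not taken from the notes:

```python
import numpy as np

def ogd(subgrad, x0, eta, project, T):
    """Projected online subgradient descent: x_{t+1} = Pi_K(x_t - eta_t * g_t)."""
    x = x0
    xs = []
    for t in range(1, T + 1):
        xs.append(x)
        g = subgrad(t, x)
        x = project(x - eta(t) * g)
    return xs

# Toy adversary: f_t(x) = |x - z_t| on K = [-1, 1] with z_t in {-1, +1}.
rng = np.random.default_rng(0)
T = 10_000
zs = rng.choice([-1.0, 1.0], size=T)

xs = ogd(
    subgrad=lambda t, x: np.sign(x - zs[t - 1]),  # subgradient of |x - z_t|
    x0=0.0,
    eta=lambda t: 1.0 / np.sqrt(t),               # eta_t ~ D/(G sqrt(t)) with D = 2, G = 1
    project=lambda x: float(np.clip(x, -1.0, 1.0)),
    T=T,
)

learner_loss = sum(abs(x - z) for x, z in zip(xs, zs))
# The best fixed point of a sum of |u - z_t| on [-1, 1] lies at an endpoint or 0.
best_fixed = min(sum(abs(u - z) for z in zs) for u in (-1.0, 0.0, 1.0))
regret = learner_loss - best_fixed
# Standard analysis guarantees regret <= D^2/(2 eta_T) + (G^2/2) sum_t eta_t <= 300 here,
# i.e. O(sqrt(T)), far below the T = 10000 rounds played.
```

The decaying step size is what makes the bound anytime; a fixed horizon-tuned step gives the same rate up to constants.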

Non-stationary Online Learning with Memory and Non-stochastic Control [article]

Peng Zhao and Yu-Xiang Wang and Zhi-Hua Zhou
2021 arXiv   pre-print
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret.  ...  Furthermore, we apply the results to the problem of online non-stochastic control, i.e., controlling a linear dynamical system with adversarial disturbance and convex loss functions.  ...  Acknowledgment The work was partially done while Peng Zhao remotely visited UC Santa Barbara. The authors thank Yu-Hu Yan, Ming Yin, and Dheeraj Baby for helpful discussions.  ... 
arXiv:2102.03758v2 fatcat:ix4fecihsrfoji5xrgv5hi7zoa

A Modern Introduction to Online Learning [article]

Francesco Orabona
2021 arXiv   pre-print
Here, online learning refers to the framework of regret minimization under worst-case assumptions.  ...  I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings.  ...  Chapter 4, "Beyond $\sqrt{T}$ Regret: Strong Convexity and Online Subgradient Descent". Let's now go back to online convex optimization theory.  ... 
arXiv:1912.13213v4 fatcat:ubtsa5jbp5bxdkvn3xdyfqd6ti
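The chapter title quoted in this snippet refers to a standard fact: for strongly convex losses, online subgradient descent with step sizes $\eta_t = 1/(\mu t)$ achieves logarithmic rather than $\sqrt{T}$ regret. A sketch of the bound, stated from standard background rather than quoted from the text:

```latex
% Online subgradient descent with \eta_t = 1/(\mu t) on \mu-strongly convex
% losses f_t whose subgradients are bounded in norm by G:
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{u \in K} \sum_{t=1}^{T} f_t(u)
\;\le\; \frac{G^2}{2\mu} \sum_{t=1}^{T} \frac{1}{t}
\;\le\; \frac{G^2}{2\mu}\,\bigl(1 + \log T\bigr).
```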

Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent [article]

Niangjun Chen, Gautam Goel, Adam Wierman
2018 arXiv   pre-print
We study Smoothed Online Convex Optimization, a version of online convex optimization where the learner incurs a penalty for changing her actions between rounds.  ...  , in particular, OBD is the first algorithm to achieve a dimension-free competitive ratio, $3 + O(1/\alpha)$, for locally polyhedral costs, where $\alpha$ measures the "steepness" of the costs.  ...  Theorem 10: Consider $\Phi$ that is an $m$-strongly convex function in $\|\cdot\|$ with $\|\nabla\Phi(x)\|_*$ bounded above by $G$ and $\nabla\Phi(0) = 0$. Then the $L$-constrained dynamic regret of Algorithm 3 is at most $GL/\eta + T\eta/(2m)$.  ... 
arXiv:1803.10366v2 fatcat:plqeahnygfhcbfhi3yeq6j2yom
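Reading the extracted bound in Theorem 10 as $GL/\eta + T\eta/(2m)$, the step size $\eta$ can be balanced in the usual way; this is a routine calculation under that reading of the formula, not taken from the paper:

```latex
% Setting the derivative of the bound to zero:
-\frac{GL}{\eta^2} + \frac{T}{2m} = 0
\quad\Longrightarrow\quad
\eta^\ast = \sqrt{\frac{2mGL}{T}},
\qquad
\frac{GL}{\eta^\ast} + \frac{T\eta^\ast}{2m}
= 2\sqrt{\frac{GLT}{2m}}
= \sqrt{\frac{2GLT}{m}}.
```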

Derivative-free optimization methods [article]

Jeffrey Larson, Matt Menickelly, Stefan M. Wild
2019 arXiv   pre-print
problems where the output of the black-box oracle is stochastic, and methods for handling different types of constraints.  ...  Such settings necessitate the use of methods for derivative-free, or zeroth-order, optimization.  ...  We are especially indebted to Gail Pieper and Glennis Starling for their invaluable editing. This material is based upon work supported  ... 
arXiv:1904.11585v1 fatcat:pvshhbwanvcttigqme3ju32qye

Highly-Smooth Zero-th Order Online Optimization [article]

Francis Bach, Vianney Perchet
2016 arXiv   pre-print
This is done for both convex and strongly-convex functions, with finite horizon and anytime algorithms. Finally, we also recover similar results in the online optimization setting.  ...  The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines.  ...  Vianney Perchet also acknowledges fundings from the ANR under grant number ANR-13-JS01-0004 and the CNRS under grant project Parasol.  ... 
arXiv:1605.08165v1 fatcat:bxhxu2gq7zff5cpmwg53fw5kx4
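A generic two-point zeroth-order gradient estimator illustrates how such methods minimize a function using only (possibly noisy) function values. This is a standard sketch, not the higher-order smoothing-kernel estimator developed in the paper itself; the objective, step size, and target point below are made up for the demo:

```python
import numpy as np

def two_point_grad(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate:
    g = (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    with u uniform on the unit sphere. Since E[u u^T] = I/d, E[g] approximates
    grad f(x) up to O(delta) smoothing bias (exactly, for a quadratic f)."""
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Zeroth-order gradient descent on a simple quadratic f(x) = ||x - x_star||^2,
# using only function evaluations (no gradients).
rng = np.random.default_rng(0)
x_star = np.array([1.0, -2.0, 0.5])
f = lambda x: float(np.sum((x - x_star) ** 2))

x = np.zeros(3)
for _ in range(5000):
    g = two_point_grad(f, x, delta=1e-3, rng=rng)
    x -= 0.01 * g  # constant step size; tuned schedules give the clean rates

# x approaches x_star despite only ever querying f.
```

Higher-smoothness assumptions, as in the paper, allow sharper estimators by averaging over smoothing kernels rather than a single direction pair.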

Linear Bandits on Uniformly Convex Sets [article]

Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta
2021 arXiv   pre-print
Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, \S  ...  Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than $\mathcal{O}(\sqrt{n})$.  ...  Research reported in this paper was partially supported through the Research Campus Modal funded by the German Federal Ministry of Education and Research (fund numbers 05M14ZAM,05M20ZBM) as well as the  ... 
arXiv:2103.05907v1 fatcat:iuolmacen5ddjhcvnnp4bl4c6m

Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs [article]

Liad Erez, Tomer Koren
2021 arXiv   pre-print
We develop an algorithm that simultaneously achieves regret bounds of the form: $\smash{\mathcal{O}(\sqrt{\theta(G) T})}$ with adversarial losses; $\mathcal{O}(\theta(G)\operatorname{polylog}{T})$ with  ...  One of our key technical contributions is in establishing the convexity of this regularizer and controlling its inverse Hessian, despite its complex product structure.  ...  Acknowledgements This work has received support from the Israeli Science Foundation (ISF) grant no. 2549/19, from the Len Blavatnik and the Blavatnik Family foundation, and from the Yandex Initiative in  ... 
arXiv:2107.09572v1 fatcat:er5bwbatjzdalbczzqox7l25my

Logarithmic Regret for Adversarial Online Control [article]

Dylan J. Foster, Max Simchowitz
2020 arXiv   pre-print
Existing regret bounds for this setting scale as $\sqrt{T}$ unless strong stochastic assumptions are imposed on the disturbance process.  ...  Our algorithm and analysis use a characterization for the optimal offline control law to reduce the online control problem to (delayed) online learning with approximate advantage functions.  ...  Acknowledgements DF acknowledges the support of TRIPODS award #1740751. MS is generously supported by an Open Philanthropy AI Fellowship. We thank Ali Jadbabaie for helpful discussions.  ... 
arXiv:2003.00189v3 fatcat:lk7asvui7jbrnon7afhgpv5gnm

Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization [article]

Yan Yan and Yi Xu and Qihang Lin and Wei Liu and Tianbao Yang
2020 arXiv   pre-print
Epoch-GD) proposed by Hazan and Kale (2011) was deemed a breakthrough for stochastic strongly convex minimization, which achieves the optimal convergence rate of $O(1/T)$ with $T$ iterative updates for  ...  gap is achievable for stochastic min-max optimization under strong convexity and strong concavity.  ...  Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization.  ... 
arXiv:2002.05309v2 fatcat:hmh63foxqrh7voxmexezwtth6e

Adapting to Misspecification in Contextual Bandits [article]

Dylan J. Foster and Claudio Gentile and Mehryar Mohri and Julian Zimmert
2021 arXiv   pre-print
Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge.  ...  Specializing to linear contextual bandits with infinite actions in $d$ dimensions, we obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown  ...  Acknowledgements DF acknowledges the support of NSF TRIPODS grant #1740751. We thank Teodor Marinov and Alexander Rakhlin for discussions on related topics.  ... 
arXiv:2107.05745v1 fatcat:aapvoy6xovh4nd5lizacrwr5ai

The Price of Differential Privacy For Online Learning [article]

Naman Agarwal, Karan Singh
2017 arXiv   pre-print
For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of $\tilde{O}\left(\frac{1}{\epsilon}\sqrt{T}\right)$, while the  ...  We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal $\tilde{O}(\sqrt{T})$ regret bounds.  ...  (Smith & Thakurta, 2013) proposed a modification of the Follow-the-Approximate-Leader template to achieve $\tilde{O}\left(\frac{1}{\epsilon}\log^{2.5} T\right)$ regret for strongly convex loss functions, implying a regret bound of $\tilde{O}\left(\frac{1}{\epsilon}\sqrt{T}\right)$ for  ... 
arXiv:1701.07953v2 fatcat:r2e7r2dilncp3ppf7sr7v5j7cu
Showing results 1 — 15 out of 226 results