1,442 Hits in 12.4 sec

Weighted Last-Step Min-Max Algorithm with Improved Sub-Logarithmic Regret [article]

Edward Moroshko, Koby Crammer
2013 arXiv   pre-print
Forster proposed a last-step min-max algorithm which was somewhat simpler than the algorithm of Vovk, yet with the same regret.  ...  We fix this problem by weighting the examples in such a way that the min-max problem is well defined, and provide an analysis with logarithmic regret that may have a better multiplicative factor than both  ...  A Last-Step Min-Max Algorithm: Our algorithm is based on the last-step min-max prediction proposed by Forster [12] and Takimoto and Warmuth [24].  ...
arXiv:1301.6058v1 fatcat:4ymksgug35dbjlqmk63piqiqsi
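The last-step min-max predictor referenced in this entry coincides with the "forward" regularized least-squares predictor, which counts the current instance in the covariance term before predicting. A minimal NumPy sketch under that assumption; the function name and regularization scale `b` are illustrative:

```python
import numpy as np

def last_step_minmax(X, y, b=1.0):
    """Sketch of a last-step min-max (forward) linear predictor:
    the current instance enters the covariance before the prediction.
    The regularization scale `b` is an illustrative choice."""
    d = X.shape[1]
    A = b * np.eye(d)   # accumulates b*I + sum of x_s x_s^T
    s = np.zeros(d)     # accumulates sum of y_s x_s over revealed labels
    preds = []
    for x_t, y_t in zip(X, y):
        A += np.outer(x_t, x_t)                       # include current x_t first
        preds.append(float(x_t @ np.linalg.solve(A, s)))
        s += y_t * x_t                                # label revealed, update
    return np.array(preds)
```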

Weighted Last-Step Min-Max Algorithm with Improved Sub-logarithmic Regret [chapter]

Edward Moroshko, Koby Crammer
2012 Lecture Notes in Computer Science  
Forster [4] proposed a last-step min-max algorithm which was simpler than the algorithm of Vovk [12], yet with the same regret.  ...  We fix this problem by weighting the examples in such a way that the min-max problem is well defined, and provide an analysis with logarithmic regret that may have a better multiplicative factor than both  ...  A Last-Step Min-Max Algorithm: Our algorithm is based on the last-step min-max prediction proposed by Forster [4] and also Takimoto and Warmuth [10].  ...
doi:10.1007/978-3-642-34106-9_21 fatcat:ce5alyaiknbqfnsvqqgnj73hra

Q-learning with Logarithmic Regret [article]

Kunhe Yang, Lin F. Yang, Simon S. Du
2021 arXiv   pre-print
planning horizon, T is the total number of steps, and Δ_min is the minimum sub-optimality gap.  ...  This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly  ...  It would be interesting to apply this idea to improve the H dependence in our logarithmic regret bound.  ...
arXiv:2006.09118v2 fatcat:ycvexbzb6vfrtpyh3rbx75joki
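The model-free family behind such results is optimistic tabular Q-learning with an exploration bonus. A schematic single-step update, assuming an episodic tabular setting; the (H+1)/(H+t) learning rate is the common episodic choice, and the bonus constants are purely illustrative:

```python
import numpy as np

def optimistic_q_update(Q, N, h, s, a, r, s_next, H, T, c=1.0):
    """One optimistic Q-learning update (UCB-Hoeffding flavor).
    Q and N are arrays of shape (H, S, A); constants are schematic."""
    N[h, s, a] += 1
    t = N[h, s, a]
    alpha = (H + 1) / (H + t)                   # common episodic learning rate
    bonus = c * np.sqrt(H**3 * np.log(T) / t)   # optimism term (illustrative)
    v_next = Q[h + 1, s_next].max() if h + 1 < H else 0.0
    Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + v_next + bonus)
    return Q
```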

Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits

Nicolas Galichet, Michèle Sebag, Olivier Teytaud
2013 Asian Conference on Machine Learning  
arm with maximal minimal value.  ...  As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB.  ...  Then, with probability at least 1 − δ, the cumulative regret is upper bounded as follows: R_t ≤ ((K − 1)/A) · (∆_{µ,max}/∆_{a,min}) · log(tK/δ) + (K − 1)∆_{µ,max} (6), with ∆_{a,min} = min_i ∆_{a,i} and ∆_{µ,max} = max_i ∆_{µ,i}  ...
dblp:conf/acml/GalichetST13 fatcat:q6y4sjaw2jhgbfsjg3oqy24zfu
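The MIN selection rule described in this entry prefers the arm whose smallest observed reward is largest. A toy sketch of that rule; the ε-greedy exploration here is a hypothetical stand-in for the paper's actual exploration schedule:

```python
import random

def min_bandit_choose(history, epsilon=0.1):
    """Max-of-min selection in the spirit of the MIN algorithm: favor the
    arm with the largest minimal observed reward. `history` is a list of
    per-arm reward lists; the epsilon-greedy step is illustrative."""
    if random.random() < epsilon:
        return random.randrange(len(history))
    for k, rewards in enumerate(history):
        if not rewards:          # play each arm once before comparing minima
            return k
    return max(range(len(history)), key=lambda k: min(history[k]))
```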

Efficient Adaptive Regret Minimization [article]

Zhou Lu, Elad Hazan
2022 arXiv   pre-print
In this paper we show how to reduce this computational penalty to be doubly logarithmic in the number of game iterations, and with minimal degradation to the optimal attainable adaptive regret bounds.  ...  This motivates the stronger metric of adaptive regret, or the maximum regret over any continuous sub-interval in time.  ...  In Section 3, we present our algorithm and show a simplified analysis that leads to an Õ(|I|^{3/4}) adaptive regret bound with a doubly-logarithmic number of experts.  ...
arXiv:2207.00646v3 fatcat:cqvfhgms2fgnxdvrucf2iiqkre
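For context, classical adaptive-regret methods run one expert per interval of a geometric covering of the horizon, and the computational penalty this paper reduces comes from how many such experts are alive per step. A sketch of the standard baseline covering (not the paper's doubly-logarithmic scheme):

```python
def geometric_covering_intervals(T):
    """Geometric covering of [1, T]: for each scale 2^k, consecutive
    intervals of length 2^k. This is the classical construction that
    adaptive-regret reductions improve on; purely illustrative."""
    intervals, k = [], 0
    while 2**k <= T:
        length, start = 2**k, 1
        while start <= T:
            intervals.append((start, min(start + length - 1, T)))
            start += length
        k += 1
    return intervals
```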

Gap-Dependent Bounds for Two-Player Markov Games [article]

Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S.Du
2021 arXiv   pre-print
Furthermore, we extend the conclusion to the discounted game setting with infinite horizon and propose a similar gap-dependent logarithmic regret bound.  ...  In this paper, we analyze the cumulative regret when running the Nash Q-learning algorithm on 2-player turn-based stochastic Markov games (2-TBSG), and propose the very first gap-dependent logarithmic upper  ...  First Step: Split the Total Regret into the Expected Sum of Gaps. In the first step, we split the total regret defined above into several single-step sub-optimality gaps. Lemma 1.  ...
arXiv:2107.00685v1 fatcat:ezpjuvigjvarxpnpm3awnkt4nq
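The "first step" quoted above is a standard decomposition; in its single-agent episodic form (shown here only for intuition, with illustrative notation) the cumulative regret telescopes into an expected sum of per-step optimality gaps:

```latex
% Regret as an expected sum of gaps (single-agent form, notation illustrative)
\begin{align*}
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\Bigl(V^{*}_{1}(s_1)-V^{\pi_k}_{1}(s_1)\Bigr)
\;=\; \sum_{k=1}^{K}\mathbb{E}_{\pi_k}\!\Bigl[\sum_{h=1}^{H}\Delta_h(s_h,a_h)\Bigr],
\qquad \Delta_h(s,a) := V^{*}_{h}(s)-Q^{*}_{h}(s,a).
\end{align*}
```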

Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits [article]

Nicolas Galichet, Michèle Sebag, Olivier Teytaud (LRI, INRIA Saclay - Ile de France)
2014 arXiv   pre-print
arm with maximal minimal value.  ...  As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB.  ...  Then, with probability at least 1 − δ, the cumulative regret is upper bounded as follows: R_t ≤ ((K − 1)/A) · (∆_{µ,max}/∆_{a,min}) · log(tK/δ) + (K − 1)∆_{µ,max} (6), with ∆_{a,min} = min_i ∆_{a,i} and ∆_{µ,max} = max_i ∆_{µ,i}  ...
arXiv:1401.1123v1 fatcat:zq7nsfw6pzcoffcdcrwk7zyaey

Multiplayer Multi-armed Bandits for Optimal Assignment in Heterogeneous Networks [article]

Harshvardhan Tibrewal, Sravan Patchala, Manjesh K. Hanawal, Sumit J. Darak
2019 arXiv   pre-print
Building on this, we develop an algorithm that gives logarithmic regret, even when the number of users changes with time.  ...  For the wideband sensing and narrowband sensing scenarios, we first develop explore-and-commit algorithms that converge to near-optimal allocation with high probability in a small number of rounds.  ...  Using this in the last step of (8), the regret from the exploration phases is bounded as follows: R^E_2(T) ≤ 3NK∆_max Σ_{l=l_e}^{l_0} e^{−ε(l)² T_s(l)/(8N²)} ≤ 3NK∆_max Σ_{l=l_e}^{l_0} e^{−ε(l)² l^{1+β}} = 3NK∆  ...
arXiv:1901.03868v4 fatcat:dypdgx7sozhefp2bmbydn2noca
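The explore-and-commit skeleton named in this entry is easy to state in code. A minimal single-user sketch (the paper's algorithms additionally handle multiple users and collisions); `pull` is an assumed environment callback and all names are illustrative:

```python
import numpy as np

def explore_then_commit(pull, K, T, m):
    """Sample each of K channels m times, then commit to the empirically
    best one for the remaining T - K*m rounds. Illustrative skeleton only."""
    estimates = np.zeros(K)
    for k in range(K):
        estimates[k] = np.mean([pull(k) for _ in range(m)])
    best = int(np.argmax(estimates))
    rewards = [pull(best) for _ in range(T - K * m)]
    return best, estimates, rewards
```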

(Locally) Differentially Private Combinatorial Semi-Bandits [article]

Xiaoyu Chen, Kai Zheng, Zixin Zhou, Yunchang Yang, Wei Chen, Liwei Wang
2020 arXiv   pre-print
For B_1-bounded smooth CSB under ε-DP, we also prove the optimal regret bound is Θ̃(mKB₁² ln T/(∆ε)) with both an upper bound and a lower bound, where K is the maximum number of feedback signals in each round.  ...  and m is the number of base arms, by proposing novel algorithms and matching lower bounds.  ...  non-private CUCB that achieves O(Σ_{i∈[m], ∆_i^{min}>0} B²_∞ ln T/∆_i^{min}) regret.  ...  An Improved Algorithm with the Best Guarantee: Compared with the previous studies that try to eliminate the side-effect of  ...
arXiv:2006.00706v2 fatcat:g6efvtbujrf7bfst5wrqrlaswa
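A common building block for private bandit algorithms is releasing empirical means through the Laplace mechanism; the sketch below shows that block under ε-DP, not necessarily the exact mechanism of this paper, and all names are illustrative:

```python
import numpy as np

def private_mean_estimate(rewards, epsilon, bound=1.0, rng=None):
    """ε-DP release of an arm's empirical mean via the Laplace mechanism.
    Rewards are assumed to lie in [0, bound], so changing one reward moves
    the mean by at most bound/n (the sensitivity)."""
    rng = rng or np.random.default_rng()
    n = len(rewards)
    sensitivity = bound / n
    noise = rng.laplace(scale=sensitivity / epsilon)
    return float(np.mean(rewards)) + noise
```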

An Online Learning Approach to Improving the Quality of Crowd-Sourcing

Yang Liu, Mingyan Liu
2015 Performance Evaluation Review  
data sample or task, with the results aggregated using, for example, a simple or weighted majority voting rule.  ...  We design an efficient online algorithm LS_OL using a simple majority voting rule that can differentiate high- and low-quality labelers over time, and is shown to have a regret (w.r.t. always using the  ...  First note that the regret is nearly logarithmic in T and therefore the algorithm has zero average regret as T → ∞; such an algorithm is often referred to as a zero-regret algorithm.  ...
doi:10.1145/2796314.2745874 fatcat:f5ismmfqundfbccgez6hivjz2y
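The aggregation rule mentioned in this entry is straightforward; a minimal sketch where uniform weights recover simple majority voting (the function and parameter names are illustrative):

```python
from collections import defaultdict

def weighted_majority_vote(labels, weights=None):
    """Aggregate one task's labels: sum per-labeler weights for each
    candidate label and return the label with the highest total."""
    scores = defaultdict(float)
    for i, label in enumerate(labels):
        scores[label] += 1.0 if weights is None else weights[i]
    return max(scores, key=scores.get)
```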

An Online Learning Approach to Improving the Quality of Crowd-Sourcing

Yang Liu, Mingyan Liu
2017 IEEE/ACM Transactions on Networking  
data sample or task, with the results aggregated using, for example, a simple or weighted majority voting rule.  ...  We design an efficient online algorithm LS_OL using a simple majority voting rule that can differentiate high- and low-quality labelers over time, and is shown to have a regret (w.r.t. always using the  ...  First note that the regret is nearly logarithmic in T and therefore the algorithm has zero average regret as T → ∞; such an algorithm is often referred to as a zero-regret algorithm.  ...
doi:10.1109/tnet.2017.2680245 fatcat:v6szr43hwbe4xbf5eujpzl2eti

No-Regret Learning with Unbounded Losses: The Case of Logarithmic Pooling [article]

Eric Neyman, Tim Roughgarden
2022 arXiv   pre-print
Our main result is an algorithm based on online mirror descent that learns expert weights in a way that attains O(√T log T) expected regret compared with the best weights in hindsight.  ...  For each of T time steps, m experts report probability distributions over n outcomes; we wish to learn to aggregate these forecasts in a way that attains a no-regret guarantee.  ...  If η_t = η for all t, the regret of Algorithm 1 is at most (1/η)(max_{w∈∆_m} R(w) − min_{w∈∆_m} R(w)) + Σ_{t=1}^{T} ∇L_t(w_t) · (w_t − w_{t+1}).  ...
arXiv:2202.11219v1 fatcat:ksyunvgpxvbcpbzkhjgjzdic6u
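Logarithmic pooling itself has a standard closed form: the aggregate is the renormalized weighted geometric mean of the expert forecasts. A minimal NumPy sketch of that definition (the learning of the weights, the paper's actual contribution, is not shown):

```python
import numpy as np

def logarithmic_pool(forecasts, w):
    """Logarithmic pooling: aggregate proportional to prod_i p_i^{w_i}.
    `forecasts` is an (m, n) array of m strictly positive distributions
    over n outcomes; `w` is a weight vector on the simplex."""
    log_pool = w @ np.log(forecasts)       # weighted sum of log-probabilities
    p = np.exp(log_pool - log_pool.max())  # stabilize before normalizing
    return p / p.sum()
```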

Second-Order Non-Stationary Online Learning for Regression [article]

Nina Vaits, Edward Moroshko, Koby Crammer
2013 arXiv   pre-print
Our first algorithm performs adaptive resets to forget the history, while the second is last-step min-max optimal in the context of drift.  ...  In addition, in the stationary case, when no drift occurs, our algorithms suffer logarithmic regret, as do previous algorithms.  ...  sub-logarithmic regret in the stationary case.  ...
arXiv:1303.0140v1 fatcat:6vmvxlpq3narxaavbndragrlie
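An "adaptive reset" of the kind mentioned in this entry can be pictured as restarting the second-order state once it has accumulated too much stale history. A schematic sketch only; the eigenvalue trigger and threshold are assumptions, not the paper's actual condition:

```python
import numpy as np

def maybe_reset(A, b, threshold):
    """If the accumulated correlation matrix A has grown past a threshold
    (here, via its largest eigenvalue), restart the state to forget old
    history; otherwise keep it. The trigger is purely illustrative."""
    if np.linalg.eigvalsh(A)[-1] > threshold:
        d = A.shape[0]
        return np.eye(d), np.zeros(d)   # fresh second-order state
    return A, b
```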

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions [article]

Jiafan He and Dongruo Zhou and Tong Zhang and Quanquan Gu
2022 arXiv   pre-print
We show that for both the known-C and unknown-C cases, our algorithm with a proper choice of hyperparameters achieves a regret that nearly matches the lower bounds.  ...  Thus, our algorithm is nearly optimal up to logarithmic factors for both cases. Notably, our algorithm achieves the near-optimal regret for both corrupted and uncorrupted (C=0) cases simultaneously.  ...  For each round k ∈ [K] and any action x ∈ D_k, the sub-optimality gap ∆_{x,k} is defined as ∆_{x,k} = max_{x*∈D_k} ⟨θ*, x*⟩ − ⟨θ*, x⟩, and the minimal sub-optimality gap is defined as ∆ = min_{k∈[K], x∈D  ...
arXiv:2205.06811v2 fatcat:f7rulxvojbbrtksjt5kfitn6ka

Online Linear Optimization with Many Hints [article]

Aditya Bhaskara and Ashok Cutkosky and Ravi Kumar and Manish Purohit
2020 arXiv   pre-print
In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the K hints that has positive correlation with the cost vectors.  ...  To accomplish this, we develop a way to combine many arbitrary OLO algorithms to obtain regret only a logarithmic factor worse than the minimum regret of the original algorithms in hindsight; this  ...  However, while there are exactly K sub-phases in any phase of Algorithm 2 (except perhaps the last one), the number of sub-phases in any phase of Algorithm 4 is a random variable.  ...
arXiv:2010.03082v1 fatcat:x7onr7tennfo7pu3dxkbhjwdha
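One generic way to combine several online linear optimization algorithms is to play an exponentially weighted average of their iterates; with linear losses, the combiner's loss is the weighted average of the base losses. A hedged sketch of that idea (the paper's actual combiner differs in its details):

```python
import numpy as np

def combine_olo_predictions(preds, losses_so_far, eta=0.1):
    """Play a convex combination of K base OLO algorithms' points, weighted
    by exponential weights on their cumulative losses. `preds` is a (K, d)
    array of the base iterates; `losses_so_far` has shape (K,)."""
    w = np.exp(-eta * losses_so_far)
    w /= w.sum()
    return w @ preds
```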
Showing results 1 — 15 out of 1,442 results