53 Hits in 1.1 sec

Factored Bandits [article]

Julian Zimmert, Yevgeny Seldin
2018 arXiv   pre-print
We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).
arXiv:1807.01488v2 fatcat:h27ktgcjbjfhzhk6qezvyjyiwq
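A minimal sketch of the factored action structure the abstract describes, with hypothetical atomic action sets and a made-up product-form reward (the product form is the rank-1 special case; this is not the paper's algorithm):

```python
import itertools
import random

# Hypothetical atomic action sets; a combined action picks one atom per factor.
atomic_actions = [
    ["a0", "a1", "a2"],  # factor 1
    ["b0", "b1"],        # factor 2
    ["c0", "c1", "c2"],  # factor 3
]

# The full action set is the Cartesian product of the atomic sets.
action_set = list(itertools.product(*atomic_actions))

# Made-up per-atom mean rewards for illustration only.
factor_means = [
    {"a0": 0.2, "a1": 0.5, "a2": 0.9},
    {"b0": 0.4, "b1": 0.8},
    {"c0": 0.3, "c1": 0.6, "c2": 0.7},
]

def pull(action):
    """Bandit feedback: a single noisy scalar for the combined action."""
    mean = 1.0
    for atom, means in zip(action, factor_means):
        mean *= means[atom]  # product form = rank-1 special case; factored
                             # bandits allow more general reward functions
    return mean + random.gauss(0.0, 0.1)
```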

Online Learning for Active Cache Synchronization [article]

Andrey Kolobov, Sébastien Bubeck, Julian Zimmert
2020 arXiv   pre-print
Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated. This paper introduces synchronization bandits, a MAB variant where all arms generate costs at all times, but the agent observes an arm's instantaneous cost only when the arm is played. Synchronization MABs are inspired by online caching scenarios such as Web crawling, where an arm corresponds to a cached item and playing the arm means downloading its fresh copy from a server. We present MirrorSync, an online learning algorithm for synchronization bandits, establish an adversarial regret of O(T^2/3) for it, and show how to make it practical.
arXiv:2002.12014v2 fatcat:msxdj3lopnbs7f7osbxzy4cz3i
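A toy rendering of the interaction protocol from the abstract; the exponential staleness cost and the uniformly random policy are made up for illustration (MirrorSync itself would replace the random choice):

```python
import math
import random

K, T = 5, 1000
rates = [random.uniform(0.1, 1.0) for _ in range(K)]  # hypothetical change rates

def instantaneous_cost(arm, time_since_sync):
    # Made-up staleness cost: grows with the time since the item was last synced.
    return 1.0 - math.exp(-rates[arm] * time_since_sync)

last_sync = [0] * K
total_cost = 0.0
for t in range(1, T + 1):
    # Every arm generates cost at every step, whether observed or not.
    total_cost += sum(instantaneous_cost(a, t - last_sync[a]) for a in range(K))
    played = random.randrange(K)  # a learning policy would choose here
    observed = instantaneous_cost(played, t - last_sync[played])  # only feedback
    last_sync[played] = t  # playing an arm = downloading a fresh copy
```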

Adapting to Misspecification in Contextual Bandits [article]

Dylan J. Foster and Claudio Gentile and Mehryar Mohri and Julian Zimmert
2021 arXiv   pre-print
Zimmert and Seldin (2019) show this for L = 1, but the extension to general L is trivial.  ...  Zimmert, T. Lattimore, and C. Szepesvari. Model selection in contextual stochastic bandit problems. Neural Information Processing Systems (NeurIPS), 2020. D. Simchi-Levi and Y. Xu.  ... 
arXiv:2107.05745v1 fatcat:aapvoy6xovh4nd5lizacrwr5ai

The Pareto Frontier of model selection for general Contextual Bandits [article]

Teodor V. Marinov, Julian Zimmert
2021 arXiv   pre-print
[Zimmert and Seldin, 2021]. Define L_t = ∑_{s=1}^t ℓ_s and B_t = ∑_{s=1}^t b_s.  ...  (Zimmert and Seldin [2021]) shows that ∀t : E_{M_t∼q_t}[D_{F*}(−(L_t − B_{t−1}), −(L_{t−1} − B_{t−1}))] ≤ η√K.  ...
arXiv:2110.13282v1 fatcat:ozhsrm2rfna4rfrvhffmj44hb4
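For reference, the Bregman-divergence notation appearing in the snippet (standard definition, added for readability; F* is the convex conjugate of the regularizer):

```latex
D_{F^*}(x, y) \;=\; F^*(x) - F^*(y) - \langle \nabla F^*(y),\, x - y \rangle ,
```

so the displayed inequality is the usual FTRL stability bound, stated in expectation over the sampled model M_t ∼ q_t.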

A Model Selection Approach for Corruption Robust Reinforcement Learning [article]

Chen-Yu Wei, Christoph Dann, Julian Zimmert
2021 arXiv   pre-print
Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, and Csaba Szepesvari. Model selection in contextual stochastic bandit problems.  ...  Julian Zimmert and Yevgeny Seldin. An optimal algorithm for stochastic and adversarial bandits. In International Conference on Artificial Intelligence and Statistics, 2019.  ... 
arXiv:2110.03580v1 fatcat:us2atl3cybhbtekysz4uiqjmau

Efficient Methods for Online Multiclass Logistic Regression [article]

Naman Agarwal, Satyen Kale, Julian Zimmert
2021 arXiv   pre-print
Zimmert.  ...  This approach requires solving a fixed point equation.  ...  AIOLI would work equally well, since we only require binary regression.  ...
arXiv:2110.03020v2 fatcat:cx3pfnihyvc5losso633fi4ude
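The snippet mentions solving a fixed point equation; the sketch below shows how such a scalar fixed point can be found by bisection. The specific equation z = sigmoid(a − b·z) is a hypothetical stand-in, not the one from the paper:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve_fixed_point(a, b, tol=1e-10):
    """Solve z = sigmoid(a - b*z) for z in (0, 1) by bisection, assuming b >= 0.

    g(z) = sigmoid(a - b*z) - z is positive at z=0, negative at z=1, and
    strictly decreasing, so the root exists and is unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sigmoid(a - b * mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(solve_fixed_point(0.3, 2.0))  # ~0.385
```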

An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays [article]

Julian Zimmert, Yevgeny Seldin
2020 arXiv   pre-print
Julian Zimmert and Yevgeny Seldin. An optimal algorithm for stochastic and adversarial bandits.  ...  Julian Zimmert, Haipeng Luo, and Chen-Yu Wei. Beating stochastic and adversarial semi-bandits optimally and simultaneously.  ... 
arXiv:1910.06054v2 fatcat:cxtg7nbx7neqhadh2u35div2cu

Pushing the Efficiency-Regret Pareto Frontier for Online Learning of Portfolios and Quantum States [article]

Julian Zimmert, Naman Agarwal, Satyen Kale
2022 arXiv   pre-print
Zimmert, N. Agarwal & S. Kale.  ...  Without loss of generality, we can fill up missing time-steps with r_t = 1_d/d, which results in constant losses.  ...
arXiv:2202.02765v1 fatcat:iyknpp2p4nds7gcps3rixvzktm
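A one-line check of the padding claim, assuming the standard log-loss of online portfolio selection (not quoted from the paper): for any portfolio w in the simplex,

```latex
\ell_t(w) \;=\; -\log \langle w, r_t \rangle
          \;=\; -\log\Bigl( \tfrac{1}{d} \sum_{i=1}^{d} w_i \Bigr)
          \;=\; \log d ,
```

so the padded rounds add the same constant to every comparator and leave the regret unchanged.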

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously [article]

Julian Zimmert, Haipeng Luo, Chen-Yu Wei
2019 arXiv   pre-print
Our algorithm and analysis extend the recent work of (Zimmert & Seldin, 2019) for the special case of multi-armed bandits, but importantly require a novel hybrid regularizer designed specifically for semi-bandit  ...  For the stochastic case we use a self-bounding technique similar to Wei & Luo (2018); Zimmert & Seldin (2019).  ...  Several recent works (Bubeck & Slivkins, 2012; Seldin & Slivkins, 2014; Auer & Chiang, 2016; Seldin & Lugosi, 2017; Wei & Luo, 2018; Zimmert & Seldin, 2019) develop "best-of-both-worlds" results for  ...
arXiv:1901.08779v2 fatcat:jphnyuzl2ffjjlsejkfub22eki
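The self-bounding technique mentioned in the snippet, in schematic form (constants and the exact shape of the adversarial bound vary across these papers): one proves an adversarial bound Reg_T ≤ Σ_{i≠i*} Σ_t c√(E[w_{t,i}]/t) that holds in all regimes, while in the stochastic regime Reg_T = Σ_{i≠i*} Δ_i Σ_t E[w_{t,i}]. Writing Reg_T = 2·Reg_T − Reg_T and maximizing each term over w ≥ 0 gives

```latex
\mathrm{Reg}_T
\;\le\; \sum_{i \ne i^*} \sum_{t=1}^{T}
    \max_{w \ge 0} \Bigl( 2c\sqrt{\tfrac{w}{t}} - \Delta_i w \Bigr)
\;=\; \sum_{i \ne i^*} \sum_{t=1}^{T} \frac{c^2}{\Delta_i\, t}
\;=\; O\Bigl( \sum_{i \ne i^*} \frac{c^2 \log T}{\Delta_i} \Bigr),
```

which is how the same algorithm attains logarithmic regret in the stochastic regime without ever detecting which regime it is in.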

Model Selection in Contextual Stochastic Bandit Problems [article]

Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari
2020 arXiv   pre-print
We study model selection in stochastic bandit problems. Our approach relies on a master algorithm that selects its actions among candidate base algorithms. While this problem is studied for specific classes of stochastic base algorithms, our objective is to provide a method that can work with more general classes of stochastic base algorithms. We propose a master algorithm inspired by CORRAL and introduce a novel and generic smoothing transformation for stochastic bandit algorithms that permits us to obtain O(√(T)) regret guarantees for a wide class of base algorithms when working along with our master. We exhibit a lower bound showing that even when one of the base algorithms has O(log T) regret, in general it is impossible to get better than Ω(√(T)) regret in model selection, even asymptotically. We apply our algorithm to choose among different values of ϵ for the ϵ-greedy algorithm, and to choose between the k-armed UCB and linear UCB algorithms. Our empirical studies further confirm the effectiveness of our model-selection method.
arXiv:2003.01704v2 fatcat:thcovoznzvbgdarylj55q4gxo4
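A highly simplified sketch of the master-over-base-algorithms pattern the abstract describes. This is an EXP3-style master rather than the paper's CORRAL variant with its smoothing transformation, and the base-algorithm interface (act/feed) is illustrative:

```python
import math
import random

class Exp3Master:
    """Picks one of several base bandit algorithms per round and
    importance-weights the observed loss of the chosen base."""

    def __init__(self, num_bases, eta):
        self.eta = eta
        self.cum_loss = [0.0] * num_bases  # importance-weighted cumulative losses

    def probs(self):
        m = min(self.cum_loss)  # shift for numerical stability
        w = [math.exp(-self.eta * (l - m)) for l in self.cum_loss]
        z = sum(w)
        return [x / z for x in w]

    def select(self):
        p = self.probs()
        i = random.choices(range(len(p)), weights=p)[0]
        return i, p[i]

    def update(self, i, p_i, loss):
        self.cum_loss[i] += loss / p_i  # importance-weighted loss estimate

# Hypothetical usage with base algorithms exposing act()/feed():
# master = Exp3Master(num_bases=len(bases), eta=0.05)
# for t in range(T):
#     i, p_i = master.select()
#     action = bases[i].act(context)   # only the chosen base acts this round
#     loss = environment(action)
#     bases[i].feed(loss / p_i)        # weighted feedback to the chosen base
#     master.update(i, p_i, loss)
```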

Distributed optimization of multi-class SVMs

Maximilian Alber, Julian Zimmert, Urun Dogan, Marius Kloft, Quan Zou
2017 PLoS ONE  
Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straightforward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be stated, however, for the so-called all-in-one SVMs, which require solving a quadratic program whose size is quadratic in the number of classes. We develop distributed algorithms for two all-in-one SVM formulations (Lee et al. and Weston and Watkins) that parallelize the computation evenly over the number of classes. This allows us to compare these models to one-vs.-rest SVMs at unprecedented scale. The results indicate superior accuracy on text classification data.
doi:10.1371/journal.pone.0178161 pmid:28570703 pmcid:PMC5453486 fatcat:n77hfxycvfhp3avx7umeyseliu
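The abstract's point that one-vs.-rest training parallelizes trivially over classes, as a sketch (assuming scikit-learn; the all-in-one formulations of Lee et al. and Weston and Watkins need the paper's dedicated distributed solvers instead):

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.svm import LinearSVC

def train_binary(args):
    X, y, c = args
    clf = LinearSVC()
    clf.fit(X, (y == c).astype(int))  # class c vs. rest
    return c, clf

def one_vs_rest_parallel(X, y):
    classes = np.unique(y)
    # Each class's binary problem is independent, so the problems can run
    # on separate workers (or machines) at once. Note: this sketch ships a
    # copy of (X, y) to every worker; run under `if __name__ == "__main__":`.
    with ProcessPoolExecutor() as pool:
        models = dict(pool.map(train_binary, [(X, y, c) for c in classes]))
    return models

def predict(models, X):
    order = sorted(models)
    scores = np.column_stack([models[c].decision_function(X) for c in order])
    return np.array(order)[scores.argmax(axis=1)]
```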

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning [article]

Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert
2021 arXiv   pre-print
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs. Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.
arXiv:2107.01264v2 fatcat:kqjwc74h35egzn2myfzcnekbri
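For contrast with the alternative gap definitions the abstract refers to, the classical value-function gap that prior instance-dependent bounds are phrased in (standard notation, added for reference):

```latex
\Delta(s,a) \;=\; V^*(s) - Q^*(s,a),
\qquad
\mathrm{Reg}_T \;=\; O\Bigl( \sum_{(s,a):\, \Delta(s,a) > 0} \frac{\log T}{\Delta(s,a)} \Bigr)
\;\text{(up to horizon factors)} .
```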

Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits [article]

Julian Zimmert, Yevgeny Seldin
2022 arXiv   pre-print
Acknowledgments We would like to thank Chloé Rouyer for pointing out several bugs in the previous version of the work (Zimmert and Seldin, 2019) and Haipeng Luo for the idea on how to improve our regret  ...  Under the assumption of known time horizon, Zimmert and Lattimore (2019) provide an adversarial regret bound with a leading constant of √2.  ...  We analyse the algorithm with standard importance-weighted loss estimators and with reduced-variance loss estimators proposed by Zimmert and Lattimore (2019).  ...
arXiv:1807.07623v6 fatcat:s7rsxvfqqbea5ixvspeqd4ulcy
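A compact sketch of the Tsallis-INF update with the standard importance-weighted estimators mentioned in the snippet. This uses the 1/2-Tsallis regularizer in its FTRL form; the learning-rate schedule is only correct up to constants, and the reduced-variance estimator of Zimmert and Lattimore is omitted:

```python
import numpy as np

def tsallis_weights(L_hat, eta, iters=60):
    """FTRL with 1/2-Tsallis entropy: w_i = (eta * (L_hat_i + lam))^(-2),
    with the normalizer lam found by bisection so that sum_i w_i = 1."""
    K = len(L_hat)
    lo = -L_hat.min() + 1.0 / eta         # here the leading arm alone has weight 1
    hi = -L_hat.min() + np.sqrt(K) / eta  # here every weight is <= 1/K
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = np.sum((eta * (L_hat + lam)) ** -2.0)
        lo, hi = (lam, hi) if total > 1.0 else (lo, lam)
    return (eta * (L_hat + 0.5 * (lo + hi))) ** -2.0

def tsallis_inf(K, T, sample_loss, seed=0):
    L_hat = np.zeros(K)                   # cumulative importance-weighted losses
    rng = np.random.default_rng(seed)
    for t in range(1, T + 1):
        eta = 2.0 / np.sqrt(t)            # anytime schedule, up to constants
        w = tsallis_weights(L_hat, eta)
        w = w / w.sum()                   # guard against bisection tolerance
        arm = rng.choice(K, p=w)
        loss = sample_loss(arm, t)        # only the played arm is observed
        L_hat[arm] += loss / w[arm]       # importance-weighted loss estimator
    return L_hat
```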

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [article]

Aadirupa Saha, Pierre Gaillard
2022 arXiv   pre-print
Acknowledgment Thanks to Julian Zimmert and Karan Singh for the useful discussions on the existing best-of-both-worlds multi-armed bandit results.  ...  Moreover, Zimmert and Seldin [2021] also provide an upper-bound for stochastic bandits with adversarial corruption.  ...  We show here that similarly to what happens for standard Multi-armed bandits in Zimmert and Seldin [2021] , this corrupted setting is a special case of the self-bounding assumption (4).  ...
arXiv:2202.06694v1 fatcat:hd2j4clntzafzhcdjsndsek3gq
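The self-bounding assumption referenced in the snippet, in the form Zimmert and Seldin use for stochastic regimes with adversarial corruption. This is reconstructed from the multi-armed bandit setting; the paper's assumption (4) is the preference-based analogue. The environment satisfies, for some gaps Δ_i > 0 and corruption budget C ≥ 0,

```latex
\mathrm{Reg}_T \;\ge\; \sum_{t=1}^{T} \sum_{i \ne i^*} \Delta_i\, \mathbb{E}[w_{t,i}] \;-\; C ,
```

under which the self-bounding analysis yields regret of roughly Σ_{i≠i*} log T/Δ_i + √(C Σ_{i≠i*} log T/Δ_i).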

Page 72 of Das Kunstwerk Vol. 36, Issue 1 [page]

1983 Das Kunstwerk  
Juliane Roh  ...  [exhi]bitions for his art.  ...  tinkers [...] together in order to then move them on wire frames, or whether she encounters Pascal Verbena, who carpenters ("zimmert") narrow-fronted houses out of driftwood, behind whose many windows and doors surprising [...]  ...
Showing results 1–15 out of 53 results