168 Hits in 9.2 sec

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [article]

Aurélien Bibaut and Antoine Chambaz and Maria Dimakopoulou and Nathan Kallus and Mark van der Laan
2021 arXiv   pre-print
For regression, we provide fast rates that leverage the strong convexity of squared-error loss.  ...  We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provide first-of-their-kind generalization  ...  The square loss over $\mathcal{F} \times \mathcal{O}$ satisfies the following variance bound: $\|\ell(f,\cdot) - \ell(f_1,\cdot)\|_{2,g^*} \le 4\sqrt{M}\,\big(R^*(f) - R^*(f_1)\big)^{1/2}$ for all $f \in \mathcal{F}$, and the following Lipschitz property: $|\ell(f,o) - \ell(f',o)| \le \sqrt{M}\,|$  ... 
arXiv:2106.01723v1 fatcat:7srj7d2pczh2nlc4bqfns2i2yi
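
The abstract above centers on a generic importance-sampling weighted ERM over adaptively collected data. A minimal sketch of that weighting, assuming a finite hypothesis class, logged propensities from the adaptive design, and a fixed reference design g* (array names are illustrative, not the paper's notation):

    import numpy as np

    def importance_weighted_erm(losses, logging_propensities, target_propensities):
        """Importance-sampling weighted ERM over a finite hypothesis class.

        losses: (n_samples, n_hypotheses) loss of each hypothesis on each
            logged observation.
        logging_propensities: probability with which the adaptive design
            selected each logged action, shape (n_samples,).
        target_propensities: probability of the same action under a fixed
            reference design g*, shape (n_samples,).
        """
        weights = target_propensities / logging_propensities     # importance weights
        weighted_risk = (weights[:, None] * losses).mean(axis=0)  # weighted empirical risk
        return int(np.argmin(weighted_risk))                      # index of the ERM hypothesis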

Online and Distribution-Free Robustness: Regression and Contextual Bandits with Huber Contamination [article]

Sitan Chen, Frederic Koehler, Ankur Moitra, Morris Yau
2021 arXiv   pre-print
We answer this question in the affirmative for both linear regression and contextual bandits. In fact, our algorithms succeed where conventional methods fail.  ...  In this work we revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits, from the perspective of adversarial robustness.  ...  Acknowledgments We thank Ainesh Bakshi and Dylan Foster for useful discussions related to their papers, [BP20] and [FR20], respectively.  ... 
arXiv:2010.04157v3 fatcat:h2bskgaudvbqhjcer7m3t4wfne

Efficient Learning in Non-Stationary Linear Markov Decision Processes [article]

Ahmed Touati, Pascal Vincent
2021 arXiv   pre-print
For this problem setting, we propose OPT-WLSVI, an optimistic model-free algorithm based on weighted least squares value iteration, which uses exponential weights to smoothly forget data that are far in  ...  We study episodic reinforcement learning in non-stationary linear (a.k.a. low-rank) Markov Decision Processes (MDPs), i.e., both the reward and transition kernel are linear with respect to a given feature  ...  B Regret Reanalysis of D-LinUCB: Russac et al. (2019) propose the D-LinUCB algorithm, based on sequential weighted least squares regression.  ... 
arXiv:2010.12870v3 fatcat:z2vaqjotjbhoznq5bm4hviyfa4
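
Both OPT-WLSVI and D-LinUCB, as the snippet describes them, build on weighted least squares with exponential forgetting, so that observations far in the past are smoothly down-weighted. A rough sketch of that estimator (discount factor gamma and ridge parameter lam are assumed names, not the papers' notation):

    import numpy as np

    def discounted_least_squares(X, y, gamma=0.99, lam=1.0):
        """Ridge-regularized least squares in which an observation that is
        k steps old receives weight gamma**k, so stale data is forgotten
        smoothly rather than abruptly."""
        t, d = X.shape
        w = gamma ** np.arange(t - 1, -1, -1)        # most recent row gets weight 1
        V = lam * np.eye(d) + (X * w[:, None]).T @ X # discounted design matrix
        b = (X * w[:, None]).T @ y                   # discounted response vector
        return np.linalg.solve(V, b)                 # weighted ridge estimate of theta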

Heteroscedastic Sequences: Beyond Gaussianity

Oren Anava, Shie Mannor
2016 International Conference on Machine Learning  
By applying regret minimization techniques, we devise an efficient online learning algorithm for the problem, without assuming that the error terms comply with a specific distribution.  ...  We address the problem of sequential prediction in the heteroscedastic setting, when both the signal and its variance are assumed to depend on explanatory variables.  ...  Acknowledgments The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement 306638 (SUPREL), and the European  ... 
dblp:conf/icml/AnavaM16 fatcat:ja6fdfpoejhx5jwc3dcfkksc7a

Parallelizing Contextual Linear Bandits [article]

Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S. Song, Peter Bartlett, Michael I. Jordan
2021 arXiv   pre-print
We present a family of (parallel) contextual linear bandit algorithms whose regret is nearly identical to that of their perfectly sequential counterparts, given access to the same total number of oracle queries  ...  We provide matching information-theoretic lower bounds on parallel regret performance to establish that our algorithms are asymptotically optimal in the time horizon.  ...  sets for the linear regression estimator in misspecified models.  ... 
arXiv:2105.10590v1 fatcat:gppxezrk7bb4tglhjc3dmoyylq

Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [article]

Yining Wang, Yi Chen, Ethan X. Fang, Zhaoran Wang, Runze Li
2020 arXiv   pre-print
We further attain a sharper 𝒪̃(√(sT)) regret by using the SupLinUCB framework and match the minimax lower bound of low-dimensional linear stochastic bandit problems.  ...  This approach achieves 𝒪̃(s√(T)) regret with high probability, which is nearly independent of the "ambient" regression model dimension d.  ...  Sparsity regret bounds for individual sequences in online linear regression. Journal of Machine Learning Research, 14(Mar):729-769, 2013. Y. Goldberg and M. R. Kosorok. Q-learning with censored data.  ... 
arXiv:2009.02003v1 fatcat:xupe6snnzrehjng3vyuyjagpsi
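
As a point of reference for the best-subset-selection estimator named in the title, the brute-force version enumerates every support of size s and keeps the least-squares fit with the smallest residual sum of squares; this sketch is only practical for small s and d, and is not the paper's specific procedure:

    import numpy as np
    from itertools import combinations

    def best_subset_least_squares(X, y, s):
        """Exhaustive best subset selection: fit ordinary least squares on
        every size-s subset of columns and return the support/coefficients
        with the smallest residual sum of squares."""
        n, d = X.shape
        best = (np.inf, None, None)
        for support in combinations(range(d), s):
            Xs = X[:, list(support)]
            coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ coef) ** 2))
            if rss < best[0]:
                best = (rss, support, coef)
        return best[1], best[2]                      # (support indices, coefficients)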

On Submodular Contextual Bandits [article]

Dean P. Foster, Alexander Rakhlin
2021 arXiv   pre-print
Assuming access to an online regression oracle with regret 𝖱𝖾𝗀(ℱ), our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.  ...  On the other hand, using the techniques of (Filmus and Ward 2014), we show that an ϵ-Greedy procedure with local randomization attains regret of O(n^{2/3} 𝖱𝖾𝗀(ℱ)^{1/3}) against a stronger (1 − e^{−1}) benchmark  ...  bandits corresponds to k = 1, while the top-k problem studied in [SRY+21] corresponds to a linear (or modular) u∗; both of these minimize regret for c = 1.  ... 
arXiv:2112.02165v1 fatcat:w5d66ytxrzasrbefoioqyhb7va
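
For orientation on the Inverse Gap Weighting strategy invoked in the abstract, the standard IGW rule over a finite action set is sketched below; the exploration parameter gamma and the use of a single greedy action are simplifications (the paper instead randomizes around local optima of estimated submodular functions):

    import numpy as np

    def inverse_gap_weighting(predicted_rewards, gamma):
        """Turn the regression oracle's reward predictions for K actions into
        a sampling distribution: each non-greedy action gets probability
        1 / (K + gamma * gap), and the greedy action absorbs the rest."""
        preds = np.asarray(predicted_rewards, dtype=float)
        K = preds.size
        best = int(np.argmax(preds))
        probs = 1.0 / (K + gamma * (preds[best] - preds))
        probs[best] = 0.0
        probs[best] = 1.0 - probs.sum()              # remaining mass on the greedy action
        return probs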

Multi-task Representation Learning with Stochastic Linear Bandits [article]

Leonardo Cella, Karim Lounici, Massimiliano Pontil
2022 arXiv   pre-print
We study the problem of transfer learning in the setting of stochastic linear bandit tasks.  ...  We show the benefit of our strategy compared to the baseline Td√(N) obtained by solving each task independently. We also provide a lower bound on the multi-task regret.  ...  For instance, if we consider the uniform distribution over the sphere, we would have κ(Σ) = 1/d.  ... 
arXiv:2202.10066v1 fatcat:y4uevs3k3nau3ii45dern45dna

Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification [article]

James A. Grant, David S. Leslie
2021 arXiv   pre-print
We revisit this problem as a partial monitoring problem with side information, and focus on the case where item features are linked to true classes via a logistic regression model.  ...  Our principal contribution is a study of the performance of Thompson Sampling (TS) for this problem.  ...  Nemeth, Ciara Pike-Burke, and Christopher Sherlock for helpful conversations during the preparation of this manuscript.  ... 
arXiv:2109.14412v1 fatcat:6dh3hq5i5nardnjwd2eafggop4

Neural Replicator Dynamics [article]

Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls
2020 arXiv   pre-print
Additionally, NeuRD has formal equivalence to softmax counterfactual regret minimization, which guarantees convergence in the sequential tabular case.  ...  Using these algorithms in multiagent environments poses problems such as nonstationarity and instability.  ...  for discrete decision problems is a softmax function over the logits y: $\pi_t(\theta_t) = \Pi(y(\theta_t))$.  ... 
arXiv:1906.00190v5 fatcat:2ym5g7kb2zfatawsu7xq76l5ga
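
The last fragment points at the mechanism behind NeuRD: the policy is a softmax over logits y, and the logits themselves are what get updated. A tabular, advantage-driven logit update in that spirit (step size eta is an assumed parameter; this is a sketch, not the paper's exact update rule):

    import numpy as np

    def replicator_style_logit_update(logits, action_values, eta=0.1):
        """Softmax policy over logits y; every logit moves in proportion to
        that action's advantage (its value minus the policy's expected value),
        rather than being scaled by the action's own probability."""
        y = np.asarray(logits, dtype=float)
        q = np.asarray(action_values, dtype=float)
        policy = np.exp(y - y.max())
        policy /= policy.sum()
        baseline = policy @ q                        # expected value under the current policy
        return y + eta * (q - baseline)              # advantage-weighted logit step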

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning [article]

Hongseok Namkoong, Samuel Daulton, Eytan Bakshy
2020 arXiv   pre-print
Thompson sampling (TS) has emerged as a robust technique for contextual bandit problems.  ...  Since we update the TS policy with observations collected under the imitation policy, our algorithm emulates an off-policy version of TS.  ...  LINEAR-TS uses an exact Bayesian linear regression to model the reward distribution for each action a.  ... 
arXiv:2011.14266v2 fatcat:fspq3k7trffy5lan2p4pevlzyq
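
To make the LINEAR-TS building block mentioned in the snippet concrete, here is a sketch of one Thompson-sampling step with a conjugate Bayesian linear regression reward model. It uses a single parameter vector shared across actions with per-action features, a simplification of the per-action regressions the snippet describes, and the prior precision lam and noise variance sigma2 are assumptions:

    import numpy as np

    def linear_ts_step(features, XtX, Xty, sigma2=1.0, lam=1.0, rng=None):
        """Sample a parameter vector from the Gaussian posterior of a Bayesian
        linear regression and act greedily with respect to that sample.

        features: (n_actions, d) feature vector per action.
        XtX, Xty: sufficient statistics of past (feature, reward) pairs.
        """
        if rng is None:
            rng = np.random.default_rng()
        d = features.shape[1]
        precision = XtX / sigma2 + lam * np.eye(d)   # posterior precision
        cov = np.linalg.inv(precision)
        mean = cov @ (Xty / sigma2)                  # posterior mean
        theta = rng.multivariate_normal(mean, cov)   # one posterior draw
        return int(np.argmax(features @ theta))      # greedy action for that draw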

ZigZag: A new approach to adaptive online learning [article]

Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
2017 arXiv   pre-print
When the hypothesis class is a set of linear functions bounded in some norm, such a regret bound is achievable if and only if the norm satisfies certain decoupling inequalities for martingales.  ...  We develop a novel family of algorithms for the online learning setting with regret against any data sequence bounded by the empirical Rademacher complexity of that sequence.  ...  The structure of this proof will follow that of Theorem 6, which gives an upper bound on regret in terms of Rad_F whenever the one-sided UMD inequality holds.  ... 
arXiv:1704.04010v1 fatcat:f2xffsej4fagrh7l6jjw4mncle

Stochastic Neural Network with Kronecker Flow [article]

Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville
2020 arXiv   pre-print
In all setups, our methods prove to be competitive with existing methods and better than the baselines.  ...  This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters.  ...  Acknowledgement CWH would like to thank Kris Sankaran for pointing to the TIS inequality for Gaussian concentration, which is a key component in deriving the tail bound on Lipschitz flows.  ... 
arXiv:1906.04282v2 fatcat:44e5rr6zarhb3dzddqiycgpyzq

Transitions, Losses, and Re-parameterizations: Elements of Prediction Games [article]

Parameswaran Kamalaruban
2018 arXiv   pre-print
The insights shed light on the intrinsic barriers of the prediction problems and the design of computationally efficient learning algorithms with strong theoretical guarantees  ...  (such as generalizability, statistical consistency, and constant regret).  ...  Thus, if we have to use a proper, mixable but non-exp-concave loss function for a sequential prediction (online learning) problem, an O(1) regret bound could be achieved by the following two approaches:  ... 
arXiv:1805.08622v1 fatcat:ymwsxiis2jdyxedgl54y23qb7a

On Kernelized Multi-armed Bandits [article]

Sayak Ray Chowdhury, Aditya Gopalan
2017 arXiv   pre-print
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown.  ...  We provide two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and derive corresponding regret bounds.  ...  We improve upon it in the sense that the confidence bound in Theorem 2 is simultaneous over all x ∈ D, while the bound has been shown only for a single, fixed x in the kernel least-squares setting.  ... 
arXiv:1704.00445v2 fatcat:aq6epbywn5cyfidkrntvonotaa
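
For orientation, a compact sketch of GP-UCB-style arm selection: compute the Gaussian process posterior mean and standard deviation at each candidate arm and pick the arm maximizing mean + sqrt(beta) * std. The confidence width beta is taken as a given parameter here, whereas IGP-UCB derives a specific schedule for it from the information gain:

    import numpy as np

    def gp_ucb_select(K_obs, K_cross, prior_var, y, beta, noise=0.1):
        """Pick the candidate arm with the largest GP upper confidence bound.

        K_obs: (n, n) kernel matrix of observed points.
        K_cross: (n, m) kernel between observed and candidate points.
        prior_var: (m,) prior variances of the candidate points.
        y: (n,) observed rewards.
        """
        A = K_obs + noise * np.eye(K_obs.shape[0])
        mean = K_cross.T @ np.linalg.solve(A, y)     # posterior means at candidates
        reduction = np.einsum('ij,ij->j', K_cross, np.linalg.solve(A, K_cross))
        std = np.sqrt(np.maximum(prior_var - reduction, 0.0))
        return int(np.argmax(mean + np.sqrt(beta) * std))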
Showing results 1 — 15 out of 168 results