
Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time [article]

Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama
2020 arXiv pre-print
To cope with this problem, we propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time.  ...  The Gaussian process bandit is a problem in which we want to find a maximizer of a black-box function with the minimum number of function evaluations.  ...  Then, the regret satisfies R̃_n = Õ(√(n^{1+c})) with high probability.  ... 
arXiv:2003.04691v2 fatcat:j7535kork5eldngisshf6wxqfe
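
A note on the setting: bounds of this Õ form are typically proven for GP-UCB-style algorithms. As a loose illustration of the generic GP-UCB loop only (not the paper's time-varying, evaluation-time-aware algorithm; the objective, kernel, and noise level below are invented for the demo), a minimal Python sketch:

    # Generic GP-UCB on a 1-D black-box function; illustrative only.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def f(x):  # hypothetical black-box objective
        return -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

    rng = np.random.default_rng(0)
    grid = np.linspace(0, 1, 200).reshape(-1, 1)  # candidate queries
    X = [[rng.random()]]                          # random initial query
    y = [f(X[0][0]) + 0.01 * rng.standard_normal()]

    for t in range(1, 30):
        gp = GaussianProcessRegressor(kernel=RBF(0.1), alpha=1e-4).fit(X, y)
        mu, sigma = gp.predict(grid, return_std=True)
        beta = 2.0 * np.log(len(grid) * (t + 1) ** 2)  # confidence width
        x_next = grid[int(np.argmax(mu + np.sqrt(beta) * sigma))]
        X.append([float(x_next[0])])
        y.append(f(x_next[0]) + 0.01 * rng.standard_normal())

    print("best observed query:", X[int(np.argmax(y))])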

CorrAttack: Black-box Adversarial Attack with Structured Search [article]

Zhichao Huang, Yaowei Huang, Tong Zhang
2020 arXiv pre-print
The time-varying contextual bandits problem can then be solved by a Bayesian optimization procedure, which can take advantage of the features of the structured action space.  ...  We show that searching over the structured space can be approximated by a time-varying contextual bandits problem, where the attacker takes the features of the associated arm to make modifications to the input  ...  It shows that Gaussian process regression can model the correlation of the reward function and that Bayesian optimization can use it to optimize the time-varying contextual bandits.  ... 
arXiv:2010.01250v1 fatcat:notqcsgqvzdn5eqyb4v54xalwy

Action Centered Contextual Bandits

Kristjan Greenewald, Ambuj Tewari, Predrag Klasnja, Susan Murphy
2017 Advances in Neural Information Processing Systems  
At the same time, it leads to algorithms with strong performance guarantees as in the linear model setting, while still allowing for complex nonlinear baseline modeling.  ...  Contextual bandits have become popular as they offer a middle ground between very simple approaches based on multi-armed bandits and very complex approaches using the full power of reinforcement learning  ...  We show that action centering is effective in dealing with time-varying and non-linear behavior in our model, leading to regret bounds that scale as nicely as previous bounds for linear contextual bandits  ... 
pmid:29225449 pmcid:PMC5719505 fatcat:pug3qnsfl5arvfnscekbmarzem

Action Centered Contextual Bandits [article]

Kristjan Greenewald, Ambuj Tewari, Predrag Klasnja, Susan Murphy
2017 arXiv pre-print
At the same time, it leads to algorithms with strong performance guarantees as in the linear model setting, while still allowing for complex nonlinear baseline modeling.  ...  Contextual bandits have become popular as they offer a middle ground between very simple approaches based on multi-armed bandits and very complex approaches using the full power of reinforcement learning  ...  For the time varying simulation, Gaussian processes were used to generate the reward coefficient sequence η_t and the state sequence s_t.  ... 
arXiv:1711.03596v1 fatcat:6kmdwzq7l5cqbph7qjsr2a77ai

(Sequential) Importance Sampling Bandits [article]

Iñigo Urteaga, Chris H. Wiggins
2019 arXiv pre-print
We combine SMC both for Thompson sampling and upper confidence bound-based (Bayes-UCB) policies, and study different bandit models: classic Bernoulli and Gaussian distributed cases, as well as dynamic and context-dependent linear-Gaussian, logistic and categorical-softmax rewards.  ...  We observe noticeable increases in regret when the dynamics of the parameters swap the optimal arm. This effect is also observed for dynamic bandits with non-Gaussian rewards.  ... 
arXiv:1808.02933v3 fatcat:gdvmt7bujbcsplk2ldyijzk34y
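
To make the combination concrete, here is a minimal, hypothetical sketch of particle-based (SMC) Thompson sampling on a dynamic Gaussian bandit whose arm means drift as a random walk; the drift and noise values are invented, and the paper covers richer reward families than this:

    # SMC (bootstrap particle filter) + Thompson sampling, dynamic bandit.
    import numpy as np

    rng = np.random.default_rng(1)
    K, P, T = 3, 500, 1000          # arms, particles per arm, rounds
    drift, noise = 0.01, 0.1        # random-walk and observation std
    true_mu = rng.normal(0, 1, K)
    particles = rng.normal(0, 1, (K, P))  # posterior samples per arm

    for t in range(T):
        true_mu += drift * rng.standard_normal(K)         # world drifts
        particles += drift * rng.standard_normal((K, P))  # propagate
        # Thompson sampling: one posterior draw per arm, play the argmax.
        a = int(np.argmax(particles[np.arange(K), rng.integers(P, size=K)]))
        r = true_mu[a] + noise * rng.standard_normal()
        # Reweight the played arm's particles by the Gaussian likelihood,
        # then resample (basic bootstrap update).
        w = np.exp(-0.5 * ((r - particles[a]) / noise) ** 2) + 1e-12
        particles[a] = particles[a][rng.choice(P, size=P, p=w / w.sum())]

    print("estimated means:", particles.mean(axis=1).round(2))
    print("true means:     ", true_mu.round(2))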

Bayesian Unification of Gradient and Bandit-Based Learning for Accelerated Global Optimisation

Ole-Christoffer Granmo
2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)
However, for continuous optimisation problems or problems with a large number of actions, bandit-based approaches can be hindered by slow learning.  ...  In this paper, we propose a Bayesian approach that unifies the above two paradigms in one single framework, with the aim of combining their advantages.  ...  This leads to a linear increase in computation time with respect to the number of observations, as opposed to the much more computationally expensive Gaussian process based approach (with exact computation involving  ... 
doi:10.1109/icmla.2016.0044 dblp:conf/icmla/Granmo16 fatcat:3ep5f5abnnho7awhdrgcchfjou

Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization [article]

Hung Tran-The, Sunil Gupta, Santu Rana, Svetha Venkatesh
2022 arXiv pre-print
In particular, whether the EI strategy with a standard incumbent converges in the noisy setting is still an open question of the Gaussian process bandit optimization problem.  ...  We prove that our algorithm converges, and achieves a cumulative regret bound of 𝒪(γ_T√T), where γ_T is the maximum information gain between T observations and the Gaussian process model.  ...  In this paper, we focus on the non-Bayesian setting, i.e. Gaussian process bandit optimization.  ... 
arXiv:2203.07875v1 fatcat:v3gtctmjlfbnzpvzxlo7fbwj5a
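
For reference, EI has a standard closed form under a Gaussian posterior. A minimal sketch of that acquisition (the incumbent is passed in as a plain argument; its choice, central to the paper's analysis, is not modeled here, and the candidate numbers are arbitrary):

    # Closed-form Expected Improvement for a Gaussian posterior.
    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, incumbent):
        """EI(x) = E[max(f(x) - incumbent, 0)] under N(mu, sigma^2)."""
        sigma = np.maximum(sigma, 1e-12)   # guard against zero std
        z = (mu - incumbent) / sigma
        return (mu - incumbent) * norm.cdf(z) + sigma * norm.pdf(z)

    # Pick the next query by maximizing EI over a candidate set.
    mu = np.array([0.10, 0.40, 0.35])
    sigma = np.array([0.30, 0.05, 0.40])
    print(int(np.argmax(expected_improvement(mu, sigma, incumbent=0.4))))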

Balanced Linear Contextual Bandits

Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
2019 Proceedings of the AAAI Conference on Artificial Intelligence
We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state-of-the-art theoretical guarantees.  ...  We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation  ...  Acknowledgments The authors would like to thank Emma Brunskill for valuable comments on the paper and John Langford, Miroslav Dudík, Akshay Krishnamurthy and Chicheng Zhang for useful discussions regarding the evaluation  ... 
doi:10.1609/aaai.v33i01.33013445 fatcat:cvzn2dxls5akzgdweu57qy6co4
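
As a loose illustration of the balancing idea (not the paper's estimator): weight each logged observation by the inverse of the probability its action had of being chosen, as in causal-inference inverse-propensity weighting, before fitting the linear reward model. The propensities and the clipping level below are assumptions made for the demo:

    # IPW-weighted ridge estimate of a linear reward parameter.
    import numpy as np

    def balanced_ridge(X, r, propensities, lam=1.0):
        w = 1.0 / np.clip(propensities, 1e-3, 1.0)  # clipped IPW weights
        A = (X * w[:, None]).T @ X + lam * np.eye(X.shape[1])
        b = (X * w[:, None]).T @ r
        return np.linalg.solve(A, b)

    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 3))
    theta = np.array([1.0, -0.5, 0.2])                 # ground truth
    r = X @ theta + 0.1 * rng.standard_normal(100)
    print(balanced_ridge(X, r, propensities=np.full(100, 0.25)).round(2))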

Balanced Linear Contextual Bandits [article]

Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
2018 arXiv pre-print
We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state-of-the-art theoretical guarantees.  ...  We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation  ...  Acknowledgments The authors would like to thank Emma Brunskill for valuable comments on the paper and John Langford, Miroslav Dudík, Akshay Krishnamurthy and Chicheng Zhang for useful discussions regarding the evaluation  ... 
arXiv:1812.06227v1 fatcat:fjvhmzl3kzb3zfpehdz65aipzu

Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget [article]

Baihan Lin, Djallel Bouneffouf
2022 arXiv pre-print
We validate this concept with simulations of multiple realistic policy-making scenarios and demonstrate a clear advantage in providing a Pareto-optimal solution in the epidemic intervention problem.  ...  To solve this multi-dimensional tradeoff of exploitation and exploration, we formulate this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward  ...  control and resource stringency, and offer the optimal Pareto frontier of the epidemic intervention problem.  ... 
arXiv:2106.15808v2 fatcat:fvxjtkahsfdhnemcb42i6joyiq

Cost-Efficient Online Hyperparameter Optimization [article]

Jingkang Wang, Mengye Ren, Ilija Bogunovic, Yuwen Xiong, Raquel Urtasun
2021 arXiv pre-print
To decide when to query the validation loss, we model online HPO as a time-varying Bayesian optimization problem, on top of which we propose a novel costly feedback setting to capture the concept of the  ...  Recent work on hyperparameter optimization (HPO) has shown the possibility of training certain hyperparameters together with regular parameters.  ...  Time-varying Gaussian process bandit optimization with non-constant evaluation time. CoRR, abs/2003.04691, 2020. Kevin G. Jamieson and Ameet Talwalkar.  ... 
arXiv:2101.06590v1 fatcat:wcxcxojxhjhtfh6qg2yijucxtm

Odds-Ratio Thompson Sampling to Control for Time-Varying Effect [article]

Sulgi Kim, Kyungmin Kim
2020 arXiv pre-print
Based on this finding, we propose a novel method, "Odds-Ratio Thompson Sampling" (OR-TS), which is expected to be robust to time-varying effects.  ...  Multi-armed bandit methods have been used for dynamic experiments, particularly in online services.  ...  However, real data can be confounded by the fact that the optimal arm changes over time. In this case, the use of plain OR-TS would also be affected.  ... 
arXiv:2003.01905v1 fatcat:naczbbfwurhotcflsgwlygvt5i
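
For context, the baseline that OR-TS modifies is plain Beta-Bernoulli Thompson sampling; a minimal sketch with invented click rates (the odds-ratio update itself is the paper's contribution and is not reproduced here):

    # Plain Beta-Bernoulli Thompson sampling (the non-robust baseline).
    import numpy as np

    rng = np.random.default_rng(2)
    p = np.array([0.45, 0.55])              # true rates (hypothetical)
    alpha, beta = np.ones(2), np.ones(2)    # Beta(1, 1) priors

    for t in range(5000):
        a = int(np.argmax(rng.beta(alpha, beta)))  # one draw per arm
        r = float(rng.random() < p[a])             # Bernoulli reward
        alpha[a] += r
        beta[a] += 1.0 - r

    print("posterior means:", (alpha / (alpha + beta)).round(3))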

Online Context-Aware Recommendation with Time Varying Multi-Armed Bandit

Chunqiu Zeng, Qing Wang, Shekoofeh Mokhtari, Tao Li
2016 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16  
In this paper, we study the time varying contextual multi-armed bandit problem where the reward mapping function changes over time.  ...  However, this assumption rarely holds in practice, since real-world problems often involve underlying processes that are dynamically evolving over time.  ...  TVUCB(λ): the time varying UCB, which integrates our proposed context drift model with the UCB bandit algorithm. Similar to LinUCB, the parameter λ is given.  ... 
doi:10.1145/2939672.2939878 dblp:conf/kdd/ZengWML16 fatcat:r2r3c54nbjeqtlbajymtkpnb6y
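
The LinUCB baseline mentioned here is compact enough to sketch; what follows is a generic single-round implementation (the context-drift model that TVUCB adds is the paper's contribution and is not shown, and the α and λ values are illustrative):

    # Generic LinUCB: ridge estimate plus an exploration bonus.
    import numpy as np

    d, lam, alpha_ucb = 5, 1.0, 1.0
    A = lam * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)       # reward-weighted feature sum

    def choose(arm_features):
        """Pick the arm maximizing mean reward plus confidence width."""
        theta = np.linalg.solve(A, b)
        scores = [x @ theta + alpha_ucb * np.sqrt(x @ np.linalg.solve(A, x))
                  for x in arm_features]
        return int(np.argmax(scores))

    def update(x, r):
        """Rank-one update after observing reward r for features x."""
        global A, b  # module-level state, kept simple for the sketch
        A = A + np.outer(x, x)
        b = b + r * x

    rng = np.random.default_rng(3)
    arms = rng.normal(size=(4, d))  # one feature vector per arm
    a = choose(arms)
    update(arms[a], r=1.0)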

Bandit-Based Random Mutation Hill-Climbing [article]

Jialin Liu, Diego Pérez-Liébana, Simon M. Lucas
2016 arXiv pre-print
The algorithm shows particular promise for discrete optimisation problems where each fitness evaluation is expensive.  ...  It repeats the process of randomly selecting a neighbour of the best-so-far solution and accepting the neighbour if it is better than or equal to it.  ...  There are also some adaptations to other contexts: time varying as in [18]; adversarial ([19], [20]); or involving the non-stationary nature of bandit problems in optimization portfolios.  ... 
arXiv:1606.06041v1 fatcat:mku75irl4jduzfckpyehnxhw6e
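
The loop described above is simple to make concrete. A minimal sketch on a OneMax bit string, following exactly that accept-if-no-worse rule (the bandit-based handling of noisy, expensive evaluations that the paper adds is omitted):

    # Random Mutation Hill-Climbing on a bit string (OneMax demo).
    import random

    def rmhc(fitness, n_bits=20, iters=1000, seed=4):
        rng = random.Random(seed)
        best = [rng.randint(0, 1) for _ in range(n_bits)]
        best_f = fitness(best)
        for _ in range(iters):
            cand = best[:]
            cand[rng.randrange(n_bits)] ^= 1  # flip one random bit
            f = fitness(cand)
            if f >= best_f:                   # accept ties as well
                best, best_f = cand, f
        return best, best_f

    # OneMax: fitness is simply the number of ones.
    print(rmhc(sum))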

Taking the Human Out of the Loop: A Review of Bayesian Optimization

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, Nando de Freitas
2016 Proceedings of the IEEE  
The end products (e.g., recommendation systems, medical analysis tools, real-time game engines, speech recognizers) thus involve many tunable configuration parameters.  ...  Bayesian optimization is a powerful tool for the joint optimization of design choices that has gained great popularity in recent years.  ...  This view is used in [122], where they construct a Gaussian process with a non-stationary noise process that starts high when the experiment begins and decays over time.  ... 
doi:10.1109/jproc.2015.2494218 fatcat:dcdmezhogrd45ippmdaslddlxa
Showing results 1 — 15 out of 1,843 results