Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [article]

Aadirupa Saha, Pierre Gaillard
2022 arXiv   pre-print
In particular, we give the first best-of-both world result for the dueling bandits regret minimization problem – a unified framework that is guaranteed to perform optimally for both stochastic and adversarial  ...  We study the problem of K-armed dueling bandit for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pair of decisions  ...  Acknowledgment Thanks to Julian Zimmert and Karan Singh for the useful discussions on the existing best-of-both-world multiarmed bandits results.  ... 
arXiv:2202.06694v1 fatcat:hd2j4clntzafzhcdjsndsek3gq

Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences

Aadirupa Saha, Pierre Gaillard
2022 International Conference on Machine Learning  
In particular, we give the first best-of-both world result for the dueling bandits regret minimization problem – a unified framework that is guaranteed to perform optimally for both stochastic and adversarial  ...  We study the problem of K-armed dueling bandit for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pair of decision  ...  Acknowledgement We thank the anonymous reviewers for their insightful suggestions to improve the paper.  ... 
dblp:conf/icml/SahaG22 fatcat:l4uhliigxvfcxfor4ubomlpb7u

Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits [article]

Aadirupa Saha, Shubham Gupta
2022 arXiv   pre-print
This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary 'win-loss' feedback for this pair, sampled from an underlying preference matrix  ...  We study the problem of dynamic regret minimization in K-armed Dueling Bandits under non-stationary or time varying preferences.  ...  Due to the inherent exploration-vs-exploitation tradeoff of the learning framework and several advantages of preference feedback [6, 33] , many real-world applications can be modeled as dueling bandits  ... 
arXiv:2111.03917v2 fatcat:oxbqmmchoverrculjgpc5w47ve

contextual: Evaluating Contextual Multi-Armed Bandit Problems in R [article]

Robin van Emden, Maurits Kaptein
2020 arXiv   pre-print
Over the past decade, contextual bandit algorithms have been gaining in popularity due to their effectiveness and flexibility in solving sequential decision problems, from online advertising and finance  ...  contextual and context-free bandit policies through both simulation and offline analysis.  ...  Peter Whittle famously stated "[the problem] was formulated during the [second world] war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that  ... 
arXiv:1811.01926v4 fatcat:7im2nngh7jb2zk4vboyicqatra

Reinforcement Learning Approaches in Social Robotics

Neziha Akalin, Amy Loutfi
2021 Sensors  
Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots.  ...  of real-world reinforcement learning challenges and proposed solutions, the points that remain to be explored, including the approaches that have thus far received less attention is also given in the  ...  Schneider and Kummert [44] investigated a dueling bandit learning approach for preference learning. The algorithm draws two or more actions, and the relative preference is used as reward.  ... 
doi:10.3390/s21041292 pmid:33670257 pmcid:PMC7918897 fatcat:gh34qmglt5exjf3gcelmrvoxyu

Reinforcement Learning Approaches in Social Robotics [article]

Neziha Akalin, Amy Loutfi
2021 arXiv   pre-print
Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots.  ...  of real-world reinforcement learning challenges and proposed solutions, the points that remain to be explored, including the approaches that have thus far received less attention is also given in the  ...  Schneider and Kummert [44] investigated a dueling bandit learning approach for preference learning. The algorithm draws two or more actions, and the relative preference is used as reward.  ... 
arXiv:2009.09689v4 fatcat:6vvotvwfhjh5zbsgzmplj5wq7q

Deep Reinforcement Learning, a textbook [article]

Aske Plaat
2022 arXiv   pre-print
The book is written for graduate students of artificial intelligence, and for researchers and practitioners who wish to better understand deep reinforcement learning methods and their challenges.  ...  In some applications they have even become better than the best humans, such as in Atari, Go, poker and StarCraft.  ...  Both approaches aim to improve the speed and accuracy of learning, by learning from a set of subtasks.  ... 
arXiv:2201.02135v2 fatcat:3icsopexerfzxa3eblpu5oal64

First return, then explore [article]

Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune
2021 arXiv   pre-print
, an insight that may prove critical to the creation of truly intelligent learning agents.  ...  The promise of reinforcement learning is to solve complex sequential decision problems autonomously by specifying a high-level reward function only.  ...  Acknowledgements We thank Ashley Edwards, Sanyam Kapoor, Felipe Petroski Such and Jiale Zhi for their ideas, feedback, technical support, and work on aspects of Go-Explore not presented in this work.  ... 
arXiv:2004.12919v4 fatcat:m5in5nokfrgtzdd2gsmuifz7kq

Context & Semantics in News & Web Search

Daan Odijk
2016 SIGIR Forum  
from six episodes of a live daily talk show.  ...  The dataset that was described in Section 6.3 has been made available to the research community; 3 it consists of more than 1,500 manually annotated links in over 5,000 subtitle chunks for 50 video segments  ...  We learn an optimal policy using the Dueling Bandits Gradient Descent (DBGD) algorithm [233] , detailed in Algorithm 4.  ... 
doi:10.1145/2964797.2964816 fatcat:zmt2clavxrdi5izyo73z6kfg3q

Jane Harrison as an Interpreter of Russian Culture in the 1910s-1920s [chapter]

Alexandra Smith
2012 A People Passing Rude: British Responses to Russian Culture  
an unthreatening socialist utopia Right up until the last decade of the twentieth century, Russian cinema of the Soviet period remained in fact terra incognita both for Western researchers and film scholars  ...  This was undoubtedly due to its focus on the lives of ordinary Soviet men and women, which from a newspaper's point of view would be of more interest to ordinary British men and women than Gosplan or pig-iron  ...  composers could learn from.  ... 
doi:10.11647/obp.0022.12 fatcat:xnu5az23ejhgjcruotbnbbu5qy

L'Autre comme Face de la Terreur

Emanuela Ilie
2015 Journal of Humanistic and Social Studies  
The study entitled The Other as the Terror Face describes and analyses the most significant dark forms of the Otherness that appear in the poetry composed in the Romanian communist prisons and work camps  ...  Even though the esthetical value of this particular type of prison creation is sometimes reduced, the reader aims to focus on the existential side of the testimony offered in such sufferance poems.  ...  For Informatics students it is obvious that they would prefer learning English using ICT, rather than pen and paper; aims of the course: each lecture should have well set aims; learning outcomes: the  ... 
doaj:7fc9acd1fc8e491ba41c66d6b4d7584f fatcat:4sssl4nsbfcp5pctcose7gwmoe

Russia and Russian Culture in The Criterion, 1922-1939 [chapter]

Olga Ushakova
2012 A People Passing Rude  
composers could learn from.  ...  learning from its westerly neighbours.  ...  and lifestyles of a world largely closed off to them.  ... 
doi:10.2307/j.ctt5vjsk8.20 fatcat:zejd2hmahjcuzbpwcy2tggutfe

An Heuristic Study on Puratchi Thalaivi Dr. Jayaraman Jayalalitha Who had Acted as Heroine with Bharat Ratna Dr. Marudur Gopala Menon Ramachandran in the 28 Classical Tamil Movies, Many of Which are Reflecting Dravidian Ideology – Whether Such an Association Resulted in Developing Leadership Qualitites to become an Unparalled Women Political Leader

P. Sarvaharana, P. Thiyagarajan, S. Manikandan
2021 Global Journal of Human-Social Science  
He also states that "although Bombay is usually considered the capital of the Indian film world, it is within south India that film has made its greatest impact (2).  ...  Question arises whether films and film songs address the issue of social inequality and voice against the sufferings of the lowest rung of the people of Tamil Society?  ...  Rajkumar Lifetime Achievement Award from Karnataka Government. • Karnataka State Film Award for Best Supporting Actress -Namma Makkalu. • Karnataka State Film Award for Best Supporting Actress -Belli Moda  ... 
doi:10.34257/gjhssavol21is5pg67 fatcat:nh5kbtp53fbxpjazxvihyhutdu

Prediction-based search for autonomous game-playing [article]

Alexander Dockhorn, Universitäts- Und Landesbibliothek Sachsen-Anhalt, Martin-Luther Universität, Rudolf Kruse
2020
To play a game without a forward model, methods for learning the environment's model from recent interactions between the agent and the environment are proposed.  ...  An analysis of environment models shows how they can be represented and learned in the form of an end-to-end forward model.  ...  Definition 2.4 (Multi-armed Bandit Problem (adapted from [177])) Let a multi-armed bandit have n levers.  ... 
doi:10.25673/34014 fatcat:irqtgwindvazbka3uf2ryjgjle

Dagstuhl Reports, Volume 7, Issue 5, May 2017, Complete Issue [article]

2018
These bookmarks also improve the predictability of the navigation: we benefit from the predictability in the streaming strategy and therefore can offer a better quality of service to the user.  ...  This talk is based upon work from COST Action CA15140 on Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO), supported by COST (European Cooperation in Science  ...  dueling bandit Hans-Georg Beyer: Towards a Theory of CMA-ES: But first Simplify your problem 9.00 -10.15 CMA-ES Welcome Yohei Akimoto: Optimal Step-size for Weighted Recombination Evolution Introduction  ... 
doi:10.4230/dagrep.7.5 fatcat:ipv6ltpp6ngejao35zzoqko2ry