Inverse Reinforcement Learning in Contextual MDPs [article]

Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy
2020 arXiv   pre-print
We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs).  ...  Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them  ...  To tackle these problems, we propose the Contextual Inverse Reinforcement Learning (COIRL) framework.  ... 
arXiv:1905.09710v5 fatcat:tluul5ast5dedk4nxsbpevr27a

Inverse reinforcement learning in contextual MDPs

Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy
2021 Machine Learning  
We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs).  ...  Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them  ...  Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long  ... 
doi:10.1007/s10994-021-05984-x fatcat:6ep2uiwwsfgq7orqqs3xdmtbs4

Guest editorial: special issue on reinforcement learning for real life

Yuxi Li, Alborz Geramifard, Lihong Li, Csaba Szepesvari, Tao Wang
2021 Machine Learning  
In the article titled "Inverse Reinforcement Learning in Contextual MDPs", the authors Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, and Tom Zahavy formulate the contextual inverse RL  ...  Reinforcement learning (RL) is a general paradigm for learning, predicting, and decision making, with broad applications in sciences, engineering and arts.  ...  In the article titled "Dealing with Multiple Experts and Non-Stationarity in Inverse Reinforcement Learning: An Application to Real-Life Problems", the authors Amarildo Likmeta, Alberto Maria Metelli,  ... 
doi:10.1007/s10994-021-06041-3 fatcat:ew3uhfhhevdd5khjeq2umhby7a

CARL: Aggregated Search with Context-Aware Module Embedding Learning [article]

Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang, Hai-Tao Zheng
2019 arXiv   pre-print
the joint learning process.  ...  The context-aware module embeddings together with the ranking policy are jointly optimized under the Markov decision process (MDP) formulation.  ...  Reinforcement learning is also used in many recommendation tasks.  ... 
arXiv:1908.03141v1 fatcat:ixagzaxvhfdqvbesluhyulo45y

Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains [article]

James Bannon, Brad Windsor, Wenbo Song, Tao Li
2020 arXiv   pre-print
Reinforcement learning algorithms have had tremendous successes in online learning settings.  ...  These settings require developing reinforcement learning algorithms that can operate in the so-called batch setting, where the algorithms must learn from a set of data that is fixed, finite, and generated  ...  By decomposing the whole trajectory into state-action-reward tuples, off-policy evaluation in reinforcement learning can be viewed as estimation in multiple contextual bandit problems, where s_t is the  ... 
arXiv:2006.02579v1 fatcat:mtkn7pjh5zdzrcgvgniau7qff4

Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration [article]

Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush Varshney, Murray Campbell, Moninder Singh, Francesca Rossi
2018 arXiv   pre-print
Inverse reinforcement learning is used to learn such constraints, that are then combined with a possibly orthogonal value function through the use of a contextual bandit-based orchestrator that picks a  ...  We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations of the task, and reinforcement learning to learn to maximize the environment  ...  The first is the inverse reinforcement learning component to learn the desirable constraints (depicted in green in Figure 2).  ... 
arXiv:1809.08343v1 fatcat:du372cqvifeubbfpwgyityq44a

Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs [article]

Andrea Zanette, Emma Brunskill
2019 arXiv   pre-print
In particular, it is found that a very minor variant of a recently proposed reinforcement learning algorithm for MDPs already matches the best possible regret bound Õ(√(SAT)) in the dominant term if deployed  ...  In order to make good decisions under uncertainty an agent must learn from observations. To do so, two of the most common frameworks are Contextual Bandits and Markov Decision Processes (MDPs).  ...  reinforcement learning that inherit the best performance of the setting they operate in, whether it is a bandit, contextual bandit, MDP or POMDP?  ... 
arXiv:1911.00954v1 fatcat:o5lse5po5vgnpkt6oozlxwt2kq

Generative Inverse Deep Reinforcement Learning for Online Recommendation [article]

Xiaocong Chen, Lina Yao, Aixin Sun, Xianzhi Wang, Xiwei Xu, Liming Zhu
2020 arXiv   pre-print
Deep reinforcement learning enables an agent to capture user's interest through interactions with the environment dynamically. It has attracted great interest in the recommendation research.  ...  To address the above issue, we propose a novel generative inverse reinforcement learning approach, namely InvRec, which extracts the reward function from user's behaviors automatically, for online recommendation  ...  Reinforcement Learning-based Recommendation Reinforcement Learning based recommendation systems learn from interactions through a Markov Decision Process (MDP).  ... 
arXiv:2011.02248v1 fatcat:g7swi666d5bkbjhw6busphp5ey

Inverse Reinforcement Learning with Multiple Ranked Experts [article]

Pablo Samuel Castro, Shijian Li, Daqing Zhang
2019 arXiv   pre-print
We show there are MDPs where important differences in the reward function would be hidden from existing algorithms by the behaviour of the expert.  ...  We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance  ...  During a recent visit to Scott Niekum's group in UT Austin I decided to dust it off, as they're doing research on inverse reinforcement learning.  ... 
arXiv:1907.13411v1 fatcat:2vmwsd7r3vfrdcclpxyjwzjcg4

Reward Biased Maximum Likelihood Estimation for Reinforcement Learning [article]

Akshay Mete, Rahul Singh, Xi Liu, P. R. Kumar
2021 arXiv   pre-print
Motivated by this, we examine the finite-time performance of RBMLE for reinforcement learning tasks that involve the general problem of optimal control of unknown Markov Decision Processes.  ...  The RBMLE approach has been proved to be long-term average reward optimal in a variety of contexts.  ...  reinforcement learning.  ... 
arXiv:2011.07738v3 fatcat:d4sk4bcohzaq7pyjnkdegm2jqi

Navigate like a cabbie

Brian D. Ziebart, Andrew L. Maas, Anind K. Dey, J. Andrew Bagnell
2008 Proceedings of the 10th international conference on Ubiquitous computing - UbiComp '08  
The model generalizes to unseen situations and scales to incorporate rich contextual information.  ...  We train our model using the route preferences of 25 taxi drivers demonstrated in over 100,000 miles of collected data, and demonstrate the performance of our model by inferring: (1) decision at next intersection  ...  ACKNOWLEDGMENTS The authors thank Eric Oatneal and Jerry Campolongo of Yellow Cab Pittsburgh for their assistance, Ellie Lin Ratliff for helping to conduct the study of driving habits, and John Krumm for his help in  ... 
doi:10.1145/1409635.1409678 dblp:conf/huc/ZiebartMDB08 fatcat:gm3q2lhmwfhn3mmwdp6fo3f3gq

DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [article]

Soroush Nasiriany, Vitchyr H. Pong, Ashvin Nair, Alexander Khazatsky, Glen Berseth, Sergey Levine
2021 arXiv   pre-print
Can we use reinforcement learning to learn general-purpose policies that can perform a wide range of different tasks, resulting in flexible and reusable skills?  ...  We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn these policies.  ...  DISTRIBUTION-CONDITIONED REINFORCEMENT LEARNING In this section, we show how conditioning policies on a goal distribution results in a contextual MDP that can capture any set of reward functions.  ... 
arXiv:2104.11707v1 fatcat:jzkzgzdyhjccnmoidr2wwd2qd4

MARS-Gym: A Gym framework to model, train, and evaluate Recommender Systems for Marketplaces [article]

Marlesson R. O. Santana, Luckeciano C. Melo, Fernando H. F. Camargo, Bruno Brandão, Anderson Soares, Renan M. Oliveira, Sandor Caetano
2020 arXiv   pre-print
For this matter, we propose MARS-Gym, an open-source framework to empower researchers and engineers to quickly build and evaluate Reinforcement Learning agents for recommendations in marketplaces.  ...  In this context, we observed a lack of resources to design, train, and evaluate agents that learn by interacting within these environments.  ...  In order to take advantage of those scenarios and effectively learn from them, we decided to use Reinforcement Learning.  ... 
arXiv:2010.07035v1 fatcat:pbukzume5zg47lfw5mi34a3qz4

Reinforcement Learning for Uplift Modeling [article]

Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, Junwu Xiong
2019 arXiv   pre-print
In this work, we address the problem from a new angle and reformulate it as a Markov Decision Process (MDP).  ...  In Section 3, we present our deep reinforcement learning design for uplift modeling.  ...  Reinforcement Learning Method For Uplift Modeling Overview In this section, we first show how to reformulate the uplift modeling problem as an MDP problem by constructing an equivalent Markov chain for  ... 
arXiv:1811.10158v2 fatcat:qodobxdg2zdgbd6op3x4rrjsaq

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism [article]

Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell
2021 arXiv   pre-print
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection.  ...  We study finite-sample properties of LCB as well as information-theoretic limits in multi-armed bandits, contextual bandits, and Markov decision processes (MDPs).  ...  the offline learning problem in contextual bandits.  ... 
arXiv:2103.12021v1 fatcat:7wbhgdjr65gx7lme7gmf35txum