113,913 Hits in 6.6 sec

Active Reinforcement Learning: Observing Rewards at a Cost [article]

David Krueger, Jan Leike, Owain Evans, John Salvatier
2020 arXiv   pre-print
Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0.  ...  The central question of ARL is how to quantify the long-term value of reward information.  ...  To account for the cost of collecting reward feedback, we consider a simple modification to traditional RL called active reinforcement learning (ARL).  ... 
arXiv:2011.06709v2 fatcat:x5sce4tcavhrjjtqjybapmkeru
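The query-cost formulation described in the abstract above can be sketched with a toy tabular agent. This is an illustrative sketch only, not the authors' algorithm: the visit-count query rule and all names here are assumptions for demonstration.

```python
import random

def arl_q_learning(env, n_states, n_actions, episodes=200,
                   query_cost=0.1, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning where the reward is observed only if queried.

    Query rule (an assumption, for illustration): pay the cost c only for
    state-action pairs tried fewer than 10 times; afterwards skip the query
    and make no update, since the reward stays unobserved.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = (random.randrange(n_actions) if random.random() < epsilon
                 else max(range(n_actions), key=lambda x: Q[s][x]))
            s_next, r, done = env.step(a)
            if visits[s][a] < 10:                  # choose to pay for a query
                observed = r - query_cost          # reward net of the cost c
                target = observed + (0.0 if done else gamma * max(Q[s_next]))
                Q[s][a] += alpha * (target - Q[s][a])
            visits[s][a] += 1
            s = s_next
    return Q
```

The point of the sketch is the central ARL question quoted above: the agent must weigh the long-term value of reward information against the immediate cost c of obtaining it; here that trade-off is crudely hard-coded by the visit threshold.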

Active Reinforcement Learning with Monte-Carlo Tree Search [article]

Sebastian Schulze, Owain Evans
2018 arXiv   pre-print
Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging.  ...  On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL.  ...  We thank David Abel, Michael Osborne and Thomas McGrath for comments on a draft.  ... 
arXiv:1803.04926v3 fatcat:u5oaovq2o5g7xdm3on5c4kqa34

Active Reinforcement Learning over MDPs [article]

Qi Yang, Peng Yang, Ke Tang
2021 arXiv   pre-print
This paper proposes a framework of Active Reinforcement Learning (ARL) over MDPs to improve generalization efficiency under limited resources by instance selection.  ...  The past decade has seen the rapid development of Reinforcement Learning, which achieves impressive performance given numerous training resources.  ...  In these methods, active learning is introduced to decide which rewards to observe, saving the feedback cost.  ... 
arXiv:2108.02323v3 fatcat:cmcn36kiyvffdkprzbdb252tyy

Active Learning for Risk-Sensitive Inverse Reinforcement Learning [article]

Rui Chen, Wenshuo Wang, Zirui Zhao, Ding Zhao
2019 arXiv   pre-print
One typical assumption in inverse reinforcement learning (IRL) is that human experts act to optimize the expected utility of a stochastic cost with a fixed distribution.  ...  Risk-sensitive inverse reinforcement learning (RS-IRL) bridges this gap by assuming that humans act according to a random cost with respect to a set of subjectively distorted distributions instead of a  ...  INTRODUCTION Inverse reinforcement learning (IRL) provides a novel framework for recovering cost functions utilized in human decision making [1]-[6].  ... 
arXiv:1909.07843v2 fatcat:s2c4h66pmngzdnk56b4gzkagou

Learning how to Active Learn: A Deep Reinforcement Learning Approach [article]

Meng Fang, Yuan Li, Trevor Cohn
2017 arXiv   pre-print
To address these shortcomings, we introduce a novel formulation by reframing active learning as a reinforcement learning problem and explicitly learning a data selection policy, where the policy takes  ...  Active learning aims to select a small subset of data for annotation such that a classifier learned on the data is highly accurate.  ...  Based on this, we design an active learning algorithm as a policy based on deep reinforcement learning.  ... 
arXiv:1708.02383v1 fatcat:qzbff36oabf3rp7ny5cxkvszw4

Reinforcement Learning Approach to Active Learning for Image Classification [article]

Thorben Werner
2021 arXiv   pre-print
A newly proposed framework for framing the active learning workflow as a reinforcement learning problem is adapted for image classification and a series of three experiments is conducted.  ...  for image classification with a trained reinforcement learning agent.  ...  The second paper is "Active learning for reward estimation in inverse reinforcement learning" by Lopes et al.  ... 
arXiv:2108.05595v1 fatcat:oh4dckx42zbzzdev3xtjv7ro64

Active Hierarchical Imitation and Reinforcement Learning [article]

Yaru Niu, Yijun Gu
2020 arXiv   pre-print
Both imitation learning and reinforcement learning, or a combination of the two with hierarchical structures, have proven to be efficient ways for robots to learn complex tasks with sparse rewards.  ...  In this project, we explored different imitation learning algorithms and designed active learning algorithms upon the hierarchical imitation and reinforcement learning framework we have developed.  ...  ACKNOWLEDGMENT This is a course project for CS 7633 Human Robot Interaction at Georgia Institute of Technology. We thank Prof.  ... 
arXiv:2012.07330v1 fatcat:bgfe5szbn5bn7idpwfewbqk344

Reinforcement Learning or Active Inference?

Karl J. Friston, Jean Daunizeau, Stefan J. Kiebel, Olaf Sporns
2009 PLoS ONE  
This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility.  ...  This paper questions the need for reinforcement learning or control theory when optimising behaviour.  ...  Dynamic programming and value-learning try to optimise a policy π(x) based on a value-function V(x) of hidden states, which corresponds to expected reward or negative cost.  ... 
doi:10.1371/journal.pone.0006421 pmid:19641614 pmcid:PMC2713351 fatcat:a6xr4ehgdrebde6xm272q77aoq

Active MR k-space Sampling with Reinforcement Learning [article]

Luis Pineda, Sumana Basu, Adriana Romero, Roberto Calandra, Michal Drozdzal
2020 arXiv   pre-print
We formulate the problem as a sequential decision process and propose the use of reinforcement learning to solve it.  ...  Experiments on a large scale public MRI dataset of knees show that our proposed models significantly outperform the state-of-the-art in active MRI acquisition, over a large range of acceleration factors  ...  The authors would like to thank the fastMRI team at FAIR and at NYU for engaging discussions.  ... 
arXiv:2007.10469v1 fatcat:mmmtoj6nefhv7fllgne7tplhqm

Reinforcement Learning with Efficient Active Feature Acquisition [article]

Haiyan Yin and Yingzhen Li and Sinno Jialin Pan and Cheng Zhang and Sebastian Tschiatschek
2020 arXiv   pre-print
In this paper, we propose a model-based reinforcement learning framework that learns an active feature acquisition policy to solve the exploration-exploitation problem during its execution.  ...  Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states, which are then used by the policy to maximize the task reward  ...  a reinforcement learning policy and an active feature acquisition policy.  ... 
arXiv:2011.00825v1 fatcat:kjhlzuoe2ngplle2qikzgovfe4

The Allocation of Time and Location Information to Activity-Travel Sequence Data by Means of Reinforcement Learning [chapter]

Wets Janssens
2008 Reinforcement Learning  
Acknowledgement Davy Janssens acknowledges support as a post-doctoral research fellow from the Research Foundation - Flanders (F.W.O.-Vlaanderen). References  ...  Translated into a context of Q-learning, the agent learns to find a travel policy that achieves maximal reward/minimal cost.  ...  The agent repeatedly observes its current state s, chooses a possible action a to perform, and determines its immediate reward r(s, a) and resulting  ... 
doi:10.5772/5290 fatcat:bhhm2znz6ngefjadw6fijrcn2u
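The snippet above describes the standard Q-learning loop (observe state s, choose action a, receive reward r(s, a)). The corresponding one-step update, as a minimal self-contained sketch, is:

```python
def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One temporal-difference update for tabular Q-learning:

        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

    with the bootstrap term dropped on terminal transitions.
    Q is a list of per-state action-value lists; returns the new Q(s, a).
    """
    target = r + (0.0 if done else gamma * max(Q[s_next]))
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

Repeating this update while the agent explores is exactly the "repeatedly observes, chooses, determines reward" loop the chapter describes; under the usual conditions it converges to the optimal action values.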

Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Finale Doshi-Velez, Joelle Pineau, Nicholas Roy
2012 Artificial Intelligence  
More important for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.  ...  In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain.  ...  initially unknown, are at least observable during learning.  ... 
doi:10.1016/j.artint.2012.04.006 fatcat:5lqu6casancohdg3xt42tgpbiu

Active Measure Reinforcement Learning for Observation Cost Minimization [article]

Colin Bellinger, Rory Coles, Mark Crowley, Isaac Tamblyn
2020 arXiv   pre-print
Standard reinforcement learning (RL) algorithms assume that the observation of the next state comes instantaneously and at no cost.  ...  Critically, by utilizing an active strategy, Amrl-Q achieves a higher costed return.  ...  The results show that Amrl-Q achieves a higher sum of rewards minus observation cost than Q-learning and Dyna-Q, whilst learning at an equivalent rate to Q-learning and Dyna-Q.  ... 
arXiv:2005.12697v1 fatcat:htkkb564j5c45m4svbhupnyyj4
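The idea sketched in the abstract above — paying for state observations rather than assuming they are free — can be illustrated with a small step function. This is a sketch of the general idea, not Bellinger et al.'s exact Amrl-Q algorithm; the measure-when-uncached rule and all names are assumptions.

```python
import random

def amrl_step(Q, model, s, n_actions, env_step,
              measure_cost=0.05, epsilon=0.1, alpha=0.1, gamma=0.9):
    """One step of an active-measure agent (illustrative sketch only).

    Each action carries an implicit binary choice: measure the next state,
    paying measure_cost for the observation, or skip the measurement and
    predict the next state from a cached transition model, as in Dyna.
    Here the agent measures only transitions it has never cached.
    """
    a = (random.randrange(n_actions) if random.random() < epsilon
         else max(range(n_actions), key=lambda x: Q[s][x]))
    if (s, a) not in model:
        s_next, r, done = env_step(a)      # real, costed observation
        r -= measure_cost
        model[(s, a)] = (s_next, done)     # cache the observed transition
    else:
        s_next, done = model[(s, a)]       # free model prediction
        r = 0.0                            # reward unobserved without measuring
    target = r + (0.0 if done else gamma * max(Q[s_next]))
    Q[s][a] += alpha * (target - Q[s][a])
    return s_next, done
```

The design choice this illustrates is the one the paper's results highlight: an agent that shifts from measuring to predicting can keep learning at a Q-learning-like rate while raising the return net of observation cost.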

First-Person Activity Forecasting with Online Inverse Reinforcement Learning [article]

Nicholas Rhinehart, Kris M. Kitani
2017 arXiv   pre-print
DARKO learns and forecasts from first-person visual observations of the user's daily behaviors via an Online Inverse Reinforcement Learning (IRL) approach.  ...  Classical IRL discovers only the rewards in a batch setting, whereas DARKO discovers the states, transitions, rewards, and goals of a user from streaming data.  ...  At any time, the user’s future is predicted learning has been applied to inverse reinforcement learn- among this set of goals.  ... 
arXiv:1612.07796v3 fatcat:ftsgybfg6zdzpcdsrlygqtau64

Active Measure Reinforcement Learning for Observation Cost Minimization

Colin Bellinger, Rory Coles, Mark Crowley, Isaac Tamblyn
2021 Proceedings of the Canadian Conference on Artificial Intelligence  
We propose the active measure RL framework (Amrl) as a solution to this novel class of problem, and contrast it with standard reinforcement learning under full observability and planning under partially  ...  Markov Decision Processes (MDP) with explicit measurement cost are a class of environments in which the agent learns to maximize the costed return.  ...  Amrl-Q achieves an equivalent or higher costed reward than the alternative methods. While learning an equally good policy at a similar rate, only Amrl-Q can increase the costed reward by actively shifting to  ... 
doi:10.21428/594757db.72846d04 fatcat:mwzs7eaylfgwxl3f5muqe63uja
Showing results 1 — 15 out of 113,913 results