5 Hits in 1.4 sec

BRPO: Batch Residual Policy Optimization [article]

Sungryull Sohn and Yinlam Chow and Jayden Ooi and Ofir Nachum and Honglak Lee and Ed Chi and Craig Boutilier
2020 arXiv   pre-print
To remedy this, we propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.  ...  We derive a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance.  ...  Concluding Remarks: We have presented Batch Residual Policy Optimization (BRPO) for learning residual policies in batch RL settings.  ...
arXiv:2002.05522v2 fatcat:i42dsyw2ubgddicfl2a2lsbraq
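As a rough illustration of the residual-policy idea in the snippet above, here is a minimal sketch (illustrative names and toy numbers, not the authors' implementation): the learned correction is gated by a state-action-dependent allowable deviation before being added to the behavior prior.

import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def residual_policy(behavior_logits, residual_logits, deviation):
    # deviation holds per-(state, action) values in [0, 1] that limit how far
    # the learned residual may push the policy away from the behavior prior
    return softmax(behavior_logits + deviation * residual_logits)

# toy example with 4 discrete actions for a single state
behavior_logits = np.array([2.0, 0.5, 0.1, -1.0])  # prior estimated from the batch data
residual_logits = np.array([-1.0, 1.5, 0.2, 0.0])  # learned correction
deviation = np.array([0.1, 0.9, 0.5, 0.3])         # state-action-dependent allowable deviation
print(residual_policy(behavior_logits, residual_logits, deviation))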

BRPO: Batch Residual Policy Optimization

Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
To remedy this, we propose residual policies, where the allowable deviation of the learned policy is state-action-dependent.  ...  We derive a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance.  ...  Wx) hence very hard to generalize to more general residual networks.  ...
doi:10.24963/ijcai.2020/387 dblp:conf/ijcai/WangSZZ20 fatcat:2cdi3iz6azdope4hnjfjolhbxi

DACE: Distribution-Aware Counterfactual Explanation by Mixed-Integer Linear Optimization

Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Hiroki Arimura
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Then, we propose a mixed-integer linear optimization approach to extracting an optimal action by minimizing our cost function.  ...  Concluding Remarks: We have presented Batch Residual Policy Optimization (BRPO) for learning residual policies in batch RL settings.  ...  In this work, we study the problem of residual policy optimization (RPO) in the batch setting.  ...
doi:10.24963/ijcai.2020/391 dblp:conf/ijcai/SohnCONLCB20 fatcat:avhn7sqcwfdbtnmk6wy7tfhzga
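The snippet's "mixed-integer linear optimization approach to extracting an optimal action" can be pictured with a toy model. The sketch below uses a hypothetical linear classifier, candidate edits, and costs, and omits DACE's distribution-aware cost terms; it picks a minimum-cost set of feature edits that flips the classifier's decision via SciPy's MILP solver.

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# toy linear classifier: predict positive when w @ x + b >= 0
w = np.array([0.8, -0.5, 1.2])
b = -1.0
x = np.array([0.2, 1.0, 0.3])          # instance currently classified negative

# candidate feature edits and their costs (stand-ins for a distribution-aware cost)
deltas = np.array([[0.5, 0.0, 0.0],    # raise feature 0
                   [0.0, -0.8, 0.0],   # lower feature 1
                   [0.0, 0.0, 0.6]])   # raise feature 2
cost = np.array([1.0, 0.4, 0.7])

gain = deltas @ w                      # score change contributed by each edit
need = -(w @ x + b)                    # score deficit the chosen edits must cover

res = milp(c=cost,
           constraints=LinearConstraint(gain.reshape(1, -1), lb=need, ub=np.inf),
           integrality=np.ones_like(cost),  # all decision variables are integer
           bounds=Bounds(0, 1))             # binary via [0, 1] bounds plus integrality
print(res.x, res.fun)                       # selected edits and their total cost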

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts [article]

Gilwoo Lee, Brian Hou, Sanjiban Choudhury, Siddhartha S. Srinivasa
2020 arXiv   pre-print
Our algorithm, Bayesian Residual Policy Optimization (BRPO), imports the scalability of policy gradient methods and task-specific expert skills.  ...  Next, we train a Bayesian residual policy to improve upon the ensemble's recommendation and learn to reduce uncertainty.  ...  BRPO performs batch policy optimization in the residual belief MDP, producing actions that continuously correct the ensemble recommendations.  ... 
arXiv:2002.03042v1 fatcat:o25waigf2rasho4fowfv6sdtny
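A minimal sketch of the "correct the ensemble's recommendation" step described above (toy numbers; residual_policy is a stand-in for the trained network, and the belief update over latent MDPs is not shown):

import numpy as np

def corrected_action(expert_actions, belief, residual_policy, obs):
    # belief-weighted recommendation from the clairvoyant experts,
    # plus a learned residual correction conditioned on obs and the recommendation
    recommendation = belief @ expert_actions
    return recommendation + residual_policy(np.concatenate([obs, recommendation]))

# toy example: 3 experts proposing 2-dimensional continuous actions
expert_actions = np.array([[0.5, -0.2],
                           [0.1, 0.4],
                           [-0.3, 0.0]])
belief = np.array([0.6, 0.3, 0.1])                      # posterior over latent MDPs
residual_policy = lambda inp: 0.05 * np.tanh(inp[-2:])  # placeholder for a trained policy
obs = np.array([1.0, 0.2, -0.5])
print(corrected_action(expert_actions, belief, residual_policy, obs))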

Optimality Inductive Biases and Agnostic Guidelines for Offline Reinforcement Learning [article]

Lionel Blondé, Alexandros Kalousis, Stéphane Marchand-Maillet
2022 arXiv   pre-print
Our investigations confirm that careless injections of such optimality inductive biases make dominant agents subpar as soon as the offline policy is sub-optimal.  ...  The performance of state-of-the-art offline RL methods varies widely over the spectrum of dataset qualities, ranging from far-from-optimal random data to close-to-optimal expert demonstrations.  ...  We align the notion of optimality with Bellman's principle of optimality. As such, a policy is optimal if and only if its value is a solution to the optimal version of Bellman's equation.  ...
arXiv:2107.01407v2 fatcat:jqkjoakns5eqhnndww54t3ay2q
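The optimality criterion quoted in the snippet is the standard Bellman optimality condition; for reference (textbook form, not copied from the paper), a policy \pi is optimal if and only if its value function satisfies

V^{\pi}(s) = \max_{a} \Big[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \Big] \quad \text{for all } s,

i.e. V^{\pi} coincides with the optimal value function V^{*}.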