2,871 Hits in 9.2 sec

Policy Gradient Search: Online Planning and Expert Iteration without Search Trees [article]

Thomas Anthony and Robert Nishihara and Philipp Moritz and Tim Salimans and John Schulman
2019 arXiv   pre-print
We propose an alternative simulation-based search method, Policy Gradient Search (PGS), which adapts a neural network simulation policy online via policy gradient updates, avoiding the need for a search  ...  Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play.  ...  Policy Gradient Search Expert Iteration One motivation of this work is the value of online planning algorithms during training of RL agents.  ... 
arXiv:1904.03646v1 fatcat:h6o26ba4ybgbvf5rmg225llutq
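The core PGS idea in the snippet above, adapting a simulation policy online with policy-gradient updates instead of accumulating statistics in a search tree, can be illustrated with a minimal REINFORCE-style sketch. The toy simulator and all names here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Minimal sketch of an online policy-gradient simulation policy:
# instead of storing visit counts in a search tree, the simulation
# policy's parameters are nudged after every simulated rollout.
# The 3-action simulator below is a stand-in, not from the paper.

rng = np.random.default_rng(0)
n_actions = 3
theta = np.zeros(n_actions)               # logits of a softmax simulation policy
true_returns = np.array([0.1, 0.8, 0.3])  # toy simulator: expected return per action

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

alpha = 0.1
for _ in range(2000):                     # simulated rollouts during "search"
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)
    ret = true_returns[a] + 0.1 * rng.standard_normal()  # noisy rollout return
    # REINFORCE: grad of log pi(a) for a softmax policy is one_hot(a) - probs
    grad_log = -probs
    grad_log[a] += 1.0
    theta += alpha * ret * grad_log

best = int(np.argmax(theta))              # policy concentrates on the best action
```

After enough simulated rollouts the softmax policy concentrates on the highest-return action, which is the role the adapted simulation policy plays during online planning.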

The factored policy-gradient planner

Olivier Buffet, Douglas Aberdeen
2009 Artificial Intelligence  
Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent.  ...  This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both.  ...  This project was also funded by the Australian Defence Science and Technology Organisation. Thank you to Sylvie Thiébaux and Iain Little for many helpful insights.  ... 
doi:10.1016/j.artint.2008.11.008 fatcat:r5igklyeb5gklbqyvqza5ujj5a

Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration [article]

Peixi Peng, Junliang Xing
2021 arXiv   pre-print
by the demonstration and updated by the learned decentralized policy to improve the sub-optimality.  ...  On the other hand, the Nash Equilibria are found from the current state-action values and are used as a guide to learn the policy.  ...  The former two methods use the demonstration policy to guide game tree search.  ... 
arXiv:1812.01825v2 fatcat:phdg7eaofjdy7k452gfk465uii

ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search [article]

Dixant Mittal and Siddharth Aravindan and Wee Sun Lee
2022 arXiv   pre-print
A tree-based online search algorithm iteratively simulates trajectories and updates Q-value information on a set of states represented by a tree structure.  ...  We conduct experiments on complex planning problems, which include Sokoban and Hamiltonian cycle search in sparse graphs and show that combining exploration with policy gradient improves online search  ...  Policy Gradient Search Policy Gradient Search (PGS) is an online search algorithm that iteratively improves a parameterised simulation policy using policy gradient methods.  ... 
arXiv:2202.01461v2 fatcat:7vd6g4hcgrd6viuxt6qlywy4dy

Thinking Fast and Slow with Deep Learning and Tree Search [article]

Thomas Anthony, Zheng Tian, David Barber
2017 arXiv   pre-print
Planning new policies is performed by tree search, while a deep neural network generalises those plans.  ...  Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans.  ...  Acknowledgements This work was supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1 and by AWS Cloud Credits for Research.  ... 
arXiv:1705.08439v4 fatcat:zxjxdg522neonlwyey2t2jdt4y

On Learning Intrinsic Rewards for Policy Gradient Methods [article]

Zeyu Zheng, Junhyuk Oh, Satinder Singh
2018 arXiv   pre-print
Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead search based planning agents.  ...  In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.  ...  This work was supported by NSF grant IIS-1526059, by a grant from Toyota Research Institute (TRI), and by a grant from DARPA's L2M program.  ... 
arXiv:1804.06459v2 fatcat:vtwr4g4xvjbo7pm3fpyn5hxclq

On Monte Carlo Tree Search and Reinforcement Learning

Tom Vodopivec, Spyridon Samothrakis, Branko Šter
2017 The Journal of Artificial Intelligence Research  
Our study promotes a unified view of learning, planning, and search.  ...  We confirm that planning methods inspired by RL in conjunction with online search demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm  ...  Acknowledgements The authors wish to thank the anonymous reviewers and associate editor for their help in improving this manuscript; their advice helped clarify and enforce the key message of this study  ... 
doi:10.1613/jair.5507 fatcat:igffnyo5hfbyzigzxpp6t6pebi

The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making [article]

Luchen Li and Matthieu Komorowski and Aldo A. Faisal
2018 arXiv   pre-print
To address the challenge of an infinite number of possible belief states, which renders exact value iteration intractable, we evaluate and plan only for each encountered belief, through heuristic search tree  ...  Both actor and critic parameters are learned via gradient-based approaches.  ...  In this section, we introduce how our algorithm combines tree search with policy gradient/function approximations to realise both continuous action space and efficient online planning for POMDP.  ... 
arXiv:1805.11548v3 fatcat:tbcfmjdjevhajopkearkj6btzu

Monte Carlo Tree Search: A Review of Recent Modifications and Applications [article]

Maciej Świechowski, Konrad Godlewski, Bartosz Sawicki, Jacek Mańdziuk
2022 arXiv   pre-print
The method relies on intelligent tree search that balances exploration and exploitation.  ...  Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems.  ...  Such a policy is optimized by a reinforcement learning (RL) policy through self-play and the policy gradient optimization method.  ... 
arXiv:2103.04931v3 fatcat:xvkkhhk2czhhxmcbueagtccnam

Continuous Control Monte Carlo Tree Search Informed by Multiple Experts

Joose Julius Rajamäki, Perttu Hämäläinen
2018 IEEE Transactions on Visualization and Computer Graphics  
The tree search utilizes information from multiple sources including two machine learning models.  ...  We present a sampling-based model-predictive controller that comes in the form of a Monte Carlo tree search (MCTS).  ...  This work was supported by the Academy of Finland (grants 299358 and 305737). Joose Rajamäki additionally thanks Tekniikan edistämissäätiö for their support.  ... 
doi:10.1109/tvcg.2018.2849386 pmid:29994613 fatcat:q3z2gwrpjvbcvetxqjk4gpugwy

Temporal-difference search in computer Go

David Silver, Richard S. Sutton, Martin Müller
2012 Machine Learning  
Without any explicit search tree, our approach outperformed an unenhanced Monte-Carlo tree search with the same number of simulations.  ...  Like Monte-Carlo tree search, the value function is updated from simulated experience; but like temporal-difference learning, it uses value function approximation and bootstrapping to efficiently generalise  ...  Without any explicit search tree, TD search achieved better performance than an unenhanced Monte-Carlo tree search.  ... 
doi:10.1007/s10994-012-5280-0 fatcat:mee7ubiorvhslofqjmsdh5ixeu
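The TD-search snippet above contrasts Monte-Carlo backups with bootstrapped value-function updates. The bootstrapping step it refers to is the standard TD(0) update, sketched below on a toy 3-state chain with linear (here tabular) features; the chain and all names are illustrative, not from the paper:

```python
import numpy as np

# Sketch of the bootstrapped update underlying TD search: instead of
# waiting for a complete Monte-Carlo return, each simulated step moves a
# value-function approximation toward r + gamma * V(next state).

n_states = 3
features = np.eye(n_states)          # tabular case as linear features
w = np.zeros(n_states)               # value-function weights
gamma, alpha = 0.9, 0.1

def step(s):
    """Toy chain 0 -> 1 -> 2; reward 1 on entering the terminal state 2."""
    if s == 2:
        return None, 0.0             # episode ends
    return s + 1, (1.0 if s == 1 else 0.0)

for _ in range(500):                 # simulated episodes
    s = 0
    while s is not None:
        s_next, r = step(s)
        v = features[s] @ w
        v_next = 0.0 if s_next is None else features[s_next] @ w
        # TD(0): bootstrap from the current estimate of the next state
        w += alpha * (r + gamma * v_next - v) * features[s]
        s = s_next
```

With enough simulated episodes the weights converge to the true values V(0) = 0.9, V(1) = 1.0, V(2) = 0.0, and because the update generalises through the feature vector, no explicit search tree over states is needed.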

Planning and Learning Using Adaptive Entropy Tree Search [article]

Piotr Kozakowski, Mikołaj Pacek, Piotr Miłoś
2021 arXiv   pre-print
We present the Adaptive Entropy Tree Search (ANTS) algorithm, a planning method based on the Principle of Maximum Entropy.  ...  Importantly, we design ANTS so that it is a practical component of a planning-learning loop, outperforming state-of-the-art methods on the Atari benchmark.  ...  We present a novel algorithm, Adaptive Entropy Tree Search.  ... 
arXiv:2102.06808v2 fatcat:vxq2p4qnovao7pelzng2jtsexe

Dynamic Search – Optimizing the Game of Information Seeking [article]

Zhiwen Tang, Grace Hui Yang
2021 arXiv   pre-print
Details are given for how different approaches are used to model interactions among the human user, the search system, and the environment.  ...  To position dynamic search in a larger research landscape, the article discusses in detail its relationship to related research topics and disciplines.  ...  ACKNOWLEDGMENTS The authors would like to thank Ian Soboroff, Jiyun Luo, Shiqi Liu, Angela Yang, and Xuchu Dong for their past efforts during our long-term collaboration on dynamic search.  ... 
arXiv:1909.12425v2 fatcat:mdby4xq4jrg4pm6jpxslsv2sri

Modeling Strong and Human-Like Gameplay with KL-Regularized Search [article]

Athul Paul Jacob, David J. Wu, Gabriele Farina, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Jacob Andreas, Noam Brown
2022 arXiv   pre-print
Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g.  ...  We show in chess and Go that regularizing search based on the KL divergence from an imitation-learned policy results in higher human prediction accuracy and stronger performance than imitation learning  ...  of policy gradient and imitation loss directly on samples of population data.  ... 
arXiv:2112.07544v2 fatcat:2nofvs2gmfg4fpwh4ow22j5d4m

Learning to search: Functional gradient techniques for imitation learning

Nathan D. Ratliff, David Silver, J. Andrew Bagnell
2009 Autonomous Robots  
These case-studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners, and which may be constrained by efficiency requirements and imperfect expert  ...  We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case-studies including legged locomotion, grasp planning, and  ...  and vehicle teams.  ... 
doi:10.1007/s10514-009-9121-3 fatcat:sljcvigzcjaexpr6xo3b32yax4
Showing results 1 — 15 out of 2,871 results