5,841 Hits in 6.4 sec

Monte-Carlo Tree Search as Regularized Policy Optimization [article]

Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos
2020 arXiv   pre-print
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence.  ...  In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem.  ...  classical deep learning (He et al., 2016) and RL (Williams, 1992) techniques with Monte-Carlo tree search (Kocsis and Szepesvári, 2006).  ... 
arXiv:2007.12509v1 fatcat:z3ftdsaa5jggrgdiwkvny3aebi
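The regularized-policy-optimization view in the abstract above can be illustrated with a minimal sketch. The objective and its closed-form solution shown here are one standard convex-regularized formulation (maximize expected value minus a KL penalty to a prior policy), not necessarily the exact form analyzed in the paper; the function name and temperature `lam` are illustrative assumptions:

```python
import math

def regularized_policy(q, prior, lam=1.0):
    """Closed-form solution of  max_pi <q, pi> - lam * KL(pi || prior):
    pi(a) is proportional to prior(a) * exp(q(a) / lam).
    Small lam follows the value estimates q; large lam stays near the prior."""
    weights = [p * math.exp(qa / lam) for qa, p in zip(q, prior)]
    z = sum(weights)
    return [w / z for w in weights]

# Two actions, uniform prior: the action with higher q gets more mass.
pi = regularized_policy(q=[1.0, 0.0], prior=[0.5, 0.5], lam=1.0)
```

With a uniform prior this reduces to an ordinary softmax over the action values; the prior term is what lets a learned network policy bias the search.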

Beam Monte-Carlo Tree Search

Hendrik Baier, Mark H. M. Winands
2012 2012 IEEE Conference on Computational Intelligence and Games (CIG)  
This paper presents Beam Monte-Carlo Tree Search (BMCTS), combining the ideas of MCTS and beam search. Like MCTS, BMCTS builds a search tree using Monte-Carlo simulations as state evaluations.  ...  Monte-Carlo Tree Search (MCTS) is a state-of-the-art stochastic search algorithm that has successfully been applied to various multi- and one-player games (puzzles).  ...  INTRODUCTION Monte-Carlo Tree Search (MCTS) [1], [2] is a best-first tree search algorithm with Monte-Carlo evaluation of states.  ... 
doi:10.1109/cig.2012.6374160 dblp:conf/cig/BaierW12 fatcat:26wdyqmdlrbvdo67qfigsub7um
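The beam idea described in the entry above can be sketched in a few lines. The pruning criterion (visit counts) and the node representation here are assumptions for illustration, not the paper's exact algorithm:

```python
def beam_prune(nodes_at_depth, beam_width):
    """Beam-search idea applied to an MCTS tree: at a given tree depth,
    keep only the beam_width most-visited nodes and drop the rest from
    further expansion, bounding the tree's width."""
    ranked = sorted(nodes_at_depth, key=lambda n: n["visits"], reverse=True)
    return ranked[:beam_width]

# Three sibling nodes; with beam_width=2 the least-visited one is pruned.
survivors = beam_prune(
    [{"id": "a", "visits": 30}, {"id": "b", "visits": 5}, {"id": "c", "visits": 12}],
    beam_width=2,
)
```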

Convex Regularization in Monte-Carlo Tree Search [article]

Tuan Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen
2021 arXiv   pre-print
In this paper, we overcome these limitations by considering convex regularization in Monte-Carlo Tree Search (MCTS), which has been successfully used in RL to efficiently drive exploration.  ...  Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making.  ...  Monte-Carlo tree search as regularized policy optimization. arXiv preprint arXiv:2007.12509, 2020. Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S.  ... 
arXiv:2007.00391v3 fatcat:lvzmrgrp2vdxhpuemt2izmysu4

Maximum Entropy Monte-Carlo Planning

Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller
2019 Neural Information Processing Systems  
The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation.  ...  We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).  ...  Monte Carlo Tree Search and UCT To solve the online planning task, Monte Carlo Tree Search (MCTS) builds a look-ahead tree T online in an incremental manner, and evaluates states with Monte Carlo simulations  ... 
dblp:conf/nips/XiaoHMS019 fatcat:373izzevznfcdpshwybuvl7ypa
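The softmax backup mentioned in the MENTS abstract replaces the usual max (or mean) backup with a log-sum-exp "soft" value. A minimal sketch of that value computation follows; the function name and the numerically-stabilized form are assumptions, though the formula V(s) = tau * log sum_a exp(Q(s,a)/tau) is the standard maximum-entropy soft value:

```python
import math

def soft_value(q_values, temperature=1.0):
    """Softmax (log-sum-exp) state value used in maximum-entropy backups:
    V(s) = tau * log(sum_a exp(Q(s,a) / tau)).
    As tau -> 0 this approaches max(Q); larger tau rewards keeping
    probability mass on several actions."""
    t = temperature
    m = max(q / t for q in q_values)  # subtract the max to stabilize exp()
    return t * (m + math.log(sum(math.exp(q / t - m) for q in q_values)))

# Two actions with Q = [1.0, 0.0]: the soft value slightly exceeds max(Q).
v = soft_value([1.0, 0.0], temperature=0.5)
```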

Three-Head Neural Network Architecture for Monte Carlo Tree Search

Chao Gao, Martin Müller, Ryan Hayward
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output is used for prior action probability and the state-value estimate is used for leaf  ...  To effectively train the newly introduced action-value head on the same game dataset as for two-head nets, we exploit the optimal relations between parent and children nodes for data augmentation and regularization  ...  PV-MCTS with Delayed Node Expansion In Policy Value Monte Carlo Tree Search (PV-MCTS), typically, each neural net evaluation is computed along with node expansion.  ... 
doi:10.24963/ijcai.2018/523 dblp:conf/ijcai/Gao0H18 fatcat:fgkqqcnru5e5doqqpwi5xbehd4

Preference-Based Monte Carlo Tree Search [chapter]

Tobias Joppen, Christian Wirth, Johannes Fürnkranz
2018 Lecture Notes in Computer Science  
Monte Carlo tree search (MCTS) is a popular choice for solving sequential anytime problems. However, it depends on a numeric feedback signal, which can be difficult to define.  ...  Using a puzzle domain, we show that our preference-based MCTS variant, which only receives qualitative feedback, is able to reach a performance level comparable to a regular MCTS baseline, which obtains  ...  Preference-Based Monte Carlo Tree Search In this section, we introduce a preference-based variant of Monte Carlo tree search (PB-MCTS), as shown in Fig. 1.  ... 
doi:10.1007/978-3-030-00111-7_28 fatcat:sobyjl46svexbg5igsqqn6ezea

Goal directed molecule generation using Monte Carlo Tree Search [article]

Anand A. Rajasekar, Karthik Raman, Balaraman Ravindran
2020 arXiv   pre-print
Through this work, we propose a novel method, which we call unitMCTS, to perform molecule generation by making a unit change to the molecule at every step using Monte Carlo Tree Search.  ...  We show that this method outperforms the recently published techniques on benchmark molecular optimization tasks such as QED and penalized logP.  ...  ChemTS [Yang et al., 2017] uses the Monte Carlo Tree Search for SMILES generation with RNN based rollout policy.  ... 
arXiv:2010.16399v2 fatcat:7ivdmlqj7fazngpt55mmcnd7jy

Monte-Carlo Tree Search and minimax hybrids

Hendrik Baier, Mark H. M. Winands
2013 2013 IEEE Conference on Computational Intelligence in Games (CIG)  
Monte-Carlo Tree Search is a sampling-based search algorithm that has been successfully applied to a variety of games.  ...  Monte-Carlo rollouts allow it to take distant consequences of moves into account, giving it a strategic advantage in many domains over traditional depth-limited minimax search with alpha-beta pruning.  ...  Monte-Carlo Tree Search Monte-Carlo Tree Search (MCTS) [1], [2] is a best-first tree search algorithm using statistical sampling to evaluate states.  ... 
doi:10.1109/cig.2013.6633630 dblp:conf/cig/BaierW13 fatcat:ecs6tlykmvbsrjnfifeiq45vne
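Several entries above describe MCTS as evaluating states by Monte-Carlo rollouts. A minimal, self-contained sketch of that evaluation step follows, using a toy "race to zero" game invented here for illustration (the game and function names are assumptions, not from any of the listed papers):

```python
import random

def rollout_value(state, rng):
    """One random playout of a toy game: players alternately subtract 1 or 2
    from `state`; whoever reaches 0 (or below) wins. Returns 1 if the player
    to move from `state` wins this playout, else 0."""
    to_move = 0  # 0 = the player whose value we are estimating
    while state > 0:
        state -= rng.choice([1, 2])
        if state <= 0:
            return 1 if to_move == 0 else 0
        to_move ^= 1
    return 0

def mc_evaluate(state, n_sims=2000, seed=0):
    """Monte-Carlo evaluation: average the outcome of many random playouts."""
    rng = random.Random(seed)
    return sum(rollout_value(state, rng) for _ in range(n_sims)) / n_sims

# From state 1 the player to move always wins (either action reaches <= 0).
v1 = mc_evaluate(1)
```

In full MCTS this estimate would be backed up along the tree path that led to the evaluated leaf; minimax hybrids replace or supplement it with shallow exact search.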

Active Reinforcement Learning with Monte-Carlo Tree Search [article]

Sebastian Schulze, Owain Evans
2018 arXiv   pre-print
We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small Bandit problems and MDPs.  ...  Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs.  ...  BAMCP: MCTS for Bayesian RL BAMCP is a Monte Carlo Tree Search (MCTS) algorithm for Bayesian RL (Guez et al., 2012) .  ... 
arXiv:1803.04926v3 fatcat:u5oaovq2o5g7xdm3on5c4kqa34

Hedging of Financial Derivative Contracts via Monte Carlo Tree Search [article]

Oleg Szehr
2021 arXiv   pre-print
This article introduces Monte Carlo Tree Search as a method to solve the stochastic optimal control problem behind the pricing and hedging tasks.  ...  As a consequence Monte Carlo Tree Search has higher sample efficiency, is less prone to over-fitting to specific market models and generally learns stronger policies faster.  ...  Introduction Monte Carlo Tree Search (MCTS) is an algorithm for approximating optimal decisions in multi-period optimization tasks by taking random samples of actions and constructing a search tree according  ... 
arXiv:2102.06274v3 fatcat:zsqfafgs5jfcrhxiwwtrpmoxwu

Local Search for Policy Iteration in Continuous Control [article]

Jost Tobias Springenberg, Nicolas Heess, Daniel Mankowitz, Josh Merel, Arunkumar Byravan, Abbas Abdolmaleki, Jackie Kay, Jonas Degrave, Julian Schrittwieser, Yuval Tassa, Jonas Buchli, Dan Belov (+1 others)
2020 arXiv   pre-print
Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces.  ...  We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework.  ...  Monte Carlo Tree Search (MCTS), with RL has recently been shown to be a powerful approach (Silver et al., 2017; Schrittwieser et al., 2019; Anthony et al., 2017) .  ... 
arXiv:2010.05545v1 fatcat:hznjdmmdpjhe3kyb7wxsbtrm6u

Solve Traveling Salesman Problem by Monte Carlo Tree Search and Deep Neural Network [article]

Zhihao Xing, Shikui Tu, Lei Xu
2020 arXiv   pre-print
Second, it uses Monte Carlo tree search to select the best policy by comparing different value functions, which increases its generalization ability.  ...  We present a self-learning approach that combines deep reinforcement learning and Monte Carlo tree search to solve the traveling salesman problem. The proposed approach has two advantages.  ...  That is, r(s, v) = f − g_v (Eq. 8). • Policy: Based on the value function h of the neural network, we use Monte Carlo tree search as the default policy to select the next action v.  ... 
arXiv:2005.06879v1 fatcat:yl6zsv2urffilbb344dkpdk7m4

Dual Monte Carlo Tree Search [article]

Prashank Kadam, Ruiyang Xu, Karl Lieberherr
2021 arXiv   pre-print
AlphaZero, using a combination of Deep Neural Networks and Monte Carlo Tree Search (MCTS), has successfully trained reinforcement learning agents in a tabula-rasa way.  ...  Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees using a combination of the PUCB, a sliding-window, and the epsilon-greedy algorithm  ...  AlphaGo uses a combination of Monte Carlo Tree Search (MCTS) and a Deep Neural Network (DNN).  ... 
arXiv:2103.11517v2 fatcat:qccwe2tiejdezm4d4itd4zrf7y
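The PUCB rule mentioned in the entry above scores each child by its value estimate plus a prior-weighted exploration bonus. A sketch of one common variant of that selection rule follows (the constant `c`, the exact bonus form, and the dict-based node representation are assumptions; published variants differ in detail):

```python
import math

def pucb_score(q, prior, parent_visits, child_visits, c=1.5):
    """PUCB-style score: exploitation term q plus an exploration bonus that
    is proportional to the network prior and shrinks as the child is visited."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, parent_visits, c=1.5):
    """Descend the tree by picking the child with the maximal PUCB score."""
    return max(
        children,
        key=lambda ch: pucb_score(ch["q"], ch["prior"], parent_visits, ch["visits"], c),
    )

# An unvisited child with a strong prior beats a well-visited, low-value one.
best = select_child(
    [{"id": "a", "q": 0.0, "prior": 0.8, "visits": 0},
     {"id": "b", "q": 0.1, "prior": 0.2, "visits": 10}],
    parent_visits=10,
)
```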

Evolutionary learning of policies for MCTS simulations

James Pettit, David Helmbold
2012 Proceedings of the International Conference on the Foundations of Digital Games - FDG '12  
Monte-Carlo Tree Search (MCTS) grows a partial game tree and uses a large number of random simulations to approximate the values of the nodes.  ...  It has proven effective in games such as Go and Hex, where the large search space and the difficulty of evaluating positions cause problems for standard methods.  ...  MONTE CARLO TREE SEARCH The basic Monte-Carlo technique for computer Go was proposed by Brügmann [4].  ... 
doi:10.1145/2282338.2282379 dblp:conf/fdg/PettitH12 fatcat:yatpgfcoqzdltk3dw6yp3qpntu

Improving exploration in policy gradient search: Application to symbolic optimization [article]

Mikel Landajuela Larma, Brenden K. Petersen, Soo K. Kim, Claudio P. Santiago, Ruben Glatt, T. Nathan Mundhenk, Jacob F. Pettit, Daniel M. Faissol
2021 arXiv   pre-print
We present two exploration methods to tackle these issues, building upon ideas of entropy regularization and distribution initialization.  ...  In contrast to traditional evolutionary approaches, using a neural network at the core of the search allows learning higher-level symbolic patterns, providing an informed direction to guide the search.  ...  In other neural-guided search, Li et al. (2019) identify asymptotic constraints of leading polynomial powers and use those constraints to guide Monte Carlo tree search.  ... 
arXiv:2107.09158v1 fatcat:olvhxoepxrc2ziuv4nd526lbka
Showing results 1 — 15 out of 5,841 results