
Board Representations for Neural Go Players Learning by Temporal Difference

Helmut A. Mayer
2007 2007 IEEE Symposium on Computational Intelligence and Games  
We compare three different board representations for self-learning ANNs on a 5×5 board employing temporal difference learning (TDL) with two types of move selection (during training).  ...  The majority of work on artificial neural networks (ANNs) playing the game of Go focuses on network architectures and training regimes to improve the quality of the neural player.  ...  SUMMARY AND CONCLUSIONS We have presented self-learning experiments of neural Go players based on temporal difference learning (TDL) on a 5×5 board investigating three different board representations and  ...
doi:10.1109/cig.2007.368096 dblp:conf/cig/Mayer07 fatcat:xw3xkhnxwffq7fvqrw7a2xfemq
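
As an illustration of what such board encodings look like, here is a generic Python sketch of two common input representations for a 5×5 board; these are illustrative choices, not necessarily the three representations the paper evaluates.

    # Two generic ways to encode a 5x5 Go board as an ANN input vector.
    # Illustrative only; not the paper's exact representations.
    EMPTY, BLACK, WHITE = 0, 1, 2

    def encode_ternary(board):
        """One value per point: +1 black, -1 white, 0 empty (25 inputs)."""
        signs = {EMPTY: 0.0, BLACK: 1.0, WHITE: -1.0}
        return [signs[p] for row in board for p in row]

    def encode_two_planes(board):
        """Two binary planes, one per colour (50 inputs)."""
        black = [1.0 if p == BLACK else 0.0 for row in board for p in row]
        white = [1.0 if p == WHITE else 0.0 for row in board for p in row]
        return black + white

    board = [[EMPTY] * 5 for _ in range(5)]
    board[2][2] = BLACK
    assert len(encode_ternary(board)) == 25
    assert len(encode_two_planes(board)) == 50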

EXPERIMENTS WITH LEARNING OPENING STRATEGY IN THE GAME OF GO

Timothy Huang, Graeme Connell, Bryan McQuade
2004 International Journal on Artificial Intelligence Tools
...using temporal difference learning.  ...  We present an experimental methodology and results for a machine learning approach to learning opening strategy in the game of Go, a game for which the best computer programs play only at the level of  ...  Acknowledgements This work was supported by the National Science Foundation under Grant No. 9876181, and by Middlebury College.  ...
doi:10.1142/s0218213004001430 fatcat:fhcj4kfq4fbvblgrkdhxduqgpa

Abalearn: A Risk-Sensitive Approach to Self-play Learning in Abalone [chapter]

Pedro Campos, Thibault Langlois
2003 Lecture Notes in Computer Science  
Our approach is based on a reinforcement learning algorithm that is risk-seeking, since defensive players in Abalone tend to never end a game.  ...  We evaluate our approach using a fixed heuristic opponent as a benchmark, pitting our agents against human players online and comparing samples of our agents at different times of training.  ...  Dahl [5] proposes a hybrid approach for Go: a neural network is trained via supervised learning to imitate local game shapes from an expert database.  ...
doi:10.1007/978-3-540-39857-8_6 fatcat:hib5wsl3vrgz5jmispttzxdx3a

Evolving small-board Go players using coevolutionary temporal difference learning with archives

Krzysztof Krawiec, Wojciech Jaśkowski, Marcin Szubert
2011 International Journal of Applied Mathematics and Computer Science  
We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented  ...  Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to differences observed between  ...  Acknowledgment This work has been supported by the Polish Ministry of Science and Higher Education under the grant no. N N519 441939.  ...
doi:10.2478/v10006-011-0057-3 fatcat:uv6tkgqbbfaulblu3jv5da2yp4
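
The snippet above describes gradient-descent TDL updating a board evaluation function from the difference between successive positions. A minimal TD(0) sketch for a linear evaluation function; all names and constants here are illustrative, not the paper's:

    # Minimal TD(0) sketch: weights move toward the value of the successor
    # position. For a linear V(s) = w . x, the gradient is just x.
    def td0_update(weights, features_t, features_t1, reward, alpha=0.01, gamma=1.0):
        v_t  = sum(w * x for w, x in zip(weights, features_t))
        v_t1 = sum(w * x for w, x in zip(weights, features_t1))
        delta = reward + gamma * v_t1 - v_t      # the temporal difference
        return [w + alpha * delta * x for w, x in zip(weights, features_t)]

    w = [0.0] * 4
    w = td0_update(w, [1, 0, 1, 0], [0, 1, 1, 0], reward=0.0)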

Coevolutionary Temporal Difference Learning for small-board Go

Krzysztof Krawiec, Marcin Szubert
2010 IEEE Congress on Evolutionary Computation  
In this paper we apply Coevolutionary Temporal Difference Learning (CTDL), a hybrid of coevolutionary search and reinforcement learning proposed in our former study, to evolve strategies for playing the game of Go on small boards (5 × 5).  ...  ACKNOWLEDGMENTS This work was supported in part by Ministry of Science and Higher Education grant # N N519 3505 33.  ...
doi:10.1109/cec.2010.5586054 dblp:conf/cec/KrawiecS10 fatcat:qi65gddgungrxbtbxu2nte3bj4
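
CTDL, as described, alternates coevolutionary search over a population with TD-based intra-game learning. A schematic outer loop, with toy stand-ins for the game-specific parts (fitness and TD refinement are placeholders, not the paper's procedures):

    # Schematic CTDL skeleton: a TD phase refines each individual, then a
    # coevolutionary phase selects and mutates. Fitness and refinement are
    # toy stand-ins for game play against the rest of the population.
    import random

    def coevolutionary_fitness(ind, population):
        # Stand-in for win rate against the other population members.
        mean = sum(sum(p) for p in population) / len(population)
        return -abs(sum(ind) - mean)

    def td_self_play_refine(ind, alpha=0.05):
        # Stand-in for intra-game TD learning on the weight vector.
        return [w + alpha * random.uniform(-1, 1) for w in ind]

    population = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(10)]
    for generation in range(20):
        population = [td_self_play_refine(ind) for ind in population]  # TD phase
        scored = sorted(population,
                        key=lambda ind: coevolutionary_fitness(ind, population),
                        reverse=True)
        parents = scored[:5]                                           # selection
        offspring = [[w + random.gauss(0, 0.1) for w in random.choice(parents)]
                     for _ in range(5)]                                # mutation
        population = parents + offspring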

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm [article]

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
2017 arXiv   pre-print
In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play.  ...  The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several  ...  It was trained by temporal-difference learning to predict the final game outcome, and also the expected features after two moves.  ... 
arXiv:1712.01815v1 fatcat:flj56adezzf6xepdezevbo24xq

Neuroevolution in Games: State of the Art and Open Challenges [article]

Sebastian Risi, Julian Togelius
2015 arXiv   pre-print
We analyse the application of NE in games along five different axes, which are the role NE is chosen to play in a game, the different types of neural networks used, the way these networks are evolved,  ...  In neuroevolution, artificial neural networks are trained through evolutionary algorithms, taking inspiration from the way biological brains evolved.  ...  Game strategies could be learned by algorithms from the temporal difference learning family, player models could be learned with support vector machines, game content could be represented as constraint  ... 
arXiv:1410.7326v3 fatcat:yqynswodpnbgzdf52mlisix3hu

Modified cellular simultaneous recurrent networks with cellular particle swarm optimization

Tae-Hyung Kim, Donald C. Wunsch
2012 The 2012 International Joint Conference on Neural Networks (IJCNN)  
Computer Go serves as an excellent test bed for CSRNs because of its clear-cut objective. For the training data, we developed an accurate theoretical foundation and game tree for the 2×2 game board.  ...  The conventional CSRN architecture suffers from the multi-valued function problem; our modified CSRN architecture overcomes the problem by employing ternary coding of the Go board's representation and  ...  We thank Duksoo Lim, a 5 dan Go expert certified by the Korean Baduk Association, for his help with a comprehensive theoretical study of a 2×2 Go research platform.  ...
doi:10.1109/ijcnn.2012.6252845 dblp:conf/ijcnn/KimW12 fatcat:n3tfhfiamnfpjm6jhtepvl24tu
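
At the scale of the 2×2 platform mentioned above, exhaustive analysis is feasible: under a ternary coding there are only 3^4 = 81 raw board configurations. A minimal sketch of the enumeration (legality filtering is game-specific and omitted):

    # All raw ternary configurations of a 2x2 board: a first step toward
    # the exhaustive game-tree analysis the paper describes.
    from itertools import product
    TERNARY = (-1, 0, 1)   # white, empty, black under the ternary coding
    states = list(product(TERNARY, repeat=4))
    assert len(states) == 81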

Coevolution Versus Self-Play Temporal Difference Learning for Acquiring Position Evaluation in Small-Board Go

T.P. Runarsson, S.M. Lucas
2005 IEEE Transactions on Evolutionary Computation  
Two learning methods for acquiring position evaluation for small Go boards are studied and compared.  ...  The methods studied are temporal difference learning using the self-play gradient-descent method, and coevolutionary learning using an evolution strategy.  ...  Acknowledgements The authors thank the anonymous reviewers, and David Fogel, Yngvi Björnsson, and Bruno Bouzy, for their helpful and insightful comments on earlier versions of this paper.  ...
doi:10.1109/tevc.2005.856212 fatcat:ooljqi56sfc3jpt7i5nmrh7qxi
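
As a sketch of the coevolutionary alternative studied here, a minimal (1+1) evolution strategy loop; the fitness function is a toy stand-in for win rate over games:

    # Minimal (1+1)-ES: mutate a weight vector, keep the child only if it
    # scores at least as well. Fitness is a toy stand-in, not match play.
    import random

    def toy_fitness(w):
        return -sum(x * x for x in w)        # optimum at the zero vector

    parent = [random.uniform(-1, 1) for _ in range(16)]
    for _ in range(200):
        child = [x + random.gauss(0, 0.1) for x in parent]
        if toy_fitness(child) >= toy_fitness(parent):
            parent = child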

Neuroevolution in Games: State of the Art and Open Challenges

Sebastian Risi, Julian Togelius
2017 IEEE Transactions on Computational Intelligence and AI in Games  
We analyse the application of NE in games along five different axes, which are the role NE is chosen to play in a game, the different types of neural networks used, the way these networks are evolved,  ...  In neuroevolution, artificial neural networks are trained through evolutionary algorithms, taking inspiration from the way biological brains evolved.  ...  Game strategies could be learned by algorithms from the temporal difference learning family, player models could be learned with support vector machines, game content could be represented as constraint  ... 
doi:10.1109/tciaig.2015.2494596 fatcat:uenp54gg2vffdolr5awox2ayx4

Indirect Encoding of Neural Networks for Scalable Go [chapter]

Jason Gauci, Kenneth O. Stanley
2010 Parallel Problem Solving from Nature, PPSN XI  
A key feature of Go is that humans begin to learn on a small board, and then incrementally learn advanced strategies on larger boards.  ...  While some machine learning methods can also scale the board, they generally only focus on a subset of the board at one time.  ...  One such promising approach is machine learning, wherein techniques such as temporal difference learning or neuroevolution learn a value function from an abstract representation [2]-[4].  ...
doi:10.1007/978-3-642-15844-5_36 dblp:conf/ppsn/GauciS10 fatcat:5hqylncw7fhyngu2s26feeiguu
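
The indirect-encoding idea (HyperNEAT-style, as in these authors' line of work) is that connection weights are produced by a generator function of board coordinates, so the same genome yields a network for any board size. A minimal sketch with a placeholder generator, not the paper's evolved network:

    # Weights as a function of normalized board coordinates: the same
    # generator builds weight matrices for 5x5 and 9x9 boards alike.
    import math

    def weight(src, dst):
        (x1, y1), (x2, y2) = src, dst
        d = math.hypot(x1 - x2, y1 - y2)
        return math.exp(-d)                  # nearby points connect strongly

    def build_weights(n):
        coords = [(x / (n - 1), y / (n - 1)) for x in range(n) for y in range(n)]
        return [[weight(a, b) for b in coords] for a in coords]

    w5 = build_weights(5)     # 5x5 board ...
    w9 = build_weights(9)     # ... and 9x9 from the same generator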

Mastering the game of Go without human knowledge

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap (+5 others)
2017 Nature  
This neural network takes as input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = f_θ(s).  ...  These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play.  ...  Cain for work on the visuals; A. Barreto, G. Ostrovski, T. Ewalds, T. Schaul, J. Oh and N. Heess for reviewing the paper; and the rest of the DeepMind team for their support.  ...
doi:10.1038/nature24270 pmid:29052630 fatcat:h2n334a2ejfxtknx67kbiaswfq
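
A minimal sketch of the two-headed evaluator (p, v) = f_θ(s) described above: a shared body feeds a policy head (move probabilities) and a value head (a scalar in [-1, 1]). The single dense layer is illustrative only; AlphaGo Zero's f_θ is a deep residual convolutional network.

    # Two-headed network sketch. Input/output sizes follow AlphaGo Zero
    # (17 input planes of 19x19; 361 moves plus pass); the tiny dense
    # body is a placeholder for the paper's residual tower.
    import numpy as np

    rng = np.random.default_rng(0)
    S, H, MOVES = 19 * 19 * 17, 64, 19 * 19 + 1
    W_body = rng.normal(0, 0.01, (H, S))
    W_pi   = rng.normal(0, 0.01, (MOVES, H))
    W_v    = rng.normal(0, 0.01, H)

    def f_theta(s):
        h = np.tanh(W_body @ s)                           # shared body
        logits = W_pi @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()                                      # softmax policy head
        v = np.tanh(W_v @ h)                              # scalar value head
        return p, v

    p, v = f_theta(rng.normal(size=S))
    assert p.shape == (MOVES,) and -1.0 <= v <= 1.0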

Hybrid of Evolution and Reinforcement Learning for Othello Players

Kyung-Joong Kim, Heejin Choi, Sung-Bae Cho
2007 2007 IEEE Symposium on Computational Intelligence and Games  
In this paper, the evolutionary algorithm is boosted using resources from reinforcement learning: 1) the initialization of the initial population using a solution optimized by temporal difference learning  ...  Although reinforcement learning and evolutionary algorithms show good results in board evaluation optimization, a hybrid of both approaches is rarely addressed in the literature.  ...  Lucas et al. compared two learning methods for acquiring position evaluation for small Go boards [13].  ...
doi:10.1109/cig.2007.368099 dblp:conf/cig/KimCC07 fatcat:h6ga7cxuindh3didxtd34osccq

Coevolutionary Temporal Difference Learning for Othello

Marcin Szubert, Wojciech Jaskowski, Krzysztof Krawiec
2009 2009 IEEE Symposium on Computational Intelligence and Games  
We apply CTDL to the board game of Othello, using a weighted piece counter to represent players' strategies.  ...  The coevolutionary part of the algorithm provides for exploration of the solution space, while temporal difference learning performs its exploitation by local search.  ...  ACKNOWLEDGMENTS This work was supported in part by Ministry of Science and Higher Education grant # N N519 3505 33 and grant POIG.01.01.02-00-014/08-00.  ...
doi:10.1109/cig.2009.5286486 dblp:conf/cig/SzubertJK09 fatcat:2byzeqgxzbb33ju7fhbekwlli4
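
The weighted piece counter (WPC) mentioned above values a position as a dot product of per-square weights with the board contents (+1 own disc, -1 opponent's, 0 empty). A minimal sketch with illustrative, not learned, weights:

    # WPC evaluation for 8x8 Othello. The corner bonus is an illustrative
    # hand-set value; the paper learns the weights with CTDL.
    N = 8
    weights = [[1.0] * N for _ in range(N)]
    for r in (0, N - 1):
        for c in (0, N - 1):
            weights[r][c] = 4.0              # corners matter most in Othello

    def wpc_value(board):                    # board[r][c] in {-1, 0, +1}
        return sum(weights[r][c] * board[r][c] for r in range(N) for c in range(N))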

Learning to Evaluate Go Positions via Temporal Difference Methods [chapter]

N. N. Schraudolph, P. Dayan, T. J. Sejnowski
2001 Studies in Fuzziness and Soft Computing  
We demonstrate a viable alternative by training neural networks to evaluate Go positions via temporal difference (TD) learning.  ...  Our approach is based on neural network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent  ...  Acknowledgements We are grateful to Patrice Simard and Gerry Tesauro for helpful discussions, to Tim Casey for game records from the Internet Go Server, and to Geoff Hinton for CPU cycles.  ... 
doi:10.1007/978-3-7908-1833-8_4 fatcat:tadip2detvh6ni4vgrje4gvqti
Showing results 1–15 out of 3,872 results