Board Representations for Neural Go Players Learning by Temporal Difference
2007
2007 IEEE Symposium on Computational Intelligence and Games
We compare three different board representations for self-learning ANNs on a 5×5 board employing temporal difference learning (TDL) with two types of move selection (during training). ...
The majority of work on artificial neural networks (ANNs) playing the game of Go focuses on network architectures and training regimes to improve the quality of the neural player. ...
SUMMARY AND CONCLUSIONS We have presented self-learning experiments with neural Go players based on temporal difference learning (TDL) on a 5×5 board, investigating three different board representations and ...
doi:10.1109/cig.2007.368096
dblp:conf/cig/Mayer07
fatcat:xw3xkhnxwffq7fvqrw7a2xfemq
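As a hedged illustration of the kind of board representation this entry compares (a sketch only; the paper's three encodings are not reproduced here), a 5×5 board can be fed to a value network as three binary feature planes:

import numpy as np

def encode_board(board):
    # board: 5x5 NumPy array with 0 = empty, 1 = black, -1 = white.
    # One common Go input encoding: three binary feature planes
    # (black stones, white stones, empty points), flattened into a
    # 75-dimensional vector for the network.
    black = (board == 1).astype(np.float32)
    white = (board == -1).astype(np.float32)
    empty = (board == 0).astype(np.float32)
    return np.concatenate([black.ravel(), white.ravel(), empty.ravel()])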
EXPERIMENTS WITH LEARNING OPENING STRATEGY IN THE GAME OF GO
2004
International journal on artificial intelligence tools
using temporal difference learning. ...
We present an experimental methodology and results for a machine learning approach to learning opening strategy in the game of Go, a game for which the best computer programs play only at the level of ...
Acknowledgements This work was supported by the National Science Foundation under Grant No. 9876181, and by Middlebury College. ...
doi:10.1142/s0218213004001430
fatcat:fhcj4kfq4fbvblgrkdhxduqgpa
Abalearn: A Risk-Sensitive Approach to Self-play Learning in Abalone
[chapter]
2003
Lecture Notes in Computer Science
Our approach is based on a reinforcement learning algorithm that is risk-seeking, since defensive players in Abalone tend to never end a game. ...
We evaluate our approach using a fixed heuristic opponent as a benchmark, pitting our agents against human players online and comparing samples of our agents at different times of training. ...
Dahl [5] proposes a hybrid approach for Go: a neural network is trained to imitate local game shapes made by an expert database via supervised learning. ...
doi:10.1007/978-3-540-39857-8_6
fatcat:hib5wsl3vrgz5jmispttzxdx3a
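One way to make a TD learner risk-seeking, in the spirit of the Abalearn entry above, is asymmetric weighting of TD errors in the style of Mihatsch and Neuneier; this is a hedged sketch, not Abalearn's actual update, and alpha and kappa are illustrative parameters:

def risk_sensitive_td(value, td_error, alpha=0.01, kappa=-0.5):
    # Scale positive and negative TD errors asymmetrically:
    # update = alpha * (1 - kappa * sign(td_error)) * td_error.
    # kappa < 0 overweights gains, biasing the learner toward
    # risk-seeking play and away from Abalone's defensive draws.
    scale = 1.0 - kappa if td_error > 0 else 1.0 + kappa
    return value + alpha * scale * td_error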
Evolving small-board Go players using coevolutionary temporal difference learning with archives
2011
International Journal of Applied Mathematics and Computer Science
We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented ...
Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to differences observed between ...
Acknowledgment This work has been supported by the Polish Ministry of Science and Higher Education under the grant no. N N519 441939. ...
doi:10.2478/v10006-011-0057-3
fatcat:uv6tkgqbbfaulblu3jv5da2yp4
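A minimal sketch of the gradient-descent TD update this entry describes, assuming a linear evaluation function over board features (the function and variable names are illustrative, not the paper's code):

import numpy as np

def td0_update(w, phi_s, phi_s_next, reward, alpha=0.01, gamma=1.0):
    # Linear evaluation V(s) = w . phi(s). TD(0) moves w toward the
    # value observed one move later:
    #   w <- w + alpha * (r + gamma * V(s') - V(s)) * grad_w V(s),
    # where grad_w V(s) = phi(s) for a linear V.
    td_error = reward + gamma * (w @ phi_s_next) - (w @ phi_s)
    return w + alpha * td_error * phi_s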
Coevolutionary Temporal Difference Learning for small-board Go
2010
IEEE Congress on Evolutionary Computation
In this paper we apply Coevolutionary Temporal Difference Learning (CTDL), a hybrid of coevolutionary search and reinforcement learning proposed in our former study, to evolve strategies for playing the game of Go on small boards (5 × 5). ...
ACKNOWLEDGMENTS This work was supported in part by the Ministry of Science and Higher Education under grant no. N N519 3505 33. ...
doi:10.1109/cec.2010.5586054
dblp:conf/cec/KrawiecS10
fatcat:qi65gddgungrxbtbxu2nte3bj4
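This entry describes CTDL as a hybrid of coevolutionary search and TD learning; a hedged sketch of how such an interleaving can be structured (all helper functions are hypothetical, and the paper's actual schedule may differ):

def ctdl(population, generations, td_self_play, round_robin_fitness, mutate):
    # Each generation: every individual first refines its weights by
    # TD self-play (local search / exploitation), then the population
    # is ranked by mutual games and varied (global exploration).
    for _ in range(generations):
        for individual in population:
            td_self_play(individual)      # TD(0) updates from self-play games
        scores = round_robin_fitness(population)
        ranked = [ind for _, ind in sorted(zip(scores, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
        parents = ranked[:len(ranked) // 2]
        population = parents + [mutate(p) for p in parents]
    return population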
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
[article]
2017
arXiv
pre-print
In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. ...
The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several ...
It was trained by temporal-difference learning to predict the final game outcome, and also the expected features after two moves. ...
arXiv:1712.01815v1
fatcat:flj56adezzf6xepdezevbo24xq
Neuroevolution in Games: State of the Art and Open Challenges
[article]
2015
arXiv
pre-print
We analyse the application of NE in games along five different axes, which are the role NE is chosen to play in a game, the different types of neural networks used, the way these networks are evolved, ...
In neuroevolution, artificial neural networks are trained through evolutionary algorithms, taking inspiration from the way biological brains evolved. ...
Game strategies could be learned by algorithms from the temporal difference learning family, player models could be learned with support vector machines, game content could be represented as constraint ...
arXiv:1410.7326v3
fatcat:yqynswodpnbgzdf52mlisix3hu
Modified cellular simultaneous recurrent networks with cellular particle swarm optimization
2012
The 2012 International Joint Conference on Neural Networks (IJCNN)
Computer Go serves as an excellent test bed for CSRNs because of its clear-cut objective. For the training data, we developed an accurate theoretical foundation and game tree for the 2×2 game board. ...
The conventional CSRN architecture suffers from the multi-valued function problem; our modified CSRN architecture overcomes the problem by employing ternary coding of the Go board's representation and ...
We thank Duksoo Lim, a 5 dan Go expert certified by the Korean Baduk Association, for his help with a comprehensive theoretical study of a 2×2 Go research platform. ...
doi:10.1109/ijcnn.2012.6252845
dblp:conf/ijcnn/KimW12
fatcat:n3tfhfiamnfpjm6jhtepvl24tu
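A hedged sketch of the ternary board coding this entry mentions (the mapping below is one conventional choice; the paper's exact scheme may differ):

def ternary_encode(board):
    # Each Go point takes one of three values: a common coding maps
    # empty -> 0, black -> +1, white -> -1, giving the network a
    # single value per intersection.
    coding = {'.': 0.0, 'B': 1.0, 'W': -1.0}
    return [coding[point] for row in board for point in row]

# For a 2×2 board ('.' = empty, 'B' = black, 'W' = white):
# ternary_encode([['B', '.'], ['.', 'W']])  ->  [1.0, 0.0, 0.0, -1.0]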
Coevolution Versus Self-Play Temporal Difference Learning for Acquiring Position Evaluation in Small-Board Go
2005
IEEE Transactions on Evolutionary Computation
Two learning methods for acquiring position evaluation for small Go boards are studied and compared. ...
The methods studied are temporal difference learning using the self-play gradient-descent method, and coevolutionary learning using an evolution strategy. ...
Acknowledgements The authors thank the anonymous reviewers, and David Fogel, Yngvi Björnsson, and Bruno Bouzy, for their helpful and insightful comments on earlier versions of this paper. ...
doi:10.1109/tevc.2005.856212
fatcat:ooljqi56sfc3jpt7i5nmrh7qxi
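As a hedged sketch of the coevolutionary alternative compared against TD self-play in the entry above, a minimal (1+1) evolution strategy over evaluation weights (in the paper fitness comes from games between individuals; here it is an abstract callable):

import numpy as np

def one_plus_one_es(w, fitness, sigma=0.05, steps=1000, seed=None):
    # Perturb the weight vector with Gaussian noise and keep the
    # child only if it scores at least as well as the parent.
    rng = np.random.default_rng(seed)
    best = fitness(w)
    for _ in range(steps):
        child = w + sigma * rng.standard_normal(w.shape)
        score = fitness(child)
        if score >= best:
            w, best = child, score
    return w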
Neuroevolution in Games: State of the Art and Open Challenges
2017
IEEE Transactions on Computational Intelligence and AI in Games
We analyse the application of NE in games along five different axes, which are the role NE is chosen to play in a game, the different types of neural networks used, the way these networks are evolved, ...
In neuroevolution, artificial neural networks are trained through evolutionary algorithms, taking inspiration from the way biological brains evolved. ...
Game strategies could be learned by algorithms from the temporal difference learning family, player models could be learned with support vector machines, game content could be represented as constraint ...
doi:10.1109/tciaig.2015.2494596
fatcat:uenp54gg2vffdolr5awox2ayx4
Indirect Encoding of Neural Networks for Scalable Go
[chapter]
2010
Parallel Problem Solving from Nature, PPSN XI
A key feature of Go is that humans begin to learn on a small board, and then incrementally learn advanced strategies on larger boards. ...
While some machine learning methods can also scale the board, they generally only focus on a subset of the board at one time. ...
One such promising approach is machine learning, wherein techniques such as temporal difference learning or neuroevolution learn a value function from an abstract representation [2][3][4]. ...
doi:10.1007/978-3-642-15844-5_36
dblp:conf/ppsn/GauciS10
fatcat:5hqylncw7fhyngu2s26feeiguu
Mastering the game of Go without human knowledge
2017
Nature
This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = f_θ(s). ...
These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. ...
Cain for work on the visuals; A. Barreto, G. Ostrovski, T. Ewalds, T. Schaul, J. Oh and N. Heess for reviewing the paper; and the rest of the DeepMind team for their support. ...
doi:10.1038/nature24270
pmid:29052630
fatcat:h2n334a2ejfxtknx67kbiaswfq
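A hedged, minimal sketch of the two-headed interface (p, v) = f_θ(s) from the entry above; AlphaGo Zero's actual f_θ is a deep residual convolutional network, so everything below is an illustrative stand-in:

import numpy as np

def f_theta(s, params):
    # Shared torso feeding two heads: a policy head giving move
    # probabilities p (e.g. 362 actions on 19x19: 361 points + pass)
    # and a value head giving a scalar v in [-1, 1]. Parameter
    # names W1, b1, Wp, bp, wv, bv are illustrative only.
    h = np.tanh(params['W1'] @ s + params['b1'])      # shared features
    logits = params['Wp'] @ h + params['bp']          # policy head
    p = np.exp(logits - logits.max())
    p /= p.sum()                                      # softmax over moves
    v = np.tanh(params['wv'] @ h + params['bv'])      # value head (wv is 1-D)
    return p, v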
Hybrid of Evolution and Reinforcement Learning for Othello Players
2007
2007 IEEE Symposium on Computational Intelligence and Games
In this paper, the evolutionary algorithm is boosted using resources from reinforcement learning: 1) the initialization of the initial population using a solution optimized by temporal difference learning ...
Although reinforcement learning and evolutionary algorithms show good results in board-evaluation optimization, hybrids of the two approaches are rarely addressed in the literature. ...
Lucas et al. compared two learning methods for acquiring position evaluation for small Go boards [13].
doi:10.1109/cig.2007.368099
dblp:conf/cig/KimCC07
fatcat:h6ga7cxuindh3didxtd34osccq
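A hedged sketch of the first boosting idea listed in the entry above: seeding the evolutionary population with a TD-trained solution instead of random individuals (helper names are hypothetical):

def init_population(td_solution, pop_size, mutate):
    # Keep one exact copy of the TD-optimized weight vector and fill
    # the rest of the population with mutated variants of it, so
    # evolution starts from an already-competent region.
    return [td_solution] + [mutate(td_solution) for _ in range(pop_size - 1)]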
Coevolutionary Temporal Difference Learning for Othello
2009
2009 IEEE Symposium on Computational Intelligence and Games
We apply CTDL to the board game of Othello, using a weighted piece counter to represent players' strategies. ...
The coevolutionary part of the algorithm provides for exploration of the solution space, while temporal difference learning performs exploitation by local search. ...
ACKNOWLEDGMENTS This work was supported in part by the Ministry of Science and Higher Education under grant no. N N519 3505 33 and grant POIG.01.01.02-00-014/08-00. ...
doi:10.1109/cig.2009.5286486
dblp:conf/cig/SzubertJK09
fatcat:2byzeqgxzbb33ju7fhbekwlli4
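The weighted piece counter mentioned above is a simple linear evaluation; a minimal sketch (board and weight layout assumed, not taken from the paper):

def wpc_evaluate(board, weights):
    # Weighted piece counter: one weight per square; the position
    # value is the dot product with occupancy (+1 own disc, -1
    # opponent's, 0 empty). For 8x8 Othello, 64 weights define the
    # whole strategy -- the vector that CTDL tunes.
    return sum(w * x for w, x in zip(weights, board))

# board is a flattened list of 64 values; a higher return means a
# better position for the +1 player.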
Learning to Evaluate Go Positions via Temporal Difference Methods
[chapter]
2001
Studies in Fuzziness and Soft Computing
We demonstrate a viable alternative by training neural networks to evaluate Go positions via temporal difference (TD) learning. ...
Our approach is based on neural network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent ...
Acknowledgements We are grateful to Patrice Simard and Gerry Tesauro for helpful discussions, to Tim Casey for game records from the Internet Go Server, and to Geoff Hinton for CPU cycles. ...
doi:10.1007/978-3-7908-1833-8_4
fatcat:tadip2detvh6ni4vgrje4gvqti
Showing results 1 — 15 out of 3,872 results