Fuzzy Theory Based Single Belief State Generation for Partially Observable Real-time Strategy Games

Weilong Yang, Xu Xie, Yong Peng
2019 IEEE Access  
Therefore, this paper proposes a fuzzy theory-based single belief state generation method named FTH, built on multi-layer information sets extracted from historical position information.  ...  As a basic problem in real-time strategy (RTS) games, AI planning has attracted wide attention from researchers, but it remains a huge challenge due to its large search space and real-time  ...  ACKNOWLEDGMENT We thank Quanjun Yin and Qi Zhang for discussions about the partially observable environment and optimization algorithm design, and Yanqing Ye for editing the draft.  ... 
doi:10.1109/access.2019.2923419 fatcat:gknjfm2lwbfivlpoldg3fgdaiy
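
The abstract gives only the outline of FTH, but the general pattern it describes (collapsing candidate enemy positions recovered from history into one belief state via fuzzy weights) can be sketched as follows. This is a minimal illustration under assumed details; the membership function, its width, and the position encoding are inventions for this sketch, not the paper's method.

```python
import numpy as np

def triangular_membership(distance, width=3.0):
    """Illustrative fuzzy membership: confidence decays linearly with
    distance from the last confirmed sighting, reaching zero at `width`."""
    return max(0.0, 1.0 - distance / width)

def single_belief_state(candidates, last_seen, width=3.0):
    """Collapse candidate enemy positions (e.g. cells extracted from the
    position history) into a single normalized belief vector."""
    weights = np.array([
        triangular_membership(np.linalg.norm(np.asarray(p) - np.asarray(last_seen)), width)
        for p in candidates
    ])
    total = weights.sum()
    if total == 0:
        return np.full(len(weights), 1.0 / len(weights))  # no information: fall back to uniform
    return weights / total

# Three candidate cells from the history, last confirmed sighting at (2, 2).
belief = single_belief_state([(2, 3), (3, 3), (5, 6)], last_seen=(2, 2))
```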

Multiple Tree for Partially Observable Monte-Carlo Tree Search [article]

David Auger
2011 arXiv   pre-print
We propose an algorithm for computing approximate Nash equilibria of partially observable games using Monte-Carlo tree search based on recent bandit methods.  ...  We obtain experimental results for the game of phantom tic-tac-toe, showing that strong strategies can be efficiently computed by our algorithm.  ...  on the real state of the game.  ... 
arXiv:1102.1580v1 fatcat:j37pquo465dwhcgcnotuvi7ugm
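
The "recent bandit methods" here need to produce mixed strategies, since Nash equilibria of partially observable games are generally randomized. A textbook EXP3 bandit has that property and gives the flavor of what sits at each tree node; this is a generic sketch, not necessarily the exact variant used in the paper.

```python
import math
import random

class Exp3:
    """Textbook EXP3 bandit. Unlike UCB, it maintains an explicit mixed
    strategy over arms, which is what lets tree search converge toward
    randomized (Nash) play rather than a single deterministic move."""
    def __init__(self, n_arms, gamma=0.1):
        self.gamma = gamma
        self.weights = [1.0] * n_arms

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def select(self):
        return random.choices(range(len(self.weights)), weights=self.probabilities())[0]

    def update(self, arm, reward):
        """reward must lie in [0, 1]; importance-weight by the play probability."""
        p = self.probabilities()[arm]
        estimate = reward / p
        self.weights[arm] *= math.exp(self.gamma * estimate / len(self.weights))
```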

Multiple Tree for Partially Observable Monte-Carlo Tree Search [chapter]

David Auger
2011 Lecture Notes in Computer Science  
We propose an algorithm for computing approximate Nash equilibria of partially observable games using Monte-Carlo tree search based on recent bandit methods.  ...  We obtain experimental results for the game of phantom tic-tac-toe, showing that strong strategies can be efficiently computed by our algorithm.  ...  on the real state of the game.  ... 
doi:10.1007/978-3-642-20525-5_6 fatcat:g2pqzhnnwbd5pny42wwoxb3sta

On Improving Deep Reinforcement Learning for POMDPs [article]

Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao
2018 arXiv   pre-print
The time series of action-observation pairs is then integrated by an LSTM layer that learns latent states, based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs).  ...  We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.  ...  We have demonstrated the effectiveness of our proposed approach in several POMDP problems in comparison to the state-of-the-art approaches.  ... 
arXiv:1704.07978v6 fatcat:hjar7q5p6bd6vfu7pf75h5p5gu
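
The described pipeline (embed each action-observation pair, integrate the sequence with an LSTM, and read Q-values off the latent state with a fully connected layer) can be sketched in PyTorch as below. Layer sizes and the concatenation scheme are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Sketch: embed (action, observation) pairs, integrate them over time
    with an LSTM, and compute Q-values from the latent state, DQN-style."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.action_embed = nn.Embedding(n_actions, 16)
        self.lstm = nn.LSTM(obs_dim + 16, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, act_seq, state=None):
        # obs_seq: (batch, time, obs_dim); act_seq: (batch, time) of action ids
        x = torch.cat([obs_seq, self.action_embed(act_seq)], dim=-1)
        latent, state = self.lstm(x, state)
        return self.q_head(latent), state  # per-step Q-values, recurrent state

q_net = RecurrentQNet(obs_dim=32, n_actions=6)
q_vals, _ = q_net(torch.randn(1, 10, 32), torch.zeros(1, 10, dtype=torch.long))
```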

Solving Partially Observable Stochastic Games with Public Observations

Karel Horák, Branislav Bošanský
2019 Proceedings of the AAAI Conference on Artificial Intelligence  
Partially observable stochastic games (POSGs) are among the most general formal models that capture such dynamic scenarios.  ...  We propose such a subclass for two-player zero-sum games with a discounted-sum objective function, POSGs with public observations (POPOSGs), where each player is able to reconstruct the beliefs of the other player  ...  Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.  ... 
doi:10.1609/aaai.v33i01.33012029 fatcat:v5divnvzyvaoxf5mgp2d5xx5gy

Decision Making in Complex Multiagent Contexts: A Tale of Two Frameworks

Prashant J. Doshi
2012 The AI Magazine  
I put the two frameworks, the decentralized partially observable Markov decision process (Dec-POMDP) and the interactive partially observable Markov decision process (I-POMDP), in context and review the foundational  ...  algorithms for these frameworks, while briefly discussing advances in their specializations.  ...  This is analogous to iterated elimination of very weakly dominated behavioral strategies, a well-known technique for compacting games, in the context of partially observable stochastic games.  ... 
doi:10.1609/aimag.v33i4.2402 fatcat:peqlr3rr5bghffao6zjowl7amy

On Improving Deep Reinforcement Learning for POMDPs [article]

Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao
2018 arXiv   pre-print
The time series of action-observation pairs is then integrated by an LSTM layer that learns latent states, based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs).  ...  We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.  ...  We have demonstrated the effectiveness of our proposed approach in several POMDP problems in comparison to the state-of-the-art approaches.  ... 
arXiv:1804.06309v2 fatcat:edqab5pgmvgwfjkd3qwevs5drm

Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games [article]

Macheng Shen, Jonathan P. How
2020 arXiv   pre-print
This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward could depend on the uncertain opponent type (private information)  ...  We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategy interaction and reasoning between agents.  ...  We summarize the key findings of this work as follows: • We propose algorithms based on MARL and ensemble training for robust opponent modeling and posterior inference over the opponent type from the observed  ... 
arXiv:1909.08735v4 fatcat:ne2qkof3lvf7rdjww4xhtdsjhi
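
Posterior inference over the opponent type from observed actions is, at its core, a Bayes filter over type hypotheses, with each type's learned policy supplying the action likelihood. Below is a minimal sketch under that reading; the paper's actual inference procedure may differ.

```python
import numpy as np

def update_type_posterior(prior, likelihoods):
    """One Bayes step over opponent types.
    prior: shape (n_types,), current belief over types.
    likelihoods: shape (n_types,), probability each type's learned policy
    assigns to the action just observed."""
    posterior = prior * likelihoods
    z = posterior.sum()
    if z == 0:
        return np.full_like(prior, 1.0 / len(prior))  # degenerate case: reset to uniform
    return posterior / z

belief = np.array([0.5, 0.5])  # two hypothesized opponent types
belief = update_type_posterior(belief, np.array([0.9, 0.2]))  # after one observed action
```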

Deceptive Kernel Function on Observations of Discrete POMDP [article]

Zhili Zhang, Quanyan Zhu
2020 arXiv   pre-print
Based on three characteristic algorithms used by the agent (value iteration, value function approximation, and POMCP), we analyze how its belief is misled by falsified observations as the kernel's outputs and  ...  This paper studies deception applied to an agent in a partially observable Markov decision process.  ...  Within the field of cybersecurity, deception as a general strategy has been discussed frequently in the game-theoretic framework from the defender's perspective.  ... 
arXiv:2008.05585v1 fatcat:uyl3ryghuva5dfox3gfp2j7r6a
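
The setting can be pictured as a standard discrete POMDP belief filter fed observations that first pass through a deceptive kernel. The sketch below shows both pieces; the tensor layout (T[a][s][s'], O[a][s'][o]) and the row-stochastic kernel are conventional assumptions, not the paper's notation.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Standard discrete POMDP filter: b'(s') ∝ O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    predicted = T[a].T @ b          # predict the next-state distribution
    b_new = O[a][:, o] * predicted  # weight by the observation likelihood
    return b_new / b_new.sum()

def deceive(o_true, kernel, rng=np.random.default_rng()):
    """Illustrative deceptive kernel: a row-stochastic matrix mapping the
    true observation to the (possibly falsified) one shown to the agent."""
    return rng.choice(len(kernel[o_true]), p=kernel[o_true])
```

Because the agent filters on the falsified observation rather than the true one, its belief drifts away from the true state distribution, which is the misleading effect the paper analyzes.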

A Model-Based, Decision-Theoretic Perspective on Automated Cyber Response [article]

Lashon B. Booker, Scott A. Musman
2020 arXiv   pre-print
We combine a simulation of the system to be defended with an anytime online planner to solve cyber defense problems characterized as partially observable Markov decision problems (POMDPs).  ...  Cyber-attacks can occur at machine speeds that are far too fast for human-in-the-loop (or sometimes on-the-loop) decision making to be a viable option.  ...  One way to account for these issues is to address the cyber response problem directly as a partially observable stochastic game (e.g. as a partially observable competitive Markov decision process (Zonouz  ... 
arXiv:2002.08957v1 fatcat:rakhpiufdve4rorxi5o45rhlqy
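
An "anytime online planner" coupled to a simulation can be as simple as the time-budgeted flat Monte Carlo loop below: keep sampling plausible hidden states and rolling the simulator forward until the budget expires, then act on the best running average. The paper's planner is more sophisticated; this sketch, with assumed `simulate` and `state_sampler` interfaces, only illustrates the anytime contract.

```python
import random
import time

def anytime_rollout_planner(simulate, actions, state_sampler, budget_s=0.05, depth=20):
    """Flat Monte Carlo, anytime flavor: whenever interrupted, the current
    running averages still yield a usable action recommendation.
    `simulate(state, action, depth)` returns a sampled return (assumed API)."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        a = random.choice(actions)
        totals[a] += simulate(state_sampler(), a, depth)
        counts[a] += 1
    return max(actions,
               key=lambda a: totals[a] / counts[a] if counts[a] else float('-inf'))
```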

Resilience of LTE eNode B against smart jammer in infinite-horizon asymmetric repeated zero-sum game

Farhan M. Aziz, Lichun Li, Jeff S. Shamma, Gordon L. Stüber
2020 Physical Communication  
The smart jammer (the informed player) uses its evolving belief state as a fixed-size sufficient statistic for the repeated game.  ...  Hence, the problem is convexified by devising suboptimal security strategies, with guaranteed performance for both players, based on an approximated optimal game value.  ...  In a more general setting, the informed player may decide to reveal no information, partial information, or complete information, based on its payoff model, to exploit the situation for its own benefit.  ... 
doi:10.1016/j.phycom.2019.100989 fatcat:7morvf5dobbfzmxfv4e6vrgava

Improving Policies via Search in Cooperative Partially Observable Games [article]

Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown
2019 arXiv   pre-print
However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well.  ...  In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game.  ...  Acknowledgments We would like to thank Pratik Ringshia for developing user interfaces used to interact with Hanabi agents.  ... 
arXiv:1912.02318v1 fatcat:3g56lo36bbaofa2zoggd3xan2e

A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game

Shin Ishii, Hajime Fujita, Masaoki Mitsutake, Tatsuya Yamazaki, Jun Matsuda, Yoichiro Matsuno
2005 Machine Learning  
The problem can be dealt with approximately in the framework of a partially observable Markov decision process (POMDP) for a single-agent system.  ...  We formulate an automatic strategy acquisition problem for the multi-agent card game "Hearts" as a reinforcement learning problem.  ...  This study was partly supported by Grant-in-Aid for Scientific Research (B) (No. 16014214) from Japan Society for the Promotion of Science.  ... 
doi:10.1007/s10994-005-0461-8 fatcat:rp7oo5rj3nf55exvfqjnt5nb5u

Improving Policies via Search in Cooperative Partially Observable Games

Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown
2020 Proceedings of the AAAI Conference on Artificial Intelligence  
However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well.  ...  In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game.  ...  We use $\tau_t = \{s_0, a_0, r_0, \ldots, s_t\}$ to denote the game history (or 'trajectory') at time t.  ... 
doi:10.1609/aaai.v34i05.6208 fatcat:2uyfzliewzgzpccpazwcmtzywi
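
One-sentence intuition for search on top of an agreed-upon ("blueprint") policy: sample hidden states consistent with the trajectory $\tau_t$, evaluate each candidate action by rolling the blueprint out from the sample, and play the best average. A hedged sketch with assumed `sample_consistent_state` and `rollout_value` interfaces, not the paper's exact procedure:

```python
def search_action(candidate_actions, sample_consistent_state, rollout_value,
                  n_samples=100):
    """Monte Carlo search over a blueprint policy.
    sample_consistent_state(): draws a hidden state consistent with tau_t.
    rollout_value(state, action): return from playing `action`, then the
    agreed-upon policy, starting at `state` (both are assumed interfaces)."""
    scores = {a: 0.0 for a in candidate_actions}
    for _ in range(n_samples):
        s = sample_consistent_state()
        for a in candidate_actions:
            scores[a] += rollout_value(s, a)
    return max(scores, key=scores.get)
```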

Approximating n-player behavioural strategy nash equilibria using coevolution

Spyridon Samothrakis, Simon Lucas
2011 Proceedings of the 13th annual conference on Genetic and evolutionary computation - GECCO '11  
In this paper we propose a coevolutionary algorithm that approximates behavioural strategy Nash equilibria in n-player zero-sum games by exploiting the minimax solution concept.  ...  To support our case, we provide a set of experiments on games with both known and unknown equilibria.  ...  Partially Observable Markov Decision Process: A Partially Observable MDP (POMDP) [15] is described by a tuple $P = \langle S, A, T, f, O, N, b_0 \rangle$.  ... 
doi:10.1145/2001576.2001726 dblp:conf/gecco/SamothrakisL11 fatcat:hcw7pblre5esjaaayzrjcwv7ta
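
For concreteness, the tuple from the excerpt maps naturally onto a small container type. The roles of f and N below follow common POMDP notation (reward function and horizon) as a reading of the snippet, which does not spell them out itself.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class POMDP:
    """Container for the tuple <S, A, T, f, O, N, b0> from the excerpt."""
    S: Sequence        # state space
    A: Sequence        # action space
    T: np.ndarray      # T[a][s][s']: transition probabilities
    f: Callable        # f(s, a): reward function (assumed role)
    O: np.ndarray      # O[a][s'][o]: observation probabilities
    N: int             # horizon (assumed role)
    b0: np.ndarray     # initial belief over S
```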