Memory Bounded Open-Loop Planning in Large POMDPs Using Thompson Sampling

Thomy Phan, Lenz Belzner, Marie Kiermeier, Markus Friedrich, Kyrill Schmid, Claudia Linnhoff-Popien
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and the Thirty-First Innovative Applications of Artificial Intelligence Conference
In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size stack of Thompson Sampling  ...  We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted  ...  Conclusion and Future Work: In this paper, we proposed Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size  ... 
doi:10.1609/aaai.v33i01.33017941 fatcat:ovikgxwjzbamjafunmbl4l2mwm

Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling [article]

Thomy Phan, Lenz Belzner, Marie Kiermeier, Markus Friedrich, Kyrill Schmid, Claudia Linnhoff-Popien
2019 arXiv   pre-print
In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size stack of Thompson Sampling  ...  We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly  ...  Conclusion and Future Work: In this paper, we proposed Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size  ... 
arXiv:1905.04020v1 fatcat:fjxh6p3d45hptgmj7ysr5qmpum
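
The two POSTS records above describe the same core mechanism: a fixed-size stack of Thompson Sampling bandits, one per planning depth, optimized over simulated open-loop action sequences. The sketch below is a minimal illustration of that idea under simplifying assumptions (Beta-Bernoulli bandits, returns normalized to [0, 1]); posts_plan, simulate, and the other names are hypothetical, not the authors' code.

```python
import random

class ThompsonBandit:
    """Beta-Bernoulli Thompson Sampling over a discrete action set."""
    def __init__(self, n_actions):
        self.alpha = [1.0] * n_actions   # pseudo-counts of success
        self.beta = [1.0] * n_actions    # pseudo-counts of failure

    def select(self):
        # Sample a mean for each arm from its posterior; act greedily on the sample.
        draws = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, action, ret01):
        # ret01 is a simulated return normalized to [0, 1].
        self.alpha[action] += ret01
        self.beta[action] += 1.0 - ret01

def posts_plan(simulate, n_actions, depth, budget):
    """Open-loop planning with a fixed-size stack of bandits, one per depth,
    so memory stays O(depth * n_actions) regardless of the simulation budget."""
    stack = [ThompsonBandit(n_actions) for _ in range(depth)]
    for _ in range(budget):
        plan = [bandit.select() for bandit in stack]   # sample an open-loop action sequence
        ret = simulate(plan)                           # evaluate it in a generative model
        for bandit, action in zip(stack, plan):
            bandit.update(action, ret)
    root = stack[0]
    means = [a / (a + b) for a, b in zip(root.alpha, root.beta)]
    return max(range(n_actions), key=means.__getitem__)  # greedy first action

# Toy usage: plans starting with action 1 usually pay off.
demo = lambda plan: 1.0 if plan[0] == 1 and random.random() < 0.9 else 0.0
print(posts_plan(demo, n_actions=2, depth=3, budget=500))  # typically prints 1
```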

Universal Reinforcement Learning Algorithms: Survey and Experiments [article]

John Aslanides, Jan Leike, Marcus Hutter
2017 arXiv   pre-print
The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting.  ...  We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.  ...  Acknowledgements: We wish to thank Sean Lamont for his assistance in developing the gridworld visualizations used in Figures 1 and 4.  ... 
arXiv:1705.10557v1 fatcat:aptsmnq6ajdpvobxerzqlisr3m

A Partially Observable MDP Approach for Sequential Testing for Infectious Diseases such as COVID-19 [article]

Rahul Singh, Fang Liu, Ness B. Shroff
2020 arXiv   pre-print
We investigate fundamental performance bounds, and ensure that our solution is robust to errors in the input graph as well as in the tests themselves.  ...  Countries that have been more successful in corralling the virus typically use a "test, treat, trace, test" strategy that begins with testing individuals with symptoms, traces contacts of positively tested  ...  Open-Loop Policy π0: At time t = 0 the user picks T nodes out of N nodes, arranges them in some order, and decides to sample them according to this order.  ... 
arXiv:2007.13023v1 fatcat:h2zawjqvlzhu7o2pgarjwqofbe
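
The snippet's description of the open-loop policy π0 is concrete enough for a toy sketch: all T test targets and their order are fixed at t = 0, and no later observation changes the schedule. Everything below (the uniform node choice in particular) is an illustrative assumption, not the paper's scoring rule.

```python
import random

def open_loop_test_policy(n_nodes, budget_T, rng=random.Random(0)):
    """Sketch of the snippet's open-loop policy pi_0: at t = 0, commit to T of
    the N nodes and to a testing order; later observations never change it.
    Uniform random selection stands in for the paper's actual node scoring."""
    order = rng.sample(range(n_nodes), budget_T)   # choose T nodes and an order up front
    for t, node in enumerate(order):
        yield t, node                              # test `node` at step t, unconditionally

for t, node in open_loop_test_policy(n_nodes=100, budget_T=5):
    print(f"t={t}: test node {node}")
```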

Proactive Action Preparation: Seeing Action Preparation as a Continuous and Proactive Process

Giovanni Pezzulo, Dimitri Ognibene
2012 Motor Control  
Specifically, we discuss how prior knowledge and prospective abilities can be used to maximize utility even before deciding what to do.  ...  In this paper, we aim to elucidate the processes that occur during action preparation from both a conceptual and a computational point of view.  ...  An alternative to the idea of fast feedback loops is the proposal that motor execution is delegated to open-loop motor primitives (Flash & Hochner, 2005).  ... 
doi:10.1123/mcj.16.3.386 pmid:22643383 fatcat:5vbiss5fpnhqhdgncrmz3utulq

Bayesian Reinforcement Learning: A Survey

Mohammed Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar
2015 Foundations and Trends® in Machine Learning  
In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm.  ...  The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach to action-selection (exploration/exploitation) as a function of the uncertainty in learning; and  ...  In the bandit case (single-step planning horizon), this method is in fact equivalent to Thompson sampling.  ... 
doi:10.1561/2200000049 fatcat:xrgut7tqjbf5le7h5otjwcwkry
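
The last fragment of this record notes that, with a single-step planning horizon, this Bayesian action-selection method reduces to Thompson sampling over bandit arms. A standard Beta-Bernoulli version of that bandit case looks like the following (a textbook sketch, not code from the survey):

```python
import random

def thompson_bernoulli(true_means, steps, rng=random.Random(1)):
    """Beta-Bernoulli Thompson sampling: sample one model from the posterior,
    act greedily on the sample, update the posterior with the observed reward."""
    k = len(true_means)
    alpha, beta = [1.0] * k, [1.0] * k                 # Beta(1, 1) prior per arm
    total = 0.0
    for _ in range(steps):
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)     # greedy w.r.t. the sampled model
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total += reward
    return total

print(thompson_bernoulli([0.2, 0.5, 0.8], steps=1000))  # concentrates on the 0.8 arm
```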

Sampling-based robotic information gathering algorithms

Geoffrey A. Hollinger, Gaurav S. Sukhatme
2014 The international journal of robotics research  
Our proposed rapidly-exploring information gathering (RIG) algorithms combine ideas from sampling-based motion planning with branch and bound techniques to achieve efficient information gathering in continuous  ...  We propose three sampling-based motion planning algorithms for generating informative mobile robot trajectories.  ...  Sampling-based approaches have been applied to POMDPs in the past (Thrun, 1999).  ... 
doi:10.1177/0278364914533443 fatcat:nkonq4d6bffytcqqo5ofmy2mua
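
As a rough reading of the snippet, an RIG-style planner grows a random tree of trajectories and prunes branches with a branch-and-bound test: if even an optimistic completion of a branch cannot beat the best information value found so far, the branch is discarded. The sketch below is schematic; the information field, step geometry, and bound are invented for illustration and are much simpler than the paper's algorithms.

```python
import math, random

def info_at(x, y):
    # Hypothetical information field with a single hotspot at (7, 7).
    return math.exp(-((x - 7) ** 2 + (y - 7) ** 2) / 10.0)

def rig_sketch(budget=10.0, step=1.0, iters=2000, max_info_rate=1.0, rng=random.Random(2)):
    # Each node: (x, y, path_cost, path_info, parent_index).
    tree = [(0.0, 0.0, 0.0, info_at(0.0, 0.0), None)]
    best = tree[0]
    for _ in range(iters):
        sx, sy = rng.uniform(0, 10), rng.uniform(0, 10)            # sample a target point
        near = min(tree, key=lambda n: (n[0] - sx) ** 2 + (n[1] - sy) ** 2)
        d = math.hypot(sx - near[0], sy - near[1]) or 1.0
        nx, ny = near[0] + step * (sx - near[0]) / d, near[1] + step * (sy - near[1]) / d
        cost = near[2] + step
        if cost > budget:                                          # out of travel budget
            continue
        info = near[3] + info_at(nx, ny)
        # Branch and bound: even collecting information at the maximum rate for
        # the remaining budget, this branch cannot beat the incumbent -> prune.
        if info + (budget - cost) * max_info_rate <= best[3]:
            continue
        node = (nx, ny, cost, info, tree.index(near))
        tree.append(node)
        if info > best[3]:
            best = node
    return best  # (x, y, cost, info, parent) of the most informative node found

print(rig_sketch())
```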

AIXIjs: A Software Demo for General Reinforcement Learning [article]

John Aslanides
2017 arXiv   pre-print
Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments?  ...  sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015).  ...  Thompson sampling is asymptotically optimal in mean in general environments.
  t ← 1
  loop
    sample ρ ∼ w(· | æ<t)
    d ← H_t(ε_t)
    for i = 1 → d do
      act π_ρ
    end for
  end loop
So much  ... 
arXiv:1705.07615v1 fatcat:hu5axpkgzrcdrijqetf6pmkjua
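
Read as an agent loop, the reconstructed listing above resamples an environment hypothesis ρ from the posterior w, then follows ρ's optimal policy for one effective horizon H_t(ε_t) before resampling. Below is a toy executable paraphrase, with stand-ins for w, H_t, and π_ρ; the coin-flip demo and the growing-horizon rule are assumptions, not the paper's definitions.

```python
import random

def thompson_agent(models, posterior, horizon, optimal_action, env, steps,
                   rng=random.Random(3)):
    """Toy Thompson sampling agent loop: resample a hypothesis rho, then commit
    to acting under rho for one effective horizon before resampling."""
    t, history = 0, []
    while t < steps:                                   # loop
        weights = [posterior(m, history) for m in models]
        rho = rng.choices(models, weights=weights)[0]  # sample rho ~ w(.|ae_<t)
        d = horizon(t)                                 # d <- H_t(eps_t)
        for _ in range(d):                             # act pi_rho for d steps
            a = optimal_action(rho, history)
            history.append((a, env(a)))
            t += 1
            if t >= steps:
                break
    return history

# Demo: two hypotheses about a coin's bias; the true bias is 0.8.
rng = random.Random(4)
models = [0.2, 0.8]
env = lambda a: 1 if rng.random() < 0.8 else 0   # percept: a coin flip
def posterior(m, history):                       # likelihood of the flips under bias m
    p = 1.0
    for _, x in history:
        p *= m if x else (1.0 - m)
    return p
horizon = lambda t: t // 2 + 1                   # growing effective horizon (assumed)
optimal_action = lambda rho, history: 0          # single dummy action

print(thompson_agent(models, posterior, horizon, optimal_action, env, steps=12))
```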

Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation [article]

Thomy Phan, Lenz Belzner, Thomas Gabor, Kyrill Schmid
2018 arXiv   pre-print
Many online planning algorithms rely on statistical sampling to avoid searching the whole state space, while still being able to make acceptable decisions.  ...  In this paper, we propose Emergent Value function Approximation for Distributed Environments (EVADE), an approach to integrate global experience into multi-agent online planning in stochastic domains to  ...  An approach to open-loop planning in MAS is proposed in [Belzner and Gabor, 2017a].  ... 
arXiv:1804.06311v1 fatcat:alsd3274mrfgndl3qhbtqnl6ne

Model-based Reinforcement Learning: A Survey [article]

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker
2022 arXiv   pre-print
... and how to integrate planning in the learning and acting loop.  ...  Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence.  ...  Much theoretical work tries to quantify the rate at which algorithms converge, which we can largely split up into sample complexity bounds (PAC bounds) and regret bounds.  ... 
arXiv:2006.16712v4 fatcat:qyb4auoqovdeji4ov65sv6f3fq

Reinforcement Learning in Practice: Opportunities and Challenges [article]

Yuxi Li
2022 arXiv   pre-print
Then we discuss challenges, in particular, 1) foundation, 2) representation, 3) reward, 4) exploration, 5) model, simulation, planning, and benchmarks, 6) off-policy/offline learning, 7) learning to learn  ...  We conclude with a discussion, attempting to answer: "Why has RL not been widely adopted in practice yet?" and "When is RL helpful?".  ...  agent's memory of the observed space are used in the action selection).  ... 
arXiv:2202.11296v2 fatcat:xdtsmme22rfpfn6rgfotcspnhy

Deep Reinforcement Learning [article]

Yuxi Li
2018 arXiv   pre-print
We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details.  ...  Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn.  ...  distributions using empirical game-theoretic analysis.  ... 
arXiv:1810.06339v1 fatcat:kp7atz5pdbeqta352e6b3nmuhy

Universal Artificial Intelligence [chapter]

Tom Everitt, Marcus Hutter
2018 Foundations of Trusted Autonomy  
Artificial intelligence (AI) bears the promise of making us all healthier, wealthier, and happier by reducing the need for human labour and by vastly increasing our scientific and technological progress  ...  Since the inception of the AI research field in the mid-twentieth century, a range of practical and theoretical approaches have been investigated.  ...  Partially observable MDPs (POMDPs) [35] are another popular approach. However, the learning of POMDPs is still an open question.  ... 
doi:10.1007/978-3-319-64816-3_2 fatcat:pvbspss75bcftktbrhbyjozyom

A Gentle Introduction to Reinforcement Learning and its Application in Different Fields

Muddasar Naeem, S. Tahir H. Rizvi, Antonio Coronato
2020 IEEE Access  
Myopic value of information [17], policy gradient, POMDP discretization, upper confidence bound, Bayesian sparse sampling, BEETLE, and Thompson sampling [22] are some of the well-known methods that are  ...  not fully covered in used samples.  ... 
doi:10.1109/access.2020.3038605 fatcat:febm7kz525adpcvkfmnim2yha4

Nonparametric General Reinforcement Learning [article]

Jan Leike
2016 arXiv   pre-print
Hence Thompson sampling achieves sublinear regret in these environments.  ...  We construct a large but limit computable class containing a grain of truth and show that agents based on Thompson sampling over this class converge to play Nash equilibria in arbitrary unknown computable  ...  Thompson Sampling: In this section we prove that the Thompson sampling policy defined in Section 4.3.4 is asymptotically optimal.  ... 
arXiv:1611.08944v1 fatcat:qaagmvpfbfecdessxa65b7d7n4
Showing results 1 — 15 out of 65 results