27,291 Hits in 1.9 sec

Count-Based Exploration with the Successor Representation [article]

Marlos C. Machado, Marc G. Bellemare, Michael Bowling
2019 arXiv   pre-print
Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states.  ...  In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each  ...  Acknowledgements The authors would like to thank Jesse Farebrother for the initial implementation of DQN used in this paper, Georg Ostrovski for the discussions and for providing us the exact results we  ... 
arXiv:1807.11622v4 fatcat:uhojfpkbybh4xdc7w5odz5eh3e
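For context on the terminology in this entry's snippet: the successor representation of a state under a fixed policy π with discount γ is the expected discounted count of future visits to each other state, which in matrix form is the standard resolvent shown below. The substochastic variant mentioned in the abstract replaces the transition matrix with an under-normalized empirical estimate, so each row's norm shrinks as a state accumulates visits; the exact normalization used in the paper is not reproduced here.

```latex
% Successor representation (SR) for a fixed policy \pi and discount \gamma:
\Psi^{\pi}(s, s') \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\mathbb{1}\{S_t = s'\} \,\middle|\, S_0 = s\right],
\qquad
\Psi^{\pi} \;=\; (I - \gamma P^{\pi})^{-1}.
```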

Count-Based Exploration with the Successor Representation

Marlos C. Machado, Marc G. Bellemare, Michael Bowling
2020 Proceedings of the AAAI Conference on Artificial Intelligence
Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states.  ...  In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each  ...  Acknowledgements The authors would like to thank Jesse Farebrother for the initial implementation of DQN used in this paper, Georg Ostrovski for the discussions and for providing us the exact results we  ... 
doi:10.1609/aaai.v34i04.5955 fatcat:g4wk2eacnfdfdnjnewmlisdjvq

Successor Options: An Option Discovery Framework for Reinforcement Learning [article]

Rahul Ramesh, Manan Tomar, Balaraman Ravindran
2019 arXiv   pre-print
In this work, we propose Successor Options, which leverages Successor Representations to build a model of the state space.  ...  Additionally, we also propose an Incremental Successor Options model that iterates between constructing Successor Representations and building options, which is useful when robust Successor Representations  ...  Conclusion: Successor Options is an option discovery framework that leverages Successor Representations to build options.  ... 
arXiv:1905.05731v1 fatcat:h7zwlqb6v5bprc4bfnijhf5lzy
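The recipe this abstract describes, building a model of the state space from SR rows and deriving options from it, can be pictured with a small clustering step. The sketch below is purely illustrative: the function names, the use of a known transition matrix, and the prototype-selection rule are assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: pick candidate sub-goal states by clustering
# successor-representation (SR) rows.
import numpy as np
from sklearn.cluster import KMeans

def build_sr(P, gamma=0.95):
    """Closed-form SR for a known transition matrix P under a fixed policy."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

def successor_option_subgoals(P, n_options=4, gamma=0.95):
    sr = build_sr(P, gamma)                      # one SR row per state
    km = KMeans(n_clusters=n_options, n_init=10).fit(sr)
    subgoals = []
    for c in range(n_options):
        # For each cluster, take the state whose SR row is closest to the
        # centroid as that option's sub-goal.
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(sr[members] - km.cluster_centers_[c], axis=1)
        subgoals.append(int(members[np.argmin(dists)]))
    return subgoals
```

A real implementation would estimate the SR from sampled trajectories rather than from a known transition matrix.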

Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [article]

Christopher Hoang, Sungryull Sohn, Jongwook Choi, Wilka Carvalho, Honglak Lee
2021 arXiv   pre-print
Current methods have tackled this problem by augmenting goal-conditioned policies with graph-based planning algorithms.  ...  SFL leverages the ability of successor features (SF) to capture transition dynamics, using it to drive exploration by estimating state-novelty and to enable high-level planning by abstracting the state-space  ...  We also thank Scott Emmons and Ajay Jain for sharing and helping with the code for the SGM [16] baseline.  ... 
arXiv:2111.09858v1 fatcat:opnzhm5bl5h6dc7utpfcoo7dku
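The successor features (SF) this snippet builds on are the standard generalization of the SR to feature space; under the usual assumption that rewards are linear in a feature map φ, they yield the action-value function directly:

```latex
% Successor features and the induced action-value function
% (standard definitions; \phi is a given or learned feature map):
\psi^{\pi}(s, a) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t) \,\middle|\, S_0 = s,\ A_0 = a,\ \pi\right],
\qquad
Q^{\pi}(s, a) \;=\; \psi^{\pi}(s, a)^{\top} w
\quad\text{if } r(s) = \phi(s)^{\top} w.
```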

Explicit Model Checking of Very Large MDP Using Partitioning and Secondary Storage [chapter]

Arnd Hartmanns, Holger Hermanns
2015 Lecture Notes in Computer Science  
It combines state space exploration based on partitioning with a block-iterative variant of value iteration over the same partitions for the analysis of probabilistic reachability and expected-reward properties  ...  The applicability of model checking is hindered by the state space explosion problem in combination with limited amounts of main memory.  ...  large tree-like MDP, and he used an ad-hoc disk-based implementation to allow verification.  ... 
doi:10.1007/978-3-319-24953-7_10 fatcat:7y3n5ec4h5hypo4snkwlyrwv3a
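The block-iterative value iteration mentioned in this abstract can be pictured as sweeping over the partition blocks and iterating each block before moving on. The sketch below is a minimal in-memory illustration only; the tool's partitioning heuristics, secondary-storage layout, and termination checks are not reproduced.

```python
# Illustrative block-iterative value iteration for max probabilistic
# reachability, processing one partition block of states at a time.
def block_value_iteration(blocks, succ, goal, eps=1e-6):
    """
    blocks: list of lists of state ids (a partition of the state space)
    succ:   succ[s] -> list of actions, each a list of (probability, s') pairs
    goal:   set of goal states
    """
    value = {s: (1.0 if s in goal else 0.0) for block in blocks for s in block}
    changed = True
    while changed:
        changed = False
        for block in blocks:                  # sweep block by block
            for _ in range(100):              # inner sweeps per block (cap is arbitrary)
                delta = 0.0
                for s in block:
                    if s in goal:
                        continue
                    best = max((sum(p * value[t] for p, t in action)
                                for action in succ[s]), default=0.0)
                    delta = max(delta, abs(best - value[s]))
                    value[s] = best
                if delta < eps:
                    break
                changed = True
    return value
```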

Active Hierarchical Exploration with Stable Subgoal Representation Learning [article]

Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang
2021 arXiv   pre-print
As the high-level policy selects subgoals in an online learned representation space, the dynamic change of the subgoal space severely hinders effective high-level exploration.  ...  Although GCHRL possesses superior exploration ability by decomposing tasks via subgoals, existing GCHRL methods struggle in temporally extended tasks with sparse external rewards, since the high-level  ...  Furthermore, the counts in frontiers of the latent explored areas would not increase with more exploration, as the online learned representation keeps changing.  ... 
arXiv:2105.14750v2 fatcat:6a2r4ggxs5b5tcjlblzj65us7e

Language, procedures, and the non-perceptual origin of number word meanings

DAVID BARNER
2017 Journal of Child Language  
Years later, they infer the logic of counting from the relations between large number words and their roles in blind counting procedures, only incidentally associating number words with approximate magnitudes  ...  Perceptual representations of objects and approximate magnitudes are often invoked as building blocks that children combine to acquire the positive integers.  ...  One possible source of this insight, which my lab is currently exploring, is children's growing familiarity with the recursive structure of counting.  ... 
doi:10.1017/s0305000917000058 pmid:28376934 fatcat:zzvhb2ljbjaxpdpy7e6dgvippa

APS: Active Pretraining with Successor Features [article]

Hao Liu, Pieter Abbeel
2021 arXiv   pre-print
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior  ...  APS addresses the limitations of existing unsupervised RL methods based on mutual information maximization and entropy maximization, and combines the best of both worlds.  ...  The work by Badia et al. (2020b) also considers a k-nearest-neighbor-based count bonus to encourage exploration, yielding improved performance on Atari games.  ... 
arXiv:2108.13956v1 fatcat:wmqmziq33falnguf74bvrcgp2m
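The nonparametric entropy maximization described here is typically realized with a particle-based (k-nearest-neighbor) estimate in a learned representation space. Below is a minimal sketch of such an intrinsic-reward bonus; the scaling, the value of k, and the representation are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a particle-based (kNN) entropy bonus used as an intrinsic reward.
import numpy as np

def knn_entropy_bonus(z, memory, k=12):
    """z: representation of the current state; memory: (N, d) array of past representations."""
    dists = np.linalg.norm(memory - z, axis=1)
    nearest = np.sort(dists)[:k]
    # The bonus grows when the state lies far from previously visited states.
    return np.log(1.0 + nearest.mean())
```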

Deep Successor Reinforcement Learning [article]

Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman
2016 arXiv   pre-print
Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms.  ...  There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map.  ...  However, there is a third class, based on the successor representation (SR), that factors the value function into a predictive representation and a reward function.  ... 
arXiv:1606.02396v1 fatcat:st7mhic7azgebcdr75ahac5kpi
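The decomposition referred to in this snippet writes the value function as a successor map applied to a one-step reward predictor; in the tabular case this is exactly:

```latex
% Value function factored into a successor map and a reward predictor,
% with M^\pi = (I - \gamma P^\pi)^{-1} and r the expected one-step reward:
V^{\pi}(s) \;=\; \sum_{s'} M^{\pi}(s, s')\, r(s').
```

The deep variant replaces the tabular map with learned features and a learned reward predictor, but the factorization is the same in spirit.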

Symblicit Exploration and Elimination for Probabilistic Model Checking [article]

Ernst Moritz Hahn, Arnd Hartmanns
2020 arXiv   pre-print
We then concretise states one-by-one into an explicit partial state space representation.  ...  Experiments show that very large models can be model-checked in this way with very low memory consumption.  ...  Whenever we are done visiting a state s in this second exploration (i.e. in line 13 and below), it has just become fully explored, and the fully-explored-predecessor count of its successors has changed  ... 
arXiv:2001.04289v1 fatcat:vpmhg3ikqfciljayrzceqohxq4
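The bookkeeping hinted at in the last snippet, updating how many of a state's predecessors have become fully explored once a DFS finishes that state, can be sketched in a few lines. This is purely illustrative bookkeeping and omits the symbolic representation and state elimination that the paper is actually about.

```python
# Illustrative only: count, per state, how many predecessors are fully explored,
# bumping the counters when a depth-first search finishes a state.
def fully_explored_predecessor_counts(initial, successors):
    count, visited, finished = {}, set(), set()
    def dfs(s):
        visited.add(s)
        for t in successors(s):
            count.setdefault(t, 0)
            if t not in visited:
                dfs(t)
        finished.add(s)                 # s has just become fully explored...
        for t in successors(s):
            count[t] += 1               # ...so each successor gains one fully explored predecessor
    dfs(initial)
    return count
```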

A Tool Architecture for the Next Generation of Uppaal [chapter]

Alexandre David, Gerd Behrmann, Kim G. Larsen, Wang Yi
2003 Lecture Notes in Computer Science  
The architecture is based on essentially one shared data structure to reduce redundant computations in state exploration, which unifies the so-called passed and waiting lists of the traditional reachability  ...  The design is based on a pipeline architecture where each stage represents one independent operation in the verification algorithms.  ...  We would like to thank Johan Bengtsson for discussions and his work in implementing the current version of the Uppaal engine as well as nice scripts to collect statistics.  ... 
doi:10.1007/978-3-540-40007-3_22 fatcat:qhrpe6tz5fcejd5yqcjfx3nyc4
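The "one shared data structure" idea in this abstract amounts to keeping passed and waiting states in a single hash table with a status flag, so one lookup serves both duplicate detection and waiting-list membership. The sketch below is a simplified plain reachability search, not Uppaal's timed-automata machinery.

```python
# Simplified sketch of a unified passed/waiting structure for reachability.
from collections import deque

def reachability(initial, successors, is_target):
    table = {initial: "waiting"}        # one shared structure: state -> status
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        table[s] = "passed"
        if is_target(s):
            return True
        for t in successors(s):
            if t not in table:          # single lookup replaces separate
                table[t] = "waiting"    # passed and waiting list checks
                queue.append(t)
    return False
```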

Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration [article]

Jingwei Zhang, Niklas Wetzel, Nicolai Dorka, Joschka Boedecker, Wolfram Burgard
2019 arXiv   pre-print
The results show a substantially improved exploration efficiency with SFC and the hierarchical usage of the intrinsic drives.  ...  Many state-of-the-art methods use intrinsic motivation to complement the sparse extrinsic reward signal, giving the agent more opportunities to receive feedback during exploration.  ...  Successor Representation and Successor Features: In order to encode long-term statistics into the intrinsic reward design for far-sighted exploration, we build on the formulation of the successor representation  ... 
arXiv:1903.07400v2 fatcat:eclmwkgjefe5dep73u2akv7d44

Behavior From the Void: Unsupervised Active Pre-Training [article]

Hao Liu, Pieter Abbeel
2021 arXiv   pre-print
The key novel idea is to explore the environment by maximizing a non-parametric entropy computed in an abstract representation space, which avoids challenging density modeling and consequently allows our  ...  APT learns behaviors and representations by actively searching for novel states in reward-free environments.  ...  Off-policy: the method is compatible with off-policy RL optimization.  ...  means only in state-based RL. c(s) is a count-based bonus. ψ(s, a): successor feature, φ(s): state representation.  ... 
arXiv:2103.04551v4 fatcat:u5256he4xvgevcmd24hjayeuo4

A New Framework for Query Efficient Active Imitation Learning [article]

Daniel Hsu
2019 arXiv   pre-print
We call this method adversarial reward query with successor representation.  ...  We evaluate the proposed method with a simulated human on a state-based 2D navigation task, robotic control tasks, and image-based video games, which have high-dimensional observations and complex state  ...  Successor Representation: The Successor Representation (SR) represents a state in terms of its successors (Dayan, 1993).  ... 
arXiv:1912.13037v1 fatcat:gnucjz2ofjf7vl43tyurup7yci

A Note on On-the-Fly Verification Algorithms [chapter]

Stefan Schwoon, Javier Esparza
2005 Lecture Notes in Computer Science  
Explicit-state model checkers typically construct the product space "on the fly" and explore the states using depth-first search.  ...  We survey algorithms proposed for this purpose and propose two improved algorithms, one based on nested DFS, the other on strongly connected components.  ...  In the last section, the whole graph is explored; therefore the only differences are in the transition count and the size of the explicit state stacks.  ... 
doi:10.1007/978-3-540-31980-1_12 fatcat:cukgelzelzdatkkdefciksylky
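For reference, the baseline this paper improves upon is the classic nested depth-first search for accepting-cycle detection in the product automaton. The sketch below shows that classic algorithm only, not the paper's improved variants.

```python
# Compact sketch of classic nested DFS for detecting an accepting cycle.
def nested_dfs(initial, successors, accepting):
    blue, red = set(), set()
    def dfs_red(s, seed):
        red.add(s)
        for t in successors(s):
            if t == seed:
                return True             # found a cycle through the accepting seed
            if t not in red and dfs_red(t, seed):
                return True
        return False
    def dfs_blue(s):
        blue.add(s)
        for t in successors(s):
            if t not in blue and dfs_blue(t):
                return True
        # post-order: launch the nested (red) search from accepting states
        return accepting(s) and dfs_red(s, s)
    return dfs_blue(initial)
```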
Showing results 1 — 15 out of 27,291 results