54,094 Hits in 3.4 sec

Universal Successor Features Approximators [article]

Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul
2018 arXiv   pre-print
Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI  ...  Parametric generalisation relies on the interpolation power of a function approximator that is given the task description as input; one of its most common forms is universal value function approximators  ...  Thus, we define universal successor features as ψ(s, a, π) ≡ ψ^π(s, a). Based on such definition, we call ψ̃(s, a, π) ≈ ψ(s, a, π) a universal successor features approximator (USFA).  ... 
arXiv:1812.07626v1 fatcat:ptxih27fezbavg47nqil4w7qry

VUSFA: Variational Universal Successor Features Approximator to Improve Transfer DRL for Target Driven Visual Navigation [article]

Shamane Siriwardhana, Rivindu Weerasakera, Denys J.C. Matthies, Suranga Nanayakkara
2019 arXiv   pre-print
Specifically, we build on the concept of Universal Successor Features with an A3C agent.  ...  We introduce the novel architectural contribution of a Successor Feature Dependant Policy (SFDP) and adopt the concept of Variational Information Bottlenecks to achieve state of the art performance.  ...  D Extended Abstract on Universal Successor Feature Approximators Similar to the UVFA [Schaul et al., 2015], a goal-dependant SF is approximated with Universal Successor Feature Approximators (USFA)  ... 
arXiv:1908.06376v1 fatcat:2lcjbdynvbetlcok6vbmacfh3i

State2vec: Off-Policy Successor Features Approximators [article]

Sephora Madjiheurem, Laura Toni
2019 arXiv   pre-print
This has been proposed in the literature as successor representation approximators.  ...  In this paper, we propose state2vec, an efficient and low-complexity framework for learning successor features which (i) generalize across policies, (ii) ensure sample-efficiency during meta-test.  ...  Borsa et al. (2019)'s universal successor features approximators (USFAs) exhibit two types of generalisation: one that exploits the structure in the underlying space of value functions and another that  ... 
arXiv:1910.10277v1 fatcat:25jtnsjuqrc5va6w26ylsj5jc4

Target Driven Visual Navigation with Hybrid Asynchronous Universal Successor Representations [article]

Shamane Siriwardhana, Rivindu Weerasekera, Suranga Nanayakkara
2018 arXiv   pre-print
In this paper, we present a novel approach, Hybrid Asynchronous Universal Successor Representations (HAUSR), which overcomes the problem of generalizability to new goals by adapting recent work on Universal  ...  Successor Representations with Asynchronous Actor-Critic Agents.  ...  Feature approximator.  ... 
arXiv:1811.11312v1 fatcat:zngzy4hozra6rp6xey7xkebhge

Universal Successor Representations for Transfer Reinforcement Learning [article]

Chen Ma, Junfeng Wen, Yoshua Bengio
2018 arXiv   pre-print
To attack this, we propose (1) to use universal successor representations (USR) to represent the transferable knowledge and (2) a USR approximator (USRA) that can be trained by interacting with the environment  ...  ., 2011) has been shown to be useful for knowledge transfer, learning a universal value function can be challenging in practice.  ...  Unlike Schaul et al. (2015), who factorized the general state values into state and goal features to facilitate learning, we propose to learn a universal approximator for successor representations (SR  ... 
arXiv:1804.03758v1 fatcat:pbmsb3y5tvhtfh6pclh5bpizaa

Policy Caches with Successor Features

Mark W. Nemecek, Ron Parr
2021 International Conference on Machine Learning  
For transfer between tasks which share transition dynamics but differ in reward function, successor features have been shown to be a useful representation which allows for efficient computation of action-value  ...  Universal Successor Feature Approximators (USFAs) (Borsa et al., 2019) offer the potential to have a parameterized set of successor features generalize across tasks.  ...  Our approach uses the actual approximate value function which is calculated efficiently using the stored successor feature approximators and the reward weights for the new task.  ... 
dblp:conf/icml/NemecekP21 fatcat:gglhi3mb7fcfppnipadlxvzfhm

Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments [article]

Jingwei Zhang, Jost Tobias Springenberg, Joschka Boedecker, Wolfram Burgard
2017 arXiv   pre-print
We propose a successor feature based deep reinforcement learning algorithm that can learn to transfer knowledge from previously mastered navigation tasks to new problem instances.  ...  successor features ψ(φ_s, a*; θ_ψ^−) into Eq. (3), and where θ_ψ^− denotes the parameters of the current target successor feature approximation.  ...  +1 . (4) And we can thus learn approximate successor features using a deep Q-learning like procedure [10], [7].  ... 
arXiv:1612.05533v3 fatcat:r2r4wp2rxnbzrnh24waa2o46hy

Learning One Representation to Optimize All Rewards [article]

Ahmed Touati, Yann Ollivier
2021 arXiv   pre-print
With imperfect training, the sub-optimality is proportional to the unsupervised approximation error.  ...  Universal successor features approximators. arXiv preprint arXiv:1812.07626, 2018.  ...  If the estimate (43) is used, the learned policies correspond to using universal successor features approximators [BBQ+18] on top of the features learned by .  ... 
arXiv:2103.07945v3 fatcat:uhezy3scinavbiekimvsnmhicy

Successor Feature Sets: Generalizing Successor Representations Across Policies [article]

Kianté Brantley, Soroush Mehri, Geoffrey J. Gordon
2021 arXiv   pre-print
To address these limitations, we bring together ideas from predictive state representations, belief space value iteration, successor features, and convex analysis: we develop a new, general successor-style  ...  However, we believe that future work will allow us to extend our ideas to approximate reasoning in large, unknown environments.  ...  Universal value function approximators. In International conference on machine learning, 1312-1320. Shani, G.; Pineau, J.; and Kaplow, R. 2013. A survey of point-based POMDP solvers.  ... 
arXiv:2103.02650v2 fatcat:zmrlehpym5copbw3j5rwmlp63a

Deep Successor Reinforcement Learning [article]

Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman
2016 arXiv   pre-print
There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map.  ...  The value function of a state can be computed as the inner product between the successor map and the reward weights.  ...  shared feature representation for both reward prediction and SR approximation.  ... 
arXiv:1606.02396v1 fatcat:st7mhic7azgebcdr75ahac5kpi
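
The inner-product relation quoted in the snippet above can be illustrated with a small sketch. It assumes a linear reward model, so that the action value is Q(s, a) = ψ(s, a) · w; the function name `q_values` and all numbers are hypothetical and are not taken from the paper.

```python
def q_values(successor_features, reward_weights):
    """Return Q(s, a) = psi(s, a) . w for each action a in one state.

    successor_features: one feature vector psi(s, a) per action (assumed given).
    reward_weights: the reward weight vector w (assumed given).
    """
    return [
        sum(p * w for p, w in zip(psi, reward_weights))
        for psi in successor_features
    ]


# Hypothetical example: one state, two actions, three features.
psi_s = [
    [1.0, 0.5, 0.0],  # psi(s, a=0)
    [0.0, 2.0, 1.0],  # psi(s, a=1)
]
w = [1.0, 0.0, -1.0]

q = q_values(psi_s, w)

# Greedy action under these values.
best_action = max(range(len(q)), key=q.__getitem__)
```

Under this assumption, changing the task only changes `w`, so the same stored successor features can be reused to re-evaluate actions for a new reward.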

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [article]

Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
2021 arXiv   pre-print
To effectively use this data, we turn to the framework of successor features. This allows us to disentangle shared features and dynamics of the environment from agent-specific rewards and policies.  ...  We propose a multi-task inverse reinforcement learning (IRL) algorithm, called inverse temporal difference learning (ITD), that learns shared state features, alongside per-agent successor features and  ...  Let (r_i)_{i=1}^k denote a set of reward functions on C, Ψ_i be a collection of successor features approximations for policies (π_i)_{i=1}^k (π_i optimal for r_i) with true successor feature values Ψ_i,  ... 
arXiv:2102.12560v2 fatcat:iihuwyxiyvduvgc7em5hahxway

The Successor Representation: Its Computational Logic and Neural Substrates

Samuel J. Gershman
2018 Journal of Neuroscience  
Recent behavioral and neural studies have provided evidence for the successor representation, and computational studies have explored ways to extend the original idea.  ...  Following this logic leads to the idea of the successor representation, which encodes states of the environment in terms of their predictive relationships with other states.  ...  It is also possible to define a linear function approximator for the SR, in which case there is one error for each feature (Gardner et al., 2018).  ... 
doi:10.1523/jneurosci.0151-18.2018 pmid:30006364 pmcid:PMC6096039 fatcat:bz4mz2euxjfj7e3omvkqylluue

Universal Successor Features for Transfer Reinforcement Learning [article]

Chen Ma, Dylan R. Ashley, Junfeng Wen, Yoshua Bengio
2020 arXiv   pre-print
In this paper, we propose (1) Universal Successor Features (USFs) to capture the underlying dynamics of the environment while allowing generalization to unseen goals and (2) a flexible end-to-end model  ...  However, successor features are believed to be more suitable than values for transfer (Dayan, 1993; Barreto et al., 2017), even though they cannot directly generalize to new goals.  ...  In this paper, we propose Universal Successor Features (USFs).  ... 
arXiv:2001.04025v1 fatcat:jogphbzzcba2xcwxzq5gic5nuy

The Biddy BDD package

Robert Meolic
2019 Journal of Open Source Software  
Acknowledgements The development of the Biddy BDD package and BDD Scout application so far was supported by University of Maribor, Faculty of Electrical Engineering and Computer Science.  ...  For a Reduced Ordered Binary Decision Diagram (ROBDD), each edge to internal node n with variable var(n), left successor else(n), and right successor then(n) corresponds to the Boolean function f(n) that  ...  and a positive literal is included if the path continues in the then successor.  ... 
doi:10.21105/joss.01189 fatcat:mx6mmqymfnfcdi6m5eajcdno5y

A New Representation of Successor Features for Transfer across Dissimilar Environments [article]

Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh
2021 arXiv   pre-print
To address this problem, we propose an approach based on successor features in which we model successor feature functions with Gaussian Processes permitting the source successor features to be treated  ...  as noisy measurements of the target successor feature function.  ...  Other works that have built upon successor features include generalised policy updates on successor features (Barreto et al., 2020), a universal type of successor feature based on the temporal difference  ... 
arXiv:2107.08426v1 fatcat:64hwna3cuzbjhdgqwpkqj6xdky