VIME: Variational Information Maximizing Exploration
[article]
2017
arXiv
pre-print
This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. ...
We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse ...
Conclusions: We have proposed Variational Information Maximizing Exploration (VIME), a curiosity-driven exploration strategy for continuous control tasks. ...
arXiv:1605.09674v4
fatcat:lrwm2ssr7nb3dhrektnzymohuu
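As a gloss on the mechanism this abstract describes, here is a minimal sketch of VIME-style reward shaping, assuming a fully factorized Gaussian posterior over the dynamics model's weights; the function names, the eta scale, and the median normalization are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def gaussian_kl(mu_q, sig_q, mu_p, sig_p):
    """KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ) for diagonal Gaussians,
    summed over all weight dimensions."""
    return np.sum(np.log(sig_p / sig_q)
                  + (sig_q**2 + (mu_q - mu_p)**2) / (2.0 * sig_p**2)
                  - 0.5)

def vime_reward(env_reward, kl, kl_history, eta=0.1):
    """Augment the environment reward with the information gain: the KL
    between the weight posterior after and before seeing a transition,
    normalized by the median of recent KL values (illustrative)."""
    norm = np.median(kl_history) if kl_history else 1.0
    return env_reward + eta * kl / max(norm, 1e-8)

# Toy usage: prior vs. posterior over 1000 dynamics-model weights.
rng = np.random.default_rng(0)
mu_p, sig_p = np.zeros(1000), np.ones(1000)
mu_q, sig_q = mu_p + 0.01 * rng.standard_normal(1000), 0.95 * sig_p
kl = gaussian_kl(mu_q, sig_q, mu_p, sig_p)
print(vime_reward(env_reward=1.0, kl=kl, kl_history=[kl]))
```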
Information Maximizing Exploration with a Latent Dynamics Model
[article]
2018
arXiv
pre-print
This method is both theoretically grounded and computationally advantageous, permitting the efficient use of Bayesian information-theoretic methods in high-dimensional state spaces. ...
All reinforcement learning algorithms must handle the trade-off between exploration and exploitation. ...
Incentivizing exploration with reward bonuses and intrinsic motivation: In this work we focus on exploration and evaluate a method akin to Variational Information Maximizing Exploration (VIME) [Houthooft ...
arXiv:1804.01238v1
fatcat:wsba324bgfglha57e6bahwc2aa
Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Imperfect Information
[article]
2021
arXiv
pre-print
For uncertain scenarios like the cases under Reinforcement Learning (RL), variational information maximizing exploration (VIME) provides a useful framework for exploring environments using information ...
By adding information gain to the reward, the average strategy calculated by CFR can be directly used as an interactive strategy, and the exploration efficiency of the algorithm in uncertain environments ...
Variational Information Maximizing Exploration: Variational information maximizing exploration (VIME) is an exploration strategy based on the maximization of information gain for uncertain environments ...
arXiv:2110.07892v1
fatcat:xopyt5kxmvhbbo4tlvwpoegsnu
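A hedged sketch of the reward shaping this snippet describes: an information-gain bonus folded into the counterfactual utilities before a standard regret-matching update. The info_gain values and the eta weight below are placeholders, not the paper's exact quantities:

```python
import numpy as np

def regret_matching(regrets):
    """Standard regret matching: play actions in proportion to positive regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full_like(regrets, 1.0 / len(regrets))

def shaped_utilities(env_utilities, info_gain, eta=0.05):
    """Fold an information-gain bonus into per-action utilities,
    i.e. 'adding information gain to the reward' (eta is illustrative)."""
    return env_utilities + eta * info_gain

# Toy usage at one information set with three actions.
u = np.array([0.0, 1.0, 0.2])    # counterfactual utilities
ig = np.array([0.8, 0.1, 0.5])   # hypothetical info-gain estimates
u_shaped = shaped_utilities(u, ig)
regrets = u_shaped - u_shaped @ regret_matching(np.ones(3))
print(regret_matching(regrets))
```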
A Bandit Framework for Optimal Selection of Reinforcement Learning Agents
[article]
2019
arXiv
pre-print
The bandit has the double objective of maximizing the reward while the agents are learning and selecting the best agent after a finite number of learning steps. ...
This surrogate reward is inspired by the Variational Information Maximizing Exploration idea [5], where a similar metric, which captures the surprise of the agent regarding the environment dynamics, ...
These surrogate rewards are inspired by the Variational Information Maximizing Exploration concept, where a metric capturing the surprise of an agent regarding the environment dynamics is used to promote ...
arXiv:1902.03657v1
fatcat:t4ubyb2hbbazzmgaftd6gd3jxi
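To illustrate the selection problem above, a minimal UCB1 sketch in which each arm is an RL agent; the Gaussian reward here is a stand-in for the paper's surprise-based surrogate metric:

```python
import numpy as np

def ucb1_select(counts, means, t, c=2.0):
    """Pick the agent with the highest upper confidence bound.
    Untried agents are selected first."""
    untried = np.where(counts == 0)[0]
    if untried.size:
        return int(untried[0])
    return int(np.argmax(means + np.sqrt(c * np.log(t) / counts)))

rng = np.random.default_rng(1)
n_agents, horizon = 3, 500
true_quality = np.array([0.3, 0.5, 0.7])    # hypothetical agent qualities
counts, means = np.zeros(n_agents), np.zeros(n_agents)

for t in range(1, horizon + 1):
    a = ucb1_select(counts, means, t)
    r = rng.normal(true_quality[a], 0.1)    # stand-in surrogate reward
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]  # incremental mean update

print("pull counts per agent:", counts)
```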
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning
[article]
2017
arXiv
pre-print
Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation. ...
Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. ...
ACKNOWLEDGEMENTS: We thank Rein Houthooft for interesting discussions and for sharing data from the original VIME experiments. ...
arXiv:1703.01732v1
fatcat:5e5xx4k5m5bltgir5wts73zvnm
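A minimal sketch of a surprisal bonus in the spirit of this paper, assuming the learned dynamics model outputs a diagonal Gaussian over next states; the predictions and eta below are illustrative:

```python
import numpy as np

def gaussian_nll(s_next, mean, sigma):
    """Negative log-likelihood (surprisal) of the observed next state
    under a diagonal-Gaussian dynamics model."""
    return np.sum(0.5 * ((s_next - mean) / sigma) ** 2
                  + np.log(sigma) + 0.5 * np.log(2 * np.pi))

def surprise_reward(env_reward, s_next, pred_mean, pred_sigma, eta=0.01):
    """Shaped reward: environment reward plus a scaled surprisal bonus."""
    return env_reward + eta * gaussian_nll(s_next, pred_mean, pred_sigma)

# Toy usage: a poorly predicted transition earns a larger bonus.
s_next = np.array([1.0, -0.5])
well_predicted = surprise_reward(0.0, s_next, s_next + 0.01, np.full(2, 0.1))
surprising = surprise_reward(0.0, s_next, s_next + 1.0, np.full(2, 0.1))
print(well_predicted, "<", surprising)
```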
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
[article]
2017
arXiv
pre-print
Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. ...
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. ...
BBQN with intrinsic reward: Variational Information Maximizing Exploration (VIME) (Houthooft et al. 2016a) introduces an exploration strategy based on maximizing the information gain about the agent's ...
arXiv:1608.05081v4
fatcat:3meqci2hnbekdjspzwfw4nplbq
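A hedged sketch of the Thompson-sampling step described above, with a linear Q-function standing in for the Bayes-by-Backprop network: draw one Monte Carlo sample of the weights from a diagonal Gaussian posterior, then act greedily under that sample. All parameter values are illustrative:

```python
import numpy as np

def thompson_action(state, w_mu, w_sigma, rng):
    """Sample one set of Q-function weights from the (diagonal Gaussian)
    posterior, then act greedily under the sampled weights."""
    w = rng.normal(w_mu, w_sigma)   # one Monte Carlo posterior sample
    q_values = w @ state            # (n_actions, d) @ (d,) -> (n_actions,)
    return int(np.argmax(q_values))

rng = np.random.default_rng(7)
d, n_actions = 4, 3
w_mu = rng.standard_normal((n_actions, d))  # posterior means (illustrative)
w_sigma = np.full((n_actions, d), 0.5)      # posterior std devs (illustrative)
state = rng.standard_normal(d)
print(thompson_action(state, w_mu, w_sigma, rng))
```

Because the sampled weights vary from decision to decision, actions with uncertain Q-values are tried more often, which is the exploration behavior the abstract refers to.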
Considerations surrounding remote medicolegal assessments: a systematic search and narrative synthesis of the range of motion literature
2021
ANZ journal of surgery
To explore this, a systematic literature search focusing on advanced device-based range of motion measurement was conducted, along with an historical snapshot of observation-based range of motion measurement ...
examinations with limited clinical assessment have utility for legal matters, such as the assessment of causation of injury, treatment advice or approvals and fitness for pre-employment tasks or safe variations ...
We have identified the specific circumstances that are likely to maximize the accuracy and reliability of ROM measurement in the vIME setting. ...
doi:10.1111/ans.16841
pmid:33890724
fatcat:cgt4qiaw7rbjdkb2edn6tylbsu
Bayesian Curiosity for Efficient Exploration in Reinforcement Learning
[article]
2019
arXiv
pre-print
Balancing exploration and exploitation is a fundamental part of reinforcement learning, yet most state-of-the-art algorithms use a naive exploration protocol like ϵ-greedy. ...
This contributes to the problem of high sample complexity, as the algorithm wastes effort by repeatedly visiting parts of the state space that have already been explored. ...
Such a reward signal can be derived from visitation counts [12], [13], [14], model prediction error [15], [16], [17], variational information gain [18], or entropy maximization [19], ...
arXiv:1911.08701v1
fatcat:yj5dcs45bvf57n56j5mkytqpnm
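For contrast with the naive ϵ-greedy protocol this abstract criticizes, a minimal sketch of the first bonus family it lists: a count-based bonus of the form β/√N(s). The coarse discretization and β are illustrative:

```python
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)

def count_bonus_reward(env_reward, state, beta=0.1):
    """Count-based exploration bonus: rarely visited (discretized) states
    earn a larger bonus, which decays as 1/sqrt(N)."""
    key = tuple(np.round(state, 1))  # coarse discretization (illustrative)
    visit_counts[key] += 1
    return env_reward + beta / np.sqrt(visit_counts[key])

s = np.array([0.31, -0.72])
print(count_bonus_reward(0.0, s))   # first visit: large bonus
print(count_bonus_reward(0.0, s))   # repeat visit: smaller bonus
```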
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
[article]
2017
arXiv
pre-print
Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. ...
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. ...
BBQN with intrinsic reward: Variational Information Maximizing Exploration (VIME) (Houthooft et al. 2016a) introduces an exploration strategy based on maximizing the information gain about the agent's ...
arXiv:1711.05715v2
fatcat:bfswvf466fdnhaoxec2asstetm
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings
[article]
2018
arXiv
pre-print
Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. ...
We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. ...
Gregor et al. (2016) aims to learn a maximally discriminative set of options by maximizing the mutual information between the final state reached by each of the options and the latent representation. ...
arXiv:1806.02813v1
fatcat:3zznottwerd2xgse4iqznyqrxq
Curiosity-Driven Exploration via Latent Bayesian Surprise
[article]
2022
arXiv
pre-print
With the aid of artificial curiosity, we could equip current techniques for control, such as Reinforcement Learning, with more natural exploration capabilities. ...
A promising approach in this respect has consisted of using Bayesian surprise on model parameters, i.e. a metric for the difference between prior and posterior beliefs, to favour exploration. ...
Variational Information Maximizing Exploration (VIME; Houthooft et al. (2016)): the dynamics is modeled as a Bayesian neural network (BNN; Bishop (1997)). ...
arXiv:2104.07495v2
fatcat:omx4pv5g7fgthhwss7wgfezeka
MIME: Mutual Information Minimisation Exploration
[article]
2020
arXiv
pre-print
We propose a counter-intuitive solution that we call Mutual Information Minimising Exploration (MIME) where an agent learns a latent representation of the environment without trying to predict the future ...
[8] proposed VIME, which computes Bayesian surprisal inspired by the idea of maximising information gain. But VIME is difficult to scale up to large-scale environments [1]. ...
We propose Mutual Information Minimising Exploration (MIME) in this paper. ...
arXiv:2001.05636v1
fatcat:3c4rit5pznbelk4kdikk3de47y
Mutual Information State Intrinsic Control
[article]
2021
arXiv
pre-print
We mathematically formalize this reward as the mutual information between the agent state and the surrounding state under the current agent policy. ...
Variational Information Maximizing Exploration (VIME) (Houthooft et al., 2016). ...
Compared to the variational information maximizing-based approaches (Barber & Agakov, 2003; Alemi et al., 2016; Chalk et al., 2016; Kolchinsky et al., 2017), the recent MINE-based approaches have shown ...
arXiv:2103.08107v1
fatcat:wes56q7epbddnbcrzisd3ueebm
Diversity is All You Need: Learning Skills without a Reward Function
[article]
2018
arXiv
pre-print
Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. ...
Intelligent creatures can explore their environments and learn useful skills without supervision. ...
We formalize our discriminability goal as maximizing an information theoretic objective with a maximum entropy policy. ...
arXiv:1802.06070v6
fatcat:giahsx3wjbhkteblz75rsidnei
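The paper derives a skill-conditioned pseudo-reward of the form log q(z|s) - log p(z) from this objective; below is a minimal sketch with a toy linear-softmax discriminator standing in for the learned q(z|s), and p(z) taken uniform over skills as in the paper:

```python
import numpy as np

def discriminator_log_probs(state, W):
    """Toy linear-softmax discriminator q(z|s); W is (n_skills, d)."""
    logits = W @ state
    logits -= logits.max()   # numerical stability
    return logits - np.log(np.exp(logits).sum())

def diayn_reward(state, skill, W, n_skills):
    """DIAYN-style pseudo-reward: log q(z|s) - log p(z),
    with p(z) uniform over skills."""
    log_q = discriminator_log_probs(state, W)[skill]
    return log_q - np.log(1.0 / n_skills)

rng = np.random.default_rng(3)
n_skills, d = 4, 6
W = rng.standard_normal((n_skills, d))  # illustrative discriminator weights
state = rng.standard_normal(d)
print(diayn_reward(state, skill=2, W=W, n_skills=n_skills))
```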
EMI: Exploration with Mutual Information
[article]
2019
arXiv
pre-print
that can be used to guide exploration based on forward prediction in the representation space. ...
In these cases, naive random exploration methods essentially rely on a random walk to stumble onto a rewarding state. ...
Acknowledgements This work was partially supported by Samsung Advanced Institute of Technology and Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the ...
arXiv:1810.01176v6
fatcat:yxhzi7jk6fda7jkp22hnfteeda
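A hedged sketch of exploration guided by forward prediction in a representation space: the intrinsic signal is the forward model's squared error on embedded states. The fixed linear embedding and forward model below are placeholders for EMI's learned ones:

```python
import numpy as np

def embed(state, phi):
    """Stand-in state embedding; EMI learns phi, here it is a fixed matrix."""
    return phi @ state

def prediction_error_bonus(s, a, s_next, phi, f):
    """Intrinsic reward: squared forward-model error in embedding space,
    ||f([phi(s); a]) - phi(s')||^2."""
    x = np.concatenate([embed(s, phi), a])
    return float(np.sum((f @ x - embed(s_next, phi)) ** 2))

rng = np.random.default_rng(5)
d_state, d_embed, d_action = 6, 3, 2
phi = rng.standard_normal((d_embed, d_state))
f = rng.standard_normal((d_embed, d_embed + d_action))
s, s_next = rng.standard_normal(d_state), rng.standard_normal(d_state)
a = rng.standard_normal(d_action)
print(prediction_error_bonus(s, a, s_next, phi, f))
```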
Showing results 1–15 out of 119 results