119 Hits in 2.9 sec

VIME: Variational Information Maximizing Exploration [article]

Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
2017 arXiv   pre-print
This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics.  ...  We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse  ...  Conclusions We have proposed Variational Information Maximizing Exploration (VIME), a curiosity-driven exploration strategy for continuous control tasks.  ... 
arXiv:1605.09674v4 fatcat:lrwm2ssr7nb3dhrektnzymohuu
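
As a rough illustration of the idea summarized in this abstract, the sketch below (Python, with hypothetical helper names) computes a VIME-style shaped reward: the extrinsic reward plus an information-gain bonus, approximated as the KL divergence between the updated and previous factorized-Gaussian posteriors over the dynamics model's parameters. The posterior update itself is assumed to happen elsewhere; only the bonus computation is shown.

# Minimal sketch of VIME-style reward shaping (assumptions: a factorized
# Gaussian variational posterior over dynamics-model parameters, updated
# elsewhere on each observed transition; names are illustrative).
import numpy as np

def gaussian_kl(mu_new, sigma_new, mu_old, sigma_old):
    """KL( N(mu_new, sigma_new^2) || N(mu_old, sigma_old^2) ), summed over parameters."""
    var_new, var_old = sigma_new ** 2, sigma_old ** 2
    return np.sum(
        np.log(sigma_old / sigma_new)
        + (var_new + (mu_new - mu_old) ** 2) / (2.0 * var_old)
        - 0.5
    )

def vime_reward(extrinsic_reward, posterior_new, posterior_old, eta=0.1):
    """Augment the task reward with an information-gain bonus (hypothetical helper)."""
    mu_new, sigma_new = posterior_new
    mu_old, sigma_old = posterior_old
    info_gain = gaussian_kl(mu_new, sigma_new, mu_old, sigma_old)
    return extrinsic_reward + eta * info_gain

# Toy usage: pretend a transition shifted the posterior slightly.
old = (np.zeros(4), np.ones(4))
new = (np.full(4, 0.05), np.full(4, 0.95))
print(vime_reward(1.0, new, old))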

Information Maximizing Exploration with a Latent Dynamics Model [article]

Trevor Barron, Oliver Obst, Heni Ben Amor
2018 arXiv   pre-print
This method is both theoretically grounded and computationally advantageous, permitting the efficient use of Bayesian information-theoretic methods in high-dimensional state spaces.  ...  All reinforcement learning algorithms must handle the trade-off between exploration and exploitation.  ...  Incentivizing exploration with reward bonuses and intrinsic motivation In this work we focus on exploration and evaluate a method akin to Variational Information Maximizing Exploration (VIME) [Houthooft  ... 
arXiv:1804.01238v1 fatcat:wsba324bgfglha57e6bahwc2aa

Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Imperfect Information [article]

Chen Qiu, Xuan Wang, Tianzi Ma, Yaojun Wen, Jiajia Zhang
2021 arXiv   pre-print
For uncertain scenarios, such as those arising in Reinforcement Learning (RL), variational information maximizing exploration (VIME) provides a useful framework for exploring environments using information  ...  By adding information gain to the reward, the average strategy calculated by CFR can be directly used as an interactive strategy, and the exploration efficiency of the algorithm in uncertain environments  ...  Variational Information Maximizing Exploration Variational information maximizing exploration (VIME) is an exploration strategy based on the maximization of information gain for uncertain environments  ... 
arXiv:2110.07892v1 fatcat:xopyt5kxmvhbbo4tlvwpoegsnu

A Bandit Framework for Optimal Selection of Reinforcement Learning Agents [article]

Andreas Merentitis, Kashif Rasul, Roland Vollgraf, Abdul-Saboor Sheikh, Urs Bergmann
2019 arXiv   pre-print
The bandit has the double objective of maximizing the reward while the agents are learning and selecting the best agent after a finite number of learning steps.  ...  This surrogate reward is inspired by the Variational Information Maximizing Exploration idea [5], where a similar metric, which captures the surprise of the agent regarding the environment dynamics,  ...  These surrogate rewards are inspired by the Variational Information Maximizing Exploration concept, where a metric capturing the surprise of an agent regarding the environment dynamics is used to promote  ... 
arXiv:1902.03657v1 fatcat:t4ubyb2hbbazzmgaftd6gd3jxi
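
The bandit layer described here can be pictured with a small sketch: a UCB-style selector over a pool of agents, fed whatever (surrogate) reward the chosen agent produced. This is an illustrative stand-in, not the paper's algorithm; the class name and constants are hypothetical.

# Illustrative UCB-style bandit over a pool of learning agents; the paper's
# surprise-based surrogate rewards would be fed in as `reward`.
import numpy as np

class AgentSelectorBandit:
    def __init__(self, n_agents, c=2.0):
        self.counts = np.zeros(n_agents)
        self.values = np.zeros(n_agents)
        self.c = c
        self.t = 0

    def select(self):
        self.t += 1
        # Play each agent once before applying the UCB rule.
        untried = np.where(self.counts == 0)[0]
        if len(untried) > 0:
            return int(untried[0])
        ucb = self.values + self.c * np.sqrt(np.log(self.t) / self.counts)
        return int(np.argmax(ucb))

    def update(self, agent_idx, reward):
        self.counts[agent_idx] += 1
        n = self.counts[agent_idx]
        self.values[agent_idx] += (reward - self.values[agent_idx]) / n

# Toy usage with random surrogate rewards for three agents.
bandit = AgentSelectorBandit(3)
rng = np.random.default_rng(0)
for _ in range(20):
    i = bandit.select()
    bandit.update(i, rng.normal(loc=[0.1, 0.5, 0.3][i]))
print(bandit.counts, bandit.values)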

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning [article]

Joshua Achiam, Shankar Sastry
2017 arXiv   pre-print
Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation.  ...  Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards.  ...  ACKNOWLEDGEMENTS We thank Rein Houthooft for interesting discussions and for sharing data from the original VIME experiments.  ... 
arXiv:1703.01732v1 fatcat:5e5xx4k5m5bltgir5wts73zvnm
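
A minimal sketch of the surprise idea, assuming a Gaussian predictive dynamics model as a stand-in for the learned model: the intrinsic reward is the surprisal (negative log-likelihood) of the observed next state under the model, scaled by a coefficient. The function names and constants are illustrative.

# Surprisal bonus: negative log-likelihood of the observed next state under
# an (assumed Gaussian) learned dynamics model.
import numpy as np

def gaussian_nll(x, mean, std):
    """Negative log-likelihood of x under an independent Gaussian prediction."""
    return np.sum(0.5 * np.log(2 * np.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2))

def surprise_bonus(next_state, predicted_mean, predicted_std, eta=0.01):
    return eta * gaussian_nll(next_state, predicted_mean, predicted_std)

# A poorly predicted transition yields a larger bonus than a well predicted one.
s_next = np.array([1.0, -0.5])
print(surprise_bonus(s_next, predicted_mean=np.array([1.0, -0.5]), predicted_std=np.array([0.1, 0.1])))
print(surprise_bonus(s_next, predicted_mean=np.array([0.0, 0.0]), predicted_std=np.array([0.1, 0.1])))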

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems [article]

Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng
2017 arXiv   pre-print
Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network.  ...  We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.  ...  BBQN with intrinsic reward Variational Information Maximizing Exploration (VIME) (Houthooft et al. 2016a) introduces an exploration strategy based on maximizing the information gain about the agent's  ... 
arXiv:1608.05081v4 fatcat:3meqci2hnbekdjspzwfw4nplbq
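
A minimal sketch of the Thompson-sampling step described here, assuming a factorized Gaussian posterior over the weights of a linear Q-function (a stand-in for the Bayes-by-Backprop network): draw one Monte Carlo weight sample, then act greedily under the sampled Q-values.

# Thompson-sampling action selection with a Bayesian (here: linear) Q-function.
import numpy as np

def thompson_action(state, weight_mu, weight_sigma, rng):
    """state: (d,), weight_mu/sigma: (d, n_actions). Returns an action index."""
    sampled_w = rng.normal(weight_mu, weight_sigma)   # one Monte Carlo draw from the posterior
    q_values = state @ sampled_w                      # Q(s, a) for all actions under the sample
    return int(np.argmax(q_values))

# Toy usage with a 4-dimensional state and 3 actions.
rng = np.random.default_rng(0)
d, n_actions = 4, 3
mu = rng.normal(size=(d, n_actions))
sigma = 0.1 * np.ones((d, n_actions))
print(thompson_action(rng.normal(size=d), mu, sigma, rng))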

Considerations surrounding remote medicolegal assessments: a systematic search and narrative synthesis of the range of motion literature

Peter Steadman, Dianne Sheppard, Janette Henderson, Brett Halliday, Ian Freckelton
2021 ANZ journal of surgery  
To explore this, a systematic literature search focusing on advanced device-based range of motion measurement was conducted, along with an historical snapshot of observation-based range of motion measurement  ...  examinations with limited clinical assessment have utility for legal matters, such as the assessment of causation of injury, treatment advice or approvals and fitness for pre-employment tasks or safe variations  ...  We have identified the specific circumstances that are likely to maximize the accuracy and reliability of ROM measurement in the vIME setting.  ... 
doi:10.1111/ans.16841 pmid:33890724 fatcat:cgt4qiaw7rbjdkb2edn6tylbsu

Bayesian Curiosity for Efficient Exploration in Reinforcement Learning [article]

Tom Blau, Lionel Ott, Fabio Ramos
2019 arXiv   pre-print
Balancing exploration and exploitation is a fundamental part of reinforcement learning, yet most state-of-the-art algorithms use a naive exploration protocol like ϵ-greedy.  ...  This contributes to the problem of high sample complexity, as the algorithm wastes effort by repeatedly visiting parts of the state space that have already been explored.  ...  Such a reward signal can be derived from visitation counts [12], [13], [14], model prediction error [15], [16], [17], variational information gain [18], or entropy maximization [19], [  ... 
arXiv:1911.08701v1 fatcat:yj5dcs45bvf57n56j5mkytqpnm
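
To make one of the reward signals listed above concrete, here is a small count-based bonus sketch over a discretized state space; the discretization and coefficient are illustrative choices, not taken from the paper.

# Count-based intrinsic reward r_int = beta / sqrt(N(s)) over binned states.
from collections import defaultdict
import numpy as np

class CountBonus:
    def __init__(self, beta=0.1, bin_width=0.5):
        self.counts = defaultdict(int)
        self.beta = beta
        self.bin_width = bin_width

    def __call__(self, state):
        key = tuple(np.floor(np.asarray(state) / self.bin_width).astype(int))
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])

bonus = CountBonus()
print(bonus([0.1, 0.2]))  # first visit: largest bonus
print(bonus([0.1, 0.2]))  # revisits decay as 1/sqrt(N)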

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems [article]

Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng
2017 arXiv   pre-print
Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network.  ...  We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.  ...  BBQN with intrinsic reward Variational Information Maximizing Exploration (VIME) (Houthooft et al. 2016a) introduces an exploration strategy based on maximizing the information gain about the agent's  ... 
arXiv:1711.05715v2 fatcat:bfswvf466fdnhaoxec2asstetm

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings [article]

John D. Co-Reyes, YuXuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine
2018 arXiv   pre-print
Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories.  ...  We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective.  ...  Gregor et al. (2016) aims to learn a maximally discriminative set of options by maximizing the mutual information between the final state reached by each of the options and the latent representation.  ... 
arXiv:1806.02813v1 fatcat:3zznottwerd2xgse4iqznyqrxq

Curiosity-Driven Exploration via Latent Bayesian Surprise [article]

Pietro Mazzaglia, Ozan Catal, Tim Verbelen, Bart Dhoedt
2022 arXiv   pre-print
With the aid of artificial curiosity, we could equip current techniques for control, such as Reinforcement Learning, with more natural exploration capabilities.  ...  A promising approach in this respect has consisted of using Bayesian surprise on model parameters, i.e. a metric for the difference between prior and posterior beliefs, to favour exploration.  ...  Information Maximizing Exploration (VIME; Houthooft et al. (2016)): the dynamics is modeled as a Bayesian neural network (BNN; Bishop (1997)).  ... 
arXiv:2104.07495v2 fatcat:omx4pv5g7fgthhwss7wgfezeka

MIME: Mutual Information Minimisation Exploration [article]

Haitao Xu and Brendan McCane and Lech Szymanski and Craig Atkinson
2020 arXiv   pre-print
We propose a counter-intuitive solution that we call Mutual Information Minimising Exploration (MIME) where an agent learns a latent representation of the environment without trying to predict the future  ...  [8] proposed VIME, which computes Bayesian surprisal inspired by the idea of maximising information gain. But VIME is difficult to scale up to large-scale environments [1].  ...  We propose Mutual Information Minimising Exploration (MIME) in this paper.  ... 
arXiv:2001.05636v1 fatcat:3c4rit5pznbelk4kdikk3de47y

Mutual Information State Intrinsic Control [article]

Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu
2021 arXiv   pre-print
We mathematically formalize this reward as the mutual information between the agent state and the surrounding state under the current agent policy.  ...  Information Maximizing Exploration (VIME) (Houthooft et al., 2016).  ...  Compared to the variational information maximizing-based approaches (Barber & Agakov, 2003; Alemi et al., 2016; Chalk et al., 2016; Kolchinsky et al., 2017), the recent MINE-based approaches have shown  ... 
arXiv:2103.08107v1 fatcat:wes56q7epbddnbcrzisd3ueebm
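
The MINE-style estimators mentioned in this snippet bound mutual information from below via the Donsker-Varadhan representation. The sketch below evaluates that bound with a fixed toy critic; in practice the critic is a trained neural network, and all names here are illustrative.

# Donsker-Varadhan lower bound on I(X; Y):
#   E_joint[T(x, y)] - log E_{p(x)p(y)}[exp(T(x, y))]
import numpy as np

def dv_lower_bound(critic, x, y, rng):
    joint = critic(x, y)                          # paired samples from p(x, y)
    y_shuffled = y[rng.permutation(len(y))]       # break the pairing -> p(x)p(y)
    marginal = critic(x, y_shuffled)
    return joint.mean() - np.log(np.exp(marginal).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = x + 0.1 * rng.normal(size=(1000, 1))          # strongly dependent pair
critic = lambda a, b: -np.sum((a - b) ** 2, axis=1)  # fixed toy similarity critic
print(dv_lower_bound(critic, x, y, rng))                           # clearly positive
print(dv_lower_bound(critic, x, rng.normal(size=(1000, 1)), rng))  # much lower for independent data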

Diversity is All You Need: Learning Skills without a Reward Function [article]

Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
2018 arXiv   pre-print
Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy.  ...  Intelligent creatures can explore their environments and learn useful skills without supervision.  ...  We formalize our discriminability goal as maximizing an information theoretic objective with a maximum entropy policy.  ... 
arXiv:1802.06070v6 fatcat:giahsx3wjbhkteblz75rsidnei
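
The information-theoretic objective referred to here is commonly optimized with a discriminator-based skill reward of the form log q(z | s) - log p(z). The sketch below assumes a uniform skill prior and uses raw logits as a stand-in for a trained discriminator; the maximum-entropy policy term is omitted.

# Discriminator-based skill reward: positive when the skill is recognisable
# from the state, zero at chance level under a uniform prior.
import numpy as np

def skill_reward(discriminator_logits, skill, n_skills):
    """discriminator_logits: (n_skills,) scores for the current state."""
    z = discriminator_logits - discriminator_logits.max()   # numerically stable log-softmax
    log_q = z - np.log(np.sum(np.exp(z)))
    log_p = -np.log(n_skills)                                # uniform prior over skills
    return log_q[skill] - log_p

logits = np.array([0.1, 0.2, 3.0, 0.3])                 # discriminator is confident about skill 2
print(skill_reward(logits, skill=2, n_skills=4))        # positive reward
print(skill_reward(np.zeros(4), skill=2, n_skills=4))   # zero at chance level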

EMI: Exploration with Mutual Information [article]

Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song
2019 arXiv   pre-print
that can be used to guide exploration based on forward prediction in the representation space.  ...  In these cases, naive random exploration methods essentially rely on a random walk to stumble onto a rewarding state.  ...  Acknowledgements This work was partially supported by Samsung Advanced Institute of Technology and Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the  ... 
arXiv:1810.01176v6 fatcat:yxhzi7jk6fda7jkp22hnfteeda
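
A rough sketch of exploration guided by forward prediction in a representation space: embed states, predict the next embedding from the current embedding and action, and use the squared prediction error as a bonus. The linear encoder and forward model are illustrative stand-ins for the learned networks.

# Exploration bonus from forward-prediction error in an embedding space.
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, embed_dim = 6, 2, 3
W_embed = rng.normal(size=(state_dim, embed_dim))                  # state encoder phi (stand-in)
W_forward = rng.normal(size=(embed_dim + action_dim, embed_dim))   # forward model (stand-in)

def exploration_bonus(state, action, next_state):
    phi_s = state @ W_embed
    phi_next = next_state @ W_embed
    predicted = np.concatenate([phi_s, action]) @ W_forward
    return float(np.sum((phi_next - predicted) ** 2))              # squared prediction error

s, a = rng.normal(size=state_dim), rng.normal(size=action_dim)
print(exploration_bonus(s, a, rng.normal(size=state_dim)))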
Showing results 1 — 15 out of 119 results