7,003 Hits in 2.3 sec

Hindsight Logging for Model Training [article]

Rolando Garcia, Eric Liu, Vikram Sreekanti, Bobby Yan, Anusha Dandamudi, Joseph E. Gonzalez, Joseph M. Hellerstein, Koushik Sen
2020 arXiv   pre-print
In this paper, we present hindsight logging, a novel technique for efficiently querying ad-hoc execution data long after model training.  ...  We implement these ideas in Flor, a record-replay system for hindsight logging in Python.  ...  Figure 2: PyTorch model training example. Flor automatically parallelizes the re-execution of model training for hindsight logging, achieving near-ideal parallelism and scale-out to multiple machines.  ... 
arXiv:2006.07357v1 fatcat:fpwmpcjom5bj7e2axotgnoo5xu
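The record-replay idea behind hindsight logging can be illustrated with a minimal sketch. This is a generic illustration, not Flor's actual API; all names (`train`, `checkpoints`, `log_fn`) are hypothetical. During training, state is checkpointed cheaply; later, a new logging query is answered by restoring a checkpoint and re-executing only the epochs needed, with the new logging statement attached.

```python
import copy

def train(model_state, epochs, log_fn=None, checkpoints=None):
    """Toy training loop that can checkpoint state each epoch.

    model_state: a dict standing in for model parameters.
    log_fn: optional callback added "in hindsight" to capture a new metric.
    """
    history = {}
    for epoch in range(epochs):
        if checkpoints is not None:
            checkpoints[epoch] = copy.deepcopy(model_state)  # record phase
        model_state["loss"] = model_state["loss"] * 0.5      # stand-in update
        if log_fn is not None:
            history[epoch] = log_fn(model_state)             # hindsight metric
    return model_state, history

# Record phase: cheap checkpoints, no extra logging.
ckpts = {}
state, _ = train({"loss": 16.0}, epochs=4, checkpoints=ckpts)

# Replay phase: answer a new query ("what was the loss after epoch 3?")
# by restoring the checkpoint taken before that epoch and re-executing it.
restored = copy.deepcopy(ckpts[3])
_, hist = train(restored, epochs=1, log_fn=lambda s: s["loss"])
```

The point of the sketch is that the replay touches only one epoch of work instead of re-running all four, which is what makes the re-execution parallelizable across epochs.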

Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment [article]

Jiaming Guo, Rui Zhang, Xishan Zhang, Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
2021 arXiv   pre-print
Compared with the standard state value function, the proposed hindsight value function consistently reduces the variance, stabilizes the training, and improves the eventual policy.  ...  In this paper, we propose to replace the state value function with a novel hindsight value function, which leverages information from the future to reduce the variance of the gradient estimate for  ...  Figure 1: Model architecture of the hindsight value function. Variational contrastive log-ratio upper bound: $I_{\mathrm{vCLUB}} = \frac{1}{N}\sum_{i=1}^{N} U_i$.  ... 
arXiv:2107.12216v2 fatcat:vygbpu2ybbhwrdf22l4jkm27ti

Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning [article]

Yunhao Tang, Alp Kucukelbir
2021 arXiv   pre-print
We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on a lower bound of the RL objective.  ...  The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards.  ...  $\log p(O=1) \ge \mathbb{E}_{q(\tau)}\left[\log p(O=1 \mid \tau)\right] - \mathrm{KL}\left[q(\tau)\,\|\,p_\theta(\tau)\right] =: \mathcal{L}(\pi_\theta, q)$. (3) Figure 2(b) shows a combined graphical model for both the generative and inference models of variational RL.  ... 
arXiv:2006.07549v2 fatcat:p5bpmz7ktjdhnpuwjr3vmig7la
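The 'learning in hindsight' technique the E-step reinterprets, HER, amounts to relabeling a failed trajectory with a goal it actually achieved. A minimal sketch of HER's "final" relabeling strategy (function and tuple layout are assumptions for illustration):

```python
def her_relabel(trajectory, reward_fn):
    """Relabel a trajectory with the goal it actually achieved (HER's
    'final' strategy). Each step is (state, action, achieved_goal,
    desired_goal); returns transitions whose desired goal is replaced by
    the final achieved goal, with rewards recomputed under that goal.
    """
    hindsight_goal = trajectory[-1][2]  # goal actually reached at episode end
    relabeled = []
    for state, action, achieved, _ in trajectory:
        reward = reward_fn(achieved, hindsight_goal)
        relabeled.append((state, action, achieved, hindsight_goal, reward))
    return relabeled

# Sparse goal-conditioned reward: 0 on goal match, -1 otherwise.
sparse = lambda achieved, goal: 0.0 if achieved == goal else -1.0

traj = [((0, 0), "right", (1, 0), (5, 5)),
        ((1, 0), "up",    (1, 1), (5, 5))]
relabeled = her_relabel(traj, sparse)
```

Under the original goal (5, 5) every reward would be -1; after relabeling, the final transition earns reward 0, so even a "failed" episode yields a learning signal.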

Towards Practical Credit Assignment for Deep Reinforcement Learning [article]

Vyacheslav Alipov, Riley Simmons-Edler, Nikita Putintsev, Pavel Kalinin, Dmitry Vetrov
2022 arXiv   pre-print
Explicit credit assignment methods have the potential to boost the performance of RL algorithms on many tasks, but thus far remain impractical for general use.  ...  Recently, a family of methods called Hindsight Credit Assignment (HCA) was proposed, which explicitly assigns credit to actions in hindsight based on the probability of the action having led to an observed  ...  We identify the following reasons for Deep HCA's poor performance: • Slow training of the hindsight distribution $h_\phi(a \mid s_t, s_k)$ is a bottleneck for policy training. • An imperfect and biased hindsight  ... 
arXiv:2106.04499v2 fatcat:asx6zhiwsnazjlqh4zev2k7cey

Decoupled IoU Regression for Object Detection [article]

Yan Gao and Qimeng Wang and Xu Tang and Haochen Wang and Fei Ding and Jing Li and Yao Hu
2022 arXiv   pre-print
In this paper, we propose a novel Decoupled IoU Regression (DIR) model to handle these problems.  ...  Non-maximum suppression (NMS) is widely used in object detection pipelines for removing duplicated bounding boxes.  ...  The loss functions for Purity, Integrity and IoU are: $L_{\mathit{Puri}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\mathit{Purity}^*(b_i)\cdot\log(s_i) + (1-\mathit{Purity}^*(b_i))\cdot\log(1-s_i)\right]$, $L_{\mathit{Inte}} = -\frac{1}{N}\sum_{i=1}^{N}$  ... 
arXiv:2202.00866v1 fatcat:3qgvemalejgcdaozl5bsybbr7a
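The Purity loss quoted above is per-box binary cross-entropy between a predicted score and its target. A minimal sketch, assuming $s_i$ is the predicted purity score for box $b_i$ and $\mathit{Purity}^*(b_i)$ its target:

```python
import math

def purity_loss(targets, scores, eps=1e-12):
    """Binary cross-entropy form of the Purity loss:
    L_Puri = -(1/N) * sum_i [ t_i*log(s_i) + (1 - t_i)*log(1 - s_i) ]."""
    total = 0.0
    for t, s in zip(targets, scores):
        s = min(max(s, eps), 1.0 - eps)  # clamp scores for numerical safety
        total += t * math.log(s) + (1.0 - t) * math.log(1.0 - s)
    return -total / len(targets)

# Two boxes: one positive target scored 0.9, one negative scored 0.1.
loss = purity_loss([1.0, 0.0], [0.9, 0.1])
```

Both predictions here sit 0.1 away from their targets, so the loss reduces to $-\log 0.9 \approx 0.105$; perfect predictions would drive it to 0.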

Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning [article]

Yannick Schroecker, Charles Isbell
2020 arXiv   pre-print
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is both efficient and does not suffer from hindsight bias in stochastic domains.  ...  We find that this is true for training long-term generative models as well.  ...  Truncating the time horizon: The training data for learning a long-term model can be fairly noisy.  ... 
arXiv:2002.06473v1 fatcat:6urnmvioenccnibkocmixtlqmu

Counterfactual Credit Assignment in Model-Free Reinforcement Learning [article]

Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter (+2 others)
2021 arXiv   pre-print
To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup.  ...  To avoid the potential bias from conditioning on future information, we constrain the hindsight information to contain no information about the agent's actions.  ...  It is trained to learn the residual between the return and the forward baseline. • For CCA, the hindsight classifier $h_\omega$ is computed as the sum of the log of the policy outputs and the output of an MLP  ... 
arXiv:2011.09464v2 fatcat:2ygufffdybeglc4purz4ppocxi

Soft Hindsight Experience Replay [article]

Qiwei He, Liansheng Zhuang, Houqiang Li
2020 arXiv   pre-print
reuse and maximum entropy probabilistic inference model.  ...  In continuous DRL environments such as robotic arm control, Hindsight Experience Replay (HER) has been shown to be an effective solution.  ...  To quantify this, we propose the formula $\delta S = |S_{\mathrm{train}} - S_{\mathrm{test}}|$ (17) to measure the stability of different goals for the three algorithms, where $S_{\mathrm{train}}$ and $S_{\mathrm{test}}$ stand for the success rate of training  ... 
arXiv:2002.02089v1 fatcat:xgohvzozbffbpnzuze2fhhturm
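Equation (17) above is a simple per-goal stability gap. A minimal sketch (the function name and sample rates are hypothetical):

```python
def stability_gap(success_train, success_test):
    """δS = |S_train − S_test| (Eq. 17): absolute gap between training and
    test success rates; a smaller gap means the policy behaves more stably
    on goals it was not trained on."""
    return abs(success_train - success_test)

# Hypothetical per-goal (train, test) success rates for one algorithm.
gaps = [stability_gap(tr, te) for tr, te in [(0.9, 0.8), (0.7, 0.7)]]
```

A goal with matching train and test success contributes a gap of 0, so comparing these gaps across algorithms ranks their stability.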

Inverse Reinforcement Learning with Natural Language Goals [article]

Li Zhou, Kevin Small
2020 arXiv   pre-print
Ideally, natural language should also be usable for communicating goals to autonomous machines (e.g., robots) to minimize friction in task specification.  ...  To improve generalization of the learned policy and reward function, we use a variational goal generator to relabel trajectories and sample diverse goals during training.  ...  For simplicity, we pre-train the LSTM model using the base model below and fix its parameters during training.  ... 
arXiv:2008.06924v3 fatcat:ivmrxwwkvfecdmxf2fnjyebj6y

Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients

Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Jürgen Schmidhuber
2021 Neural Computation  
However, reinforcement learning agents have only recently been endowed with such capacity for hindsight.  ...  Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.  ...  We are grateful to Nvidia Corporation for donating a DGX-1 machine and to IBM for donating a Minsky machine.  ... 
doi:10.1162/neco_a_01387 pmid:34496391 fatcat:xgf75zm5o5blloexd4zjtoaiua

Hindsight policy gradients [article]

Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Juergen Schmidhuber
2019 arXiv   pre-print
However, reinforcement learning agents have only recently been endowed with such capacity for hindsight.  ...  Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.  ...  We are grateful to Nvidia Corporation for donating a DGX-1 machine and to IBM for donating a Minsky machine.  ... 
arXiv:1711.06006v3 fatcat:sti6rfl6k5ebhbt52zvlwic3c4

Hindsight Credit Assignment [article]

Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos
2019 arXiv   pre-print
This approach uses new information in hindsight, rather than employing foresight.  ...  Acknowledgements The authors thank Joseph Modayil for reviews of earlier manuscripts, Theo Weber for several insightful suggestions, and the anonymous reviewers for their useful feedback.  ...  Then, we will discuss the training of the relevant hindsight distributions.  ... 
arXiv:1912.02503v1 fatcat:jaufpb2dobgl5igp7exzm5u2su
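The hindsight-distribution idea can be sketched with the return-conditional form of the HCA estimator. This is a simplified illustration under assumed inputs, not the paper's full algorithm: credit for a return $Z$ is scaled by how much more (or less) likely the action looks in hindsight, given that $Z$ was observed.

```python
def hca_advantage(pi_a, h_a, ret):
    """Return-conditional Hindsight Credit Assignment estimate:
    A(s, a) ≈ (1 − π(a|s) / h(a|s, Z)) · Z,
    where h is the hindsight probability of action a given the observed
    return Z. If the return carries no information about the action
    (h == π), the credited advantage is exactly zero.
    """
    return (1.0 - pi_a / h_a) * ret

# The action is twice as likely in hindsight given this return, so it
# receives positive credit for it.
adv = hca_advantage(pi_a=0.25, h_a=0.5, ret=2.0)
```

With `pi_a == h_a` the estimate vanishes regardless of the return, which is the sense in which hindsight removes credit for outcomes the action did not influence.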

Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics [article]

Swagat Kumar, Hayden Sampson, Ardhendu Behera
2022 arXiv   pre-print
A number of strategies are suggested to provide the intermediate hindsight goals required for implementing the HER algorithm on these problems, which are essentially single-goal environments.  ...  The algorithms considered in this study include soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradients (IPG), and their variants with Hindsight Experience Replay (HER  ...  The data-intensive nature of reinforcement learning necessitates the use of simulated environments for training models.  ... 
arXiv:2201.04224v1 fatcat:se4mafdttncovgbpv6abz44ctq

Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning [article]

Ning Wei, Jiahua Liang, Di Xie, Shiliang Pu
2021 arXiv   pre-print
To this end, we propose a hindsight reward tweaking approach by designing a novel paradigm for deep reinforcement learning to model the influences of reward functions within a near-optimal space.  ...  We simply extend the input observation with a condition vector linearly correlated with the effective environment reward parameters and train the model in a conventional manner, except for randomizing reward  ...  ; c) baseline models are trained $3\times 10^4$ more agent steps than cDRL models to compensate for their extra exposure to environments during hindsight optimization.  ... 
arXiv:2109.02332v1 fatcat:t35oppqkobhongiqwreixs53hm

Episodic Self-Imitation Learning with Hindsight

Tianhong Dai, Hengyan Liu, Anil Anthony Bharath
2020 Electronics  
The trajectory selection module is shown to prevent the agent learning undesirable hindsight experiences.  ...  Compared to the original self-imitation learning algorithm, which samples good state–action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation  ...  The models are trained on a machine with an Intel i7-5960X CPU and 64GB RAM.  ... 
doi:10.3390/electronics9101742 fatcat:rxpcgcsn2zcstfugpzjdzqcsre
Showing results 1 — 15 out of 7,003 results