2,348,572 Hits in 3.5 sec

Value Pursuit Iteration

Amir Massoud Farahmand, Doina Precup
2012 Neural Information Processing Systems  
Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close-to-optimal policy for reinforcement learning problems with large state spaces.  ...  Second, after each iteration of VPI, the algorithm adds a set of functions based on the currently learned value function to the dictionary.  ...  In the exact Value Iteration, Q_k → Q* exponentially fast.  ...
dblp:conf/nips/FarahmandP12 fatcat:f2tgooj3sfb6xjr7eyxrkqhdwm
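The snippet's claim that exact value iteration converges to Q* exponentially fast can be illustrated with a minimal sketch; the two-state, two-action MDP below is invented for illustration, not taken from the paper:

```python
import numpy as np

P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s']: transition kernel
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],                 # R[s, a]: expected reward
              [0.5, 2.0]])
gamma = 0.9

Q = np.zeros((2, 2))
for k in range(500):
    # Bellman optimality update: Q(s,a) <- R(s,a) + gamma * E[max_a' Q(s',a')];
    # the operator is a gamma-contraction in the max norm, so
    # ||Q_k - Q*|| <= gamma^k ||Q_0 - Q*||, i.e. exponentially fast convergence.
    Q = R + gamma * P @ Q.max(axis=1)
```

For this MDP the greedy policy keeps state 1 in itself with reward 2, so Q*(1,1) = 2/(1 − γ) = 20, which the iterates approach geometrically.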

Value Iteration Networks

Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within.  ...  Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We  ...  Value Iteration Networks We now have all the ingredients for a differentiable planning-based policy, which we term a value iteration network (VIN).  ...
doi:10.24963/ijcai.2017/700 dblp:conf/ijcai/TamarWTLA17 fatcat:6jyro4czujelbdar6ru4xct2tm
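The "VI as convolution" idea from the abstract can be sketched in a few lines: on a 2-D grid world, one value-iteration step is a per-action shift of the value image (a fixed, hand-written stand-in for VIN's learned convolution kernels) followed by a channel-wise max. The grid size, reward map, and deterministic moves below are illustrative assumptions:

```python
import numpy as np

H, W, gamma = 5, 5, 0.9
reward = np.zeros((H, W))
reward[4, 4] = 1.0               # single goal cell

def neighbor_value(V, dy, dx):
    """V shifted so that cell (y, x) reads V[y + dy, x + dx]; off-grid reads 0."""
    out = np.zeros_like(V)
    out[max(-dy, 0):H - max(dy, 0), max(-dx, 0):W - max(dx, 0)] = \
        V[max(dy, 0):H - max(-dy, 0), max(dx, 0):W - max(-dx, 0)]
    return out

V = np.zeros((H, W))
for _ in range(200):
    # One "VI layer": a feature map per action (shifted value image plus the
    # reward image), then a channel-wise max -- the VIN building block.
    Q = np.stack([reward + gamma * neighbor_value(V, dy, dx)
                  for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]])
    V = Q.max(axis=0)
```

At the goal the fixed point satisfies V = 1 + γ²V (step out, step back), so V[4, 4] converges to 1/(1 − γ²).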

Optimistic Value Iteration [chapter]

Arnd Hartmanns, Benjamin Lucien Kaminski
2020 Lecture Notes in Computer Science  
The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards.  ...  We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards.  ...  Value Iteration The standard algorithm to compute reachability probabilities and expected rewards is value iteration (VI) [30] .  ... 
doi:10.1007/978-3-030-53291-8_26 fatcat:fjoe3ibrgfdpxboytbv4cn5sse
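The "lower bounds only" behavior of standard value iteration for reachability is easy to see on a toy Markov chain; a hedged sketch (the chain below is made up for illustration):

```python
import numpy as np

A = np.array([[0.5, 0.3],    # A[s, s']: transitions among non-target states
              [0.2, 0.4]])   # (remaining mass goes to the target or a sink)
b = np.array([0.1, 0.3])     # b[s]: one-step probability of hitting the target

x = np.zeros(2)              # starting from 0 ...
for _ in range(500):
    x = b + A @ x            # ... every iterate is a sound *lower* bound
```

Because the iterates increase monotonically toward the least fixed point of x = b + Ax (the true reachability probabilities), VI alone certifies a lower bound but gives no sound stopping criterion from above, which is the gap optimistic value iteration addresses.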

Optimistic Value Iteration [article]

Arnd Hartmanns, Benjamin Lucien Kaminski
2019 arXiv   pre-print
In this paper, we present optimistic value iteration, a new sound approach that leverages value iteration's ability to usually deliver tight lower bounds: we obtain a lower bound via standard value iteration  ...  The standard analysis algorithm, value iteration, only provides a lower bound on unbounded probabilities or reward values.  ...  Value Iteration The standard algorithm to compute reachability probabilities and expected rewards is value iteration (VI) [24] .  ... 
arXiv:1910.01100v2 fatcat:zgmwmf5upvft3m77qg6aey3f5e

Value Iteration Networks [article]

Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel
2017 arXiv   pre-print
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within.  ...  Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation  ...  Value Iteration Networks We now have all the ingredients for a differentiable planning-based policy, which we term a value iteration network (VIN).  ... 
arXiv:1602.02867v4 fatcat:dqzeywmevfdu5jotle6olnayr4

Sound Value Iteration [chapter]

Tim Quatmann, Joost-Pieter Katoen
2018 Lecture Notes in Computer Science  
All model checkers compute these probabilities in an iterative fashion using value iteration.  ...  These procedures require starting values for both sides. We present an alternative that does not require the a priori computation of starting vectors and that converges faster on many benchmarks.  ...  We illustrate that the same idea can be applied to expected rewards, topological value iteration [14] , and Gauss-Seidel value iteration.  ... 
doi:10.1007/978-3-319-96145-3_37 fatcat:y3brr35mu5b63fjxqufwhbjkfm
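The "starting values for both sides" the abstract refers to can be sketched as a two-sided, interval-style iteration (the kind of procedure the paper improves on; sound value iteration itself avoids the a-priori upper starting vector). The toy chain is illustrative:

```python
import numpy as np

A = np.array([[0.5, 0.3],    # transitions among non-target states
              [0.2, 0.4]])
b = np.array([0.1, 0.3])     # one-step probability of reaching the target

lo, hi = np.zeros(2), np.ones(2)    # the two required starting vectors
while (hi - lo).max() > 1e-8:
    lo = b + A @ lo          # increases toward the true probabilities
    hi = b + A @ hi          # decreases toward them
```

When the loop exits, the true probabilities are certified to lie in [lo, hi], giving a sound stopping criterion that plain value iteration lacks.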

Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions [article]

Tian Tian, Kenny Young, Richard S. Sutton
2022 arXiv   pre-print
Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning.  ...  To address this issue, we propose doubly-asynchronous value iteration (DAVI), a new algorithm that generalizes the idea of asynchrony from states to states and actions.  ...  As long as all states are sampled infinitely often, the value iterates must converge to v * .  ... 
arXiv:2207.01613v1 fatcat:qlgloohi7jfchp2uhvascseizu
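The state-asynchronous starting point that DAVI generalizes can be shown in a minimal sketch: instead of sweeping all states each round, update one randomly sampled state at a time; as long as every state keeps being sampled, the values still converge to v*. (DAVI additionally makes the action maximization asynchronous; this sketch does not, and the MDP is invented.)

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'], illustrative MDP
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0], [0.5, 2.0]])
gamma = 0.9

v = np.zeros(2)
for _ in range(5000):
    s = rng.integers(2)                   # sample one state; update only it
    v[s] = max(R[s, a] + gamma * (P[s, a] @ v) for a in range(2))
```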

Transfer Value Iteration Networks

Junyi Shen, Hankz Hankui Zhuo, Jin Xu, Bin Zhong, Sinno Pan
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence  
Value iteration networks (VINs) have been demonstrated to have a good generalization ability for reinforcement learning tasks across similar domains.  ...  Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, dataset sizes as well as different values of hyperparameters such as number of iteration and  ...  of the current state after K iterations of value iteration.  ... 
doi:10.1609/aaai.v34i04.6022 fatcat:wanhc424efaancb5aa32b2dmwu

Transfer Value Iteration Networks [article]

Junyi Shen, Hankz Hankui Zhuo, Jin Xu, Bin Zhong, Sinno Jialin Pan
2019 arXiv   pre-print
Value iteration networks (VINs) have been demonstrated to have a good generalization ability for reinforcement learning tasks across similar domains.  ...  Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, dataset sizes as well as different values of hyperparameters such as number of iteration and  ...  of the current state after K iterations of value iteration.  ... 
arXiv:1911.05701v2 fatcat:b4bgpb424bf33hjw7cyyzgryai

Empirical Q-Value Iteration [article]

Dileep Kalathil, Vivek S. Borkar, Rahul Jain
2019 arXiv   pre-print
We show that our algorithm, which we call the empirical Q-value iteration (EQVI) algorithm, converges to the optimal Q-value function.  ...  We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown.  ...  Another way to find the optimal value function is via Q-value iteration.  ... 
arXiv:1412.0180v3 fatcat:lvaaj2jhrfbnnf5vzhnsyizzcq
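The core substitution in the EQVI idea, replacing the unknown expectation in the Q-value-iteration update with an average over sampled next states, can be sketched on a toy MDP. Sample sizes, schedules, and the convergence argument follow the paper; everything below is an illustrative toy:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # true kernel, used only as a simulator
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0], [0.5, 2.0]])
gamma, n_samples = 0.9, 1000

Q = np.zeros((2, 2))
for _ in range(100):
    Q_new = np.empty_like(Q)
    v = Q.max(axis=1)
    for s in range(2):
        for a in range(2):
            nxt = rng.choice(2, size=n_samples, p=P[s, a])   # sampled next states
            Q_new[s, a] = R[s, a] + gamma * v[nxt].mean()    # empirical expectation
    Q = Q_new
```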

Factored Value Iteration Converges [article]

Istvan Szita, Andras Lorincz
2008 arXiv   pre-print
The traditional approximate value iteration algorithm is modified in two ways.  ...  In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs).  ...  One iteration costs O(N² · |A|) computation steps.  ...  Approximate value iteration.  ...
arXiv:0801.2069v2 fatcat:tbairbrjqzah3jqzui2slekijy

A First-Order Approach To Accelerated Value Iteration [article]

Vineet Goyal, Julien Grand-Clement
2021 arXiv   pre-print
We introduce a Safe Accelerated Value Iteration (S-AVI), which alternates between accelerated updates and value iteration updates.  ...  Several efficient algorithms to compute optimal policies have been studied in the literature, including value iteration (VI) and policy iteration.  ...  Accelerated Value Iteration.  ... 
arXiv:1905.09963v7 fatcat:bnqh2kyhxzejdgphrwfff6nlkq
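The alternation the abstract describes, accelerated updates guarded by plain value-iteration updates, can be sketched with a simple momentum step and a Bellman-residual safeguard. This is only a loose illustration in the spirit of S-AVI, not the paper's algorithm; the MDP, momentum weight, and safeguard test are all invented:

```python
import numpy as np

P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # illustrative MDP
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0], [0.5, 2.0]])
gamma, beta = 0.9, 0.5                     # beta: momentum weight (a guess)

def bellman(v):
    return (R + gamma * P @ v).max(axis=1)     # T v

def residual(v):
    return np.abs(bellman(v) - v).max()        # Bellman residual ||T v - v||

v_prev = v = np.zeros(2)
for _ in range(200):
    cand = bellman(v + beta * (v - v_prev))    # accelerated (momentum) step
    safe = bellman(v)                          # plain VI step
    if residual(cand) > residual(safe):
        cand = safe                            # safeguard: keep VI's guarantee
    v_prev, v = v, cand
```

Because the safeguard never accepts an update whose residual exceeds the plain VI step's, the residual still contracts by at least γ per iteration, so the scheme inherits VI's convergence while allowing faster accelerated steps when they help.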

Analyzing Approximate Value Iteration Algorithms [article]

Arunselvan Ramaswamy, Shalabh Bhatnagar
2021 arXiv   pre-print
In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available.  ...  We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators, in order to counter Bellman's curse of dimensionality.  ...  Approximate value iteration methods The main aim of value iteration methods is to find a fixed point of the Bellman operator.  ...
arXiv:1709.04673v5 fatcat:2cefogbubjhqrieleqdyba2jve

Value iteration is optic composition [article]

Jules Hedges, Riu Rodríguez Sakamoto
2022 arXiv   pre-print
In this paper, we show that value improvement, one of the main steps of dynamic programming, can be naturally seen as composition in a category of optics, and intuitively, the optimal value function is  ...  Two classical algorithms use these two steps differently: Policy iteration iterates value improvement until the current policy value is optimal before performing a policy improvement step, and value iteration  ...  Figure 1: Difference between policy iteration (above) and value iteration (below).  ...
arXiv:2206.04547v1 fatcat:h6c5sfudbjbavbnfoh6xa2tsdq

Mean value methods in iteration

W. Robert Mann
1953 Proceedings of the American Mathematical Society  
We shall consider iteration from the limited but nevertheless important point of view of an applied mathematician trying to use a method of successive approximations on some boundary value problem which  ...  Due largely to the works of Cesàro, Fejér, and Toeplitz, mean value methods have become famous in the summation of divergent series.  ...
doi:10.1090/s0002-9939-1953-0054846-3 fatcat:sox57wismjecrpcsg657tybs7u
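The core idea behind mean value iteration can be illustrated with the simplest nonexpansive map for which plain successive approximation fails. The map T(x) = 1 − x and the constant mean a = 1/2 are illustrative choices, not Mann's general summability-matrix construction:

```python
# T is nonexpansive with fixed point 1/2, but plain (Picard) iteration
# x <- T(x) oscillates forever between x0 and 1 - x0.
def T(x):
    return 1.0 - x

x = 0.1
picard = []
for _ in range(4):
    x = T(x)
    picard.append(x)          # oscillates: ~0.9, ~0.1, ~0.9, ~0.1

# Mann-style mean value iteration: average the iterate with T's output.
x, a = 0.1, 0.5
mann = []
for _ in range(4):
    x = (1 - a) * x + a * T(x)
    mann.append(x)            # settles at the fixed point 0.5
```

Averaging damps the oscillation that defeats the plain method, which is the same mechanism by which Cesàro-type means sum divergent series.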