
Learning Stochastic Shortest Path with Linear Function Approximation [article]

Yifei Min and Jiafan He and Tianhao Wang and Quanquan Gu
2022 arXiv   pre-print
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models.  ...  To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP.  ...  Introduction The Stochastic Shortest Path (SSP) model refers to a type of reinforcement learning (RL) problems where an agent repeatedly interacts with a stochastic environment and aims to reach some specific  ... 
arXiv:2110.12727v2 fatcat:lt63whdzh5gdbmnuavjhylyfte

Geometrically Enriched Latent Spaces [article]

Georgios Arvanitidis, Søren Hauberg, Bernhard Schölkopf
2020 arXiv   pre-print
Shortest paths can then be defined accordingly in the latent space to both follow the learned manifold and respect the ambient geometry.  ...  Experimentally we show that our approach improves interpretability of learned representations both using stochastic and deterministic generators.  ...  Hence, the shortest paths computed in X are able to approximate closely the true paths on the actual data manifold M ⊂ X , as long as the linear projection step does not change the structure of M in X  ... 
arXiv:2008.00565v1 fatcat:7aaieueko5aj7arbimnfge7soi

Efficient computation of optimal actions

Emanuel Todorov
2009 Proceedings of the National Academy of Sciences of the United States of America  
Our framework may have similar impact in fields where optimal choice of actions is relevant. action selection | cost function | linear Bellman equation | stochastic optimal control This article contains  ...  This yields algorithms that outperform Dynamic Programming and Reinforcement Learning, and thereby solve traditional problems more efficiently.  ...  Both Z learning and Q learning aim to approximate this function.  ... 
doi:10.1073/pnas.0710743106 pmid:19574462 pmcid:PMC2705278 fatcat:q7bypwgoifbdpj5bva7uvlidvm

Online shortest paths with confidence intervals for routing in a time varying random network [article]

Stéphane Chrétien, Christophe Guyeux
2018 arXiv   pre-print
Our improvement enables to find a confidence interval for the shortest path, by using the stochastic gradient algorithm for approximate Bayesian inference.  ...  In this article, an online shortest path computation using stochastic gradient descent is proposed. This routing algorithm for ITS traffic management is based on the online Frank-Wolfe approach.  ...  The online approach to the stochastic shortest path problem had long been interesting to the machine learning community.  ... 
arXiv:1805.09261v1 fatcat:rn5hwage7vefhm54wylxnbwdcu

Online Shortest Paths With Confidence Intervals for Routing in a Time Varying Random Network

Stephane Chretien, Christophe Guyeux
2018 2018 International Joint Conference on Neural Networks (IJCNN)  
Our improvement enables to find a confidence interval for the shortest path, by using the stochastic gradient algorithm for approximate Bayesian inference.  ...  In this article, an online shortest path computation using stochastic gradient descent is proposed. This routing algorithm for ITS traffic management is based on the online Frank-Wolfe approach.  ...  The online approach to the stochastic shortest path problem had long been interesting to the machine learning community.  ... 
doi:10.1109/ijcnn.2018.8489447 dblp:conf/ijcnn/ChretienG18 fatcat:l36ztjkelfanxlu2grop54gr4m

Reward-Respecting Subtasks for Model-Based Reinforcement Learning [article]

Richard S. Sutton and Marlos C. Machado and G. Zacharias Holland and David Szepesvari and Finbarr Timbers and Brian Tanner and Adam White
2022 arXiv   pre-print
Finally, we show how the algorithms for learning values, policies, options, and models can be unified using general value functions.  ...  To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time.  ...  In the current work we are particularly concerned with linear approximations to v π .  ... 
arXiv:2202.03466v2 fatcat:vwilnxtbdfcgxmkzzz64hw7wd4

Page 1386 of Mathematical Reviews Vol. , Issue 2001B [page]

2001 Mathematical Reviews  
There are n(n - 1) shortest paths in the network with |V| = n, so among all these shortest paths the SPCP requires us to count the number of shortest paths passing each edge.  ...  Summary: “In this paper we consider the shortest path counting problem (SPCP): How many shortest paths contain each edge of a network N = (V, E) with a vertex set V and an edge set E?  ... 

Regular Policies in Abstract Dynamic Programming

Dimitri P. Bertsekas
2017 SIAM Journal on Optimization  
My collaboration with John Tsitsiklis on stochastic shortest path problems provided inspiration for the work on semicontractive models.  ...  Included are stochastic shortest path problems, search problems, linear-quadratic problems, a host of queueing problems, multiplicative and exponential cost models, and others.  ... 
doi:10.1137/16m1090946 fatcat:iiaplphuczd3xjttwjo3cet3yq

Page 6332 of Mathematical Reviews Vol. , Issue 91K [page]

1991 Mathematical Reviews  
In addition we analyze the fuzzy shortest path algorithms in terms of submodular functions.  ...  Summary: “For sequential stochastic decision problems on a partially observable Markov process, several properties under Bayesian learning procedure are considered.  ... 

Learning with Differentiable Perturbed Optimizers [article]

Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
2020 arXiv   pre-print
Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).  ...  Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.  ...  beam-search, or with shortest paths problems).  ... 
arXiv:2002.08676v2 fatcat:vzbo4ndqibbfvnvbmvhphvkrzy

Stochasticity, Nonlinear Value Functions, and Update Rules in Learning Aesthetic Biases

Norberto M. Grzywacz
2021 Frontiers in Human Neuroscience  
Here, we analyze the learning performance with models including optimal nonlinear value functions.  ...  This linearity meant that the learning process employed a value function that assumed a linear relationship between reward and sensory stimuli.  ...  ACKNOWLEDGMENTS We thank Hassan Aleem, Ivan Correa-Herran, Maria Pombo, and Jiaan Mansuri for stimulating discussions with us on theoretical and experimental principles of aesthetic biases.  ... 
doi:10.3389/fnhum.2021.639081 pmid:34040509 pmcid:PMC8141583 fatcat:dea345pilvf7fh6nw3qtoipy3y

A tutorial on recursive models for analyzing and predicting path choice behavior [article]

Maëlle Zimmermann, Emma Frejinger
2020 arXiv   pre-print
Second, we formally introduce the problem and the recursive modeling idea along with an overview of existing models, their properties and applications.  ...  The problem at the heart of this tutorial consists in modeling the path choice behavior of network users.  ...  conveniently solved as a system of linear equations without approximation.  ... 
arXiv:1905.00883v2 fatcat:gdkumr2w4bayrdjz4hl2d3qcli

Stochastic single machine scheduling problem as a multi-stage dynamic random decision process

Mina Roohnavazfar, Daniele Manerba, Lohic Fotio Tiotsop, Seyed Hamid Reza Pasandideh, Roberto Tadei
2021 Computational Management Science  
We discuss and compare the results found by the resolution of plain stochastic models with those obtained by the deterministic approximation approach.  ...  Then, to efficiently solve the problem, a new accessibility measure is defined to convert the model into the search of a shortest path throughout the stages.  ...  Edoardo Fadda from Dept. of Control and Computer Engineering of Politecnico di Torino (Italy) for his support in the calibration method proposed for the parameters of our approximation approach.  ... 
doi:10.1007/s10287-020-00386-1 fatcat:5udapx3gerdjfhqy7yfskuajee

Page 274 of Neural Computation Vol. 7, Issue 2 [page]

1995 Neural Computation  
It can be seen that TD(0) can yield a very poor approximation to the cost function. The above example can be generalized with similar results.  ...  These methods have the advantage that they apply to discounted Markovian decision problems and stochastic shortest path problems (as defined in Bertsekas and Tsitsiklis 1989), where there are multiple  ... 

USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization Problems [article]

Guangmo Tong
2022 arXiv   pre-print
samples of input-solution pairs -- without the need to learn the objective function.  ...  Therefore, we are frequently confronted with combinatorial optimization problems of which the objective function is unknown and thus has to be debunked using empirical evidence.  ...  In the stochastic version of the shortest path problem, given a source u and a destination v in a graph G = (V, E), we wish to find the path (from u to v) that is the shortest in terms of a distribution  ... 
arXiv:2107.07508v3 fatcat:przlwuabgrad3giwsqfpl2azs4