Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
2016
arXiv
pre-print
In this paper, we address the problem of safely exploring finite Markov decision processes (MDP). ...
Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. ...
Specifically, we consider the problem of exploring a Markov decision process (MDP), where it is a priori unknown which state-action pairs are safe. ...
arXiv:1606.04753v2
fatcat:rbi4n4eruva6le7j42ubzob54q
Temporal Logic Control of POMDPs via Label-based Stochastic Simulation Relations
2018
IFAC-PapersOnLine
The synthesis of controllers guaranteeing linear temporal logic specifications on partially observable Markov decision processes (POMDP) via their belief models causes computational issues due to the continuous ...
In this work, we construct a finite-state abstraction on which a control policy is synthesized and refined back to the original belief model. ...
For finite-state partially observable Markov decision processes (POMDPs), verification and policy synthesis have been considered for PCTL properties (Norman et al., 2017; Chatterjee et al., 2015). ...
doi:10.1016/j.ifacol.2018.08.046
fatcat:lf364ahqxbbhzldzrrx7uatyu4
Regret Bounds for Safe Gaussian Process Bandit Optimization
[article]
2020
arXiv
pre-print
The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule. ...
In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs) first considered in [Srinivas et al., 2010]. ...
Safe exploration in finite Markov decision processes with Gaussian processes. In Advances in Neural Information Processing Systems, pages 4312-4320. ...
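The two-phase decision rule described in this entry (estimate the safe action set, then follow GP-UCB) centres on the GP-UCB acquisition: pick the action maximising posterior mean plus a scaled posterior standard deviation. A minimal sketch of that rule under assumed hyperparameters (RBF kernel, hypothetical observations, fixed beta; not the paper's implementation):

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel between two 1-D input sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

# Hypothetical noisy payoff observations at a few actions.
x_obs = np.array([0.1, 0.4, 0.9])
y_obs = np.array([0.2, 0.8, 0.3])
noise = 1e-4

# GP posterior mean and variance over a candidate action grid.
x_cand = np.linspace(0.0, 1.0, 101)
K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
K_star = rbf(x_cand, x_obs)
mu = K_star @ np.linalg.solve(K, y_obs)
var = 1.0 - np.einsum("ij,ij->i", K_star, np.linalg.solve(K, K_star.T).T)

# GP-UCB decision rule: mean plus an exploration bonus scaled by beta.
beta = 2.0
ucb = mu + np.sqrt(beta) * np.sqrt(np.clip(var, 0.0, None))
best = x_cand[np.argmax(ucb)]
print(f"GP-UCB action: {best:.2f}")
```

In the safe-bandit setting the same posterior (for the constraint function) would additionally gate which candidates are eligible before the argmax is taken.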
arXiv:2005.01936v1
fatcat:omsi6phcwzfg7jo5345lddvkja
Markov Decision Processes with Unknown State Feature Values for Safe Exploration using Gaussian Processes
2020
2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
We use a Gaussian process to predict the value of the environment feature in unvisited regions, and propose an estimated Markov decision process, a model that integrates the Gaussian process predictions ...
More specifically, we consider a setting where a robot explores an environment modelled with a Markov decision process, subject to bounds on the values of one or more environment features which can only ...
Gaussian Process A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution [2] . ...
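The definition quoted in this entry (any finite collection of inputs induces a joint Gaussian over function values) can be illustrated by sampling function draws from a GP prior; a sketch with an assumed squared-exponential kernel and arbitrary grid:

```python
import numpy as np

def rbf_kernel(xs, ys, length_scale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = xs[:, None] - ys[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Any finite set of inputs induces a joint Gaussian over function values,
# so sampling a "function" just means sampling one multivariate normal.
x = np.linspace(0.0, 5.0, 50)
cov = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # jitter for stability
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), cov, size=3)
print(samples.shape)  # (3, 50): three function draws at 50 inputs
```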
doi:10.1109/iros45743.2020.9341589
fatcat:ejpw7yp6qnbfdnlis7l2mgkpni
The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes
[article]
2017
arXiv
pre-print
For a large class of Markov decision processes modeled by Gaussian mixtures kernels we show that through a proper selection of the finite dimensional space, one can further reduce the computational complexity ...
One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications. ...
ACKNOWLEDGEMENTS The authors would like to thank Alexander Liniger from the Automatic Control Laboratory in ETH Zürich for his help in testing the algorithm on the ORCA platform ...
arXiv:1411.5925v4
fatcat:jezsixb7jrcajfvgujo42eumbu
FAUST^2: Formal Abstractions of Uncountable-STate STochastic processes
[article]
2014
arXiv
pre-print
A dtMP model is specified in MATLAB and abstracted as a finite-state Markov chain or a Markov decision process. ...
The abstract model is formally put in relationship with the concrete dtMP via a user-defined maximum threshold on the approximation error introduced by the abstraction procedure. ...
The dtMP S is abstracted as a Markov decision process (MDP) P = (P, U_p, T_p), where now the finite input space is U_p = {u_1, u_2, . . . , u_q}, and T_p(u, z, z′) = T_s(Ξ(z′) | z, u) for all z, z′ ...
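The abstraction described in this entry assigns each abstract transition the stochastic-kernel mass that falls inside the target cell. A toy sketch under assumed dynamics (hypothetical 1-D Gaussian kernel s′ ~ N(a_u · s, σ) and a uniform partition of [0, 1); not the FAUST^2 implementation):

```python
import math

def norm_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Uniform partition of [0, 1) into n cells; representatives are centres.
n = 10
edges = [i / n for i in range(n + 1)]
reps = [(edges[i] + edges[i + 1]) / 2 for i in range(n)]

# Hypothetical dtMP dynamics: s' ~ Normal(a_u * s, sigma) under input u.
inputs = {"u1": 0.5, "u2": 0.9}
sigma = 0.1

# Abstract MDP kernel: T_p(u, z, z') = mass of T_s(. | z, u) in z''s cell.
T_p = {
    u: [[norm_cdf(edges[j + 1], a * z, sigma) - norm_cdf(edges[j], a * z, sigma)
         for j in range(n)]
        for z in reps]
    for u, a in inputs.items()
}

# Each row sums to (almost) 1, minus the mass escaping [0, 1).
print(f"row mass: {sum(T_p['u1'][5]):.3f}")
```

The approximation-error threshold mentioned in the entry would be controlled by refining this partition.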
arXiv:1403.3286v1
fatcat:ycgvg5fzu5cxzm7f6vz4crktxa
Probabilistic Model Checking of Labelled Markov Processes via Finite Approximate Bisimulations
[chapter]
2014
Lecture Notes in Computer Science
This paper concerns labelled Markov processes (LMPs), probabilistic models over uncountable state spaces originally introduced by Prakash Panangaden and colleagues. ...
labelled Markov chain (LMC). ...
The authors are supported in part by the ERC Advanced Grant VERIWARE, the EU FP7 project HIERATIC, the EU FP7 project MoVeS, the EU FP7 Marie Curie grant MANTRAS, the EU FP7 project AMBI, and by the NWO ...
doi:10.1007/978-3-319-06880-0_2
fatcat:4q4rqu3rszdi3ae7s2fk2vyua4
Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction
[article]
2021
arXiv
pre-print
In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces utilizing a linear nominal model of the controlled object. ...
the Gaussian policy for exploration. ...
In this section, we also show one concrete example of safe RL algorithms with our exploration process adjustment method. In addition, we compare our study with some related work for safe RL. ...
arXiv:2103.03656v1
fatcat:a7olmyxkxrdk3kcswjoqcgn534
Safe Reinforcement Learning in Constrained Markov Decision Processes
[article]
2020
arXiv
pre-print
In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints. ...
In our method, the agent first learns safety constraints by expanding the safe region, and then optimizes the cumulative reward in the certified safe region. ...
Representatives of such work are probably approximately correct Markov decision process (PAC-MDP) algorithms (Brafman & Tennenholtz, 2002; Kearns & Singh, 2002; Strehl et al., 2006) . ...
arXiv:2008.06626v1
fatcat:fpblmiihknckfn6fghde23mc2m
FAUST^2: Formal Abstractions of Uncountable-STate STochastic Processes
[chapter]
2015
Lecture Notes in Computer Science
A dtMP model is specified in MATLAB and abstracted as a finite-state Markov chain or a Markov decision process. ...
The toolbox is available at http://sourceforge.net/projects/faust2/. Models: Discrete-Time Markov Processes. We consider a discrete-time Markov process (dtMP) s(k), k ∈ N ∪ {0}, defined over a general state ...
The dtMP S is abstracted as a Markov decision process (MDP) P = (P, U_p, T_p), where now the finite input space is U_p = {u_1, u_2, . . . , u_q}, and T_p(u, z, z′) = T_s(Ξ(z′) | z, u) for all z, z′ ...
doi:10.1007/978-3-662-46681-0_23
fatcat:qqltkzrnkvhdvkq6m7ld7ur6dq
The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes
2017
The Journal of Artificial Intelligence Research
For a large class of Markov decision processes modeled by Gaussian mixtures kernels we show that through a proper selection of the finite dimensional space, one can further reduce the computational complexity ...
One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications. ...
ACKNOWLEDGEMENTS The authors would like to thank Alexander Liniger from the Automatic Control Laboratory in ETH Zürich for his help in testing the algorithm on the ORCA platform ...
doi:10.1613/jair.5500
fatcat:lgtdmkwserd3hd3b3goynic4u4
Steady-state distributions for human decisions in two-alternative choice tasks
2010
Proceedings of the 2010 American Control Conference
In this section we find conditions under which the decision making can be modeled as a Markov process. ...
MARKOV MODEL OF DECISION MAKING. Consider the DDM decision maker faced with the two-alternative, forced-choice task. ...
doi:10.1109/acc.2010.5530563
fatcat:ovdplzecw5b6bi4qjsk75w3d4y
Safe Reinforcement Learning with Mixture Density Network: A Case Study in Autonomous Highway Driving
[article]
2020
arXiv
pre-print
This paper presents a safe reinforcement learning system for automated driving that benefits from multimodal future trajectory predictions. ...
Our simulation results demonstrate that the proposed safety system outperforms previously reported results in terms of average reward and number of collisions. ...
We formalize the problem as a Markov decision process (MDP) where at each time-step t, the agent interacts with the environment, receives the state s t ∈ S, and performs an action a t ∈ A. ...
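The MDP interaction loop quoted in this entry (at each time-step t the agent receives s_t ∈ S and performs a_t ∈ A) can be sketched minimally; the chain environment and random policy below are illustrative placeholders, not the paper's driving simulator:

```python
import random

# Hypothetical toy chain MDP: states 0..4, actions {-1, +1},
# reward 1 for reaching the terminal-like state 4.
def step(state, action):
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

random.seed(0)
state, total = 0, 0.0
for t in range(20):
    action = random.choice([-1, 1])  # random policy, for illustration only
    state, reward = step(state, action)
    total += reward
print(f"final state {state}, return {total}")
```

A learned policy (or the paper's safety layer over multimodal trajectory predictions) would replace the random action choice.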
arXiv:2007.01698v3
fatcat:g6zxygzyenf7rjdlbr26x2fnbu
Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models
[article]
2019
arXiv
pre-print
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models. ...
We demonstrate the performance of our algorithm in comparison with baseline methods in simulation on navigation tasks. ...
The authors also gratefully acknowledge funding from Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration in support of ...
arXiv:1904.01068v1
fatcat:kdbirznosngcpdioji3odsideq
Bayesian Deep Reinforcement Learning via Deep Kernel Learning
2018
International Journal of Computational Intelligence Systems
(i.e., a Gaussian process with a deep kernel) is adopted to learn the hidden complex action-value function instead of classical deep learning models, which could encode more uncertainty and fully take advantage ...
Reinforcement learning (RL) aims to resolve the sequential decision-making under uncertainty problem where an agent needs to interact with an unknown environment with the expectation of optimising the ...
the weights for interactions in replay memory; and the last one is to apply fuzzy systems techniques to the Q-learning process to deal with decision processes ? ...
doi:10.2991/ijcis.2018.25905189
fatcat:ewcbyl27u5gippxdezgf2yamni
Showing results 1 — 15 out of 3,617 results