
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Matteo Turchetta, Felix Berkenkamp, Andreas Krause
2016 arXiv   pre-print
In this paper, we address the problem of safely exploring finite Markov decision processes (MDP).  ...  Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out.  ...  Specifically, we consider the problem of exploring a Markov decision process (MDP), where it is a priori unknown which state-action pairs are safe.  ... 
arXiv:1606.04753v2 fatcat:rbi4n4eruva6le7j42ubzob54q
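Editorial note: the Turchetta et al. entry above combines Gaussian-process confidence bounds with a reachability check so the agent never enters a state without a safe way back. The following is a minimal, self-contained sketch of that idea on a 1-D chain of states; the kernel, threshold, and observations are illustrative assumptions, not the paper's SafeMDP configuration.

# Toy sketch: GP-based safe-set estimation with a return-path (reachability) check
# on a 1-D chain of states. All numbers below are made up for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

states = np.linspace(0.0, 1.0, 20).reshape(-1, 1)   # finite state set (assumed)
h = 0.0                                              # safety threshold (assumed)

# Safety feature observed at a few visited states (assumed measurements).
X_obs = np.array([[0.0], [0.1], [0.2]])
y_obs = np.array([0.8, 0.6, 0.5])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)
gp.fit(X_obs, y_obs)
mu, sigma = gp.predict(states, return_std=True)

# Pessimistic safe set: lower confidence bound above the threshold.
beta = 2.0
safe = (mu - beta * sigma) >= h

# Reachability: only keep safe states connected to the start (index 0) through a
# contiguous run of safe states, so there is always a safe path back.
reachable = np.zeros_like(safe)
for i in range(len(states)):
    if safe[i] and (i == 0 or reachable[i - 1]):
        reachable[i] = True

print("states safe to visit:", states[reachable].ravel())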

Temporal Logic Control of POMDPs via Label-based Stochastic Simulation Relations

S. Haesaert, P. Nilsson, C.I. Vasile, R. Thakker, A. Agha-mohammadi, A.D. Ames, R.M. Murray
2018 IFAC-PapersOnLine  
The synthesis of controllers guaranteeing linear temporal logic specifications on partially observable Markov decision processes (POMDP) via their belief models causes computational issues due to the continuous  ...  In this work, we construct a finite-state abstraction on which a control policy is synthesized and refined back to the original belief model.  ...  For finite-state partially observable Markov decision processes (POMDPs), verification and policy synthesis has been considered for PCTL properties (Norman et al., 2017; Chatterjee et al., 2015) .  ... 
doi:10.1016/j.ifacol.2018.08.046 fatcat:lf364ahqxbbhzldzrrx7uatyu4
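Editorial note: the entry above works with the belief model of a POMDP, which is continuous-state and therefore motivates the finite-state abstraction. For reference, the standard belief update (generic notation, not the paper's label-based simulation relation) is:

% Standard POMDP belief update in generic notation: after taking action a in
% belief b and observing o, the new belief over states s' is
\[
  b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}
                    {\sum_{\bar s} O(o \mid \bar s, a) \sum_{s} T(\bar s \mid s, a)\, b(s)},
\]
% so the controller effectively acts on a continuous-state Markov decision
% process whose states are probability distributions b over the finite POMDP states.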

Regret Bounds for Safe Gaussian Process Bandit Optimization [article]

Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
2020 arXiv   pre-print
The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule.  ...  In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs) first considered in [Srinivas et al., 2010].  ...  Safe exploration in finite Markov decision processes with Gaussian processes. In Advances in Neural Information Processing Systems, pages 4312-4320.  ... 
arXiv:2005.01936v1 fatcat:omsi6phcwzfg7jo5345lddvkja
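Editorial note: the two-phase scheme described above (first estimate the safe actions, then apply the GP-UCB rule on that set) can be illustrated with a small sketch. The decision set, kernels, threshold, and observations below are illustrative assumptions.

# Toy sketch: GP-UCB restricted to an estimated safe set.
# Payoff and constraint observations below are invented for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

actions = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)   # finite decision set (assumed)
tau = 0.0                                              # safety threshold (assumed)
beta = 2.0

# Separate GPs for the unknown payoff f and the unknown constraint g.
X = np.array([[-0.8], [-0.2], [0.4]])
f_obs = np.array([0.1, 0.5, 0.3])
g_obs = np.array([0.6, 0.4, -0.1])

gp_f = GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-3).fit(X, f_obs)
gp_g = GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-3).fit(X, g_obs)

# Phase 1: pessimistic safe set from the constraint GP's lower confidence bound.
mu_g, sd_g = gp_g.predict(actions, return_std=True)
safe = (mu_g - beta * sd_g) >= tau

# Phase 2: GP-UCB rule over the safe actions only.
mu_f, sd_f = gp_f.predict(actions, return_std=True)
ucb = mu_f + beta * sd_f
ucb[~safe] = -np.inf
next_action = actions[np.argmax(ucb)]
print("next action to query:", next_action)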

Markov Decision Processes with Unknown State Feature Values for Safe Exploration using Gaussian Processes

Matthew Budd, Bruno Lacerda, Paul Duckworth, Andrew West, Barry Lennox, Nick Hawes
2020 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)  
We use a Gaussian process to predict the value of the environment feature in unvisited regions, and propose an estimated Markov decision process, a model that integrates the Gaussian process predictions  ...  More specifically, we consider a setting where a robot explores an environment modelled with a Markov decision process, subject to bounds on the values of one or more environment features which can only  ...  Gaussian Process: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution [2].  ... 
doi:10.1109/iros45743.2020.9341589 fatcat:ejpw7yp6qnbfdnlis7l2mgkpni
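Editorial note: the excerpt above quotes the textbook definition of a Gaussian process; spelled out in generic notation, it says that for any finite set of inputs the corresponding function values are jointly Gaussian:

% Definition quoted in the excerpt above, written out: a Gaussian process
% f ~ GP(m, k) with mean function m and covariance (kernel) function k means
% that for any finite set of inputs x_1, ..., x_n,
\[
  \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix}
  \sim \mathcal{N}\!\big( \mathbf{m}, \mathbf{K} \big),
  \qquad
  \mathbf{m}_i = m(x_i), \quad \mathbf{K}_{ij} = k(x_i, x_j).
\]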

The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes [article]

Nikolaos Kariotoglou, Maryam Kamgarpour, Tyler Summers, John Lygeros
2017 arXiv   pre-print
For a large class of Markov decision processes modeled by Gaussian mixture kernels, we show that through a proper selection of the finite dimensional space, one can further reduce the computational complexity  ...  One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications.  ...  ACKNOWLEDGEMENTS. The authors would like to thank Alexander Liniger from the Automatic Control Laboratory in ETH Zürich for his help in testing the algorithm on the ORCA platform  ... 
arXiv:1411.5925v4 fatcat:jezsixb7jrcajfvgujo42eumbu
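Editorial note: the reach-avoid problem referenced above admits a standard dynamic-programming characterization, whose value functions the paper's linear program approximates. In generic notation (assumed here, not the paper's exact symbols), with target set K, safe set K' containing K, and stochastic kernel T:

% Standard finite-horizon reach-avoid recursion (generic notation).
\[
  V_N(x) = \mathbf{1}_K(x), \qquad
  V_k(x) = \max_{u \in U} \Big[ \mathbf{1}_K(x)
      + \mathbf{1}_{K' \setminus K}(x) \int V_{k+1}(y)\, T(\mathrm{d}y \mid x, u) \Big],
\]
% so V_0(x) is the maximal probability of reaching K within N steps while
% remaining inside the safe set K'.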

FAUST^2: Formal Abstractions of Uncountable-STate STochastic processes [article]

S. Esmaeil Zadeh Soudjani, C. Gevaerts, A. Abate
2014 arXiv   pre-print
A dtMP model is specified in MATLAB and abstracted as a finite-state Markov chain or a Markov decision process.  ...  The abstract model is formally put in relationship with the concrete dtMP via a user-defined maximum threshold on the approximation error introduced by the abstraction procedure.  ...  The dtMP S is abstracted as a Markov decision process (MDP) P = (P, U_p, T_p), where now the finite input space is U_p = {u_1, u_2, . . . , u_q}, and T_p(u, z, z') = T_s(Ξ(z') | z, u) for all z, z'  ... 
arXiv:1403.3286v1 fatcat:ycgvg5fzu5cxzm7f6vz4crktxa
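Editorial note: the FAUST^2 excerpt above builds the finite MDP by evaluating the transition kernel of the dtMP over a partition of the state space. A minimal numeric sketch of that construction for an assumed one-dimensional dtMP with a Gaussian kernel is below; the dynamics and partition are invented, and the error quantification that FAUST^2 provides is omitted.

# Toy sketch of the abstraction step: partition a 1-D state space into cells z_i
# and set T_p(u, z_i, z_j) = T_s(cell of z_j | representative of z_i, u).
# The dynamics (x' ~ N(0.8*x + u, 0.1^2)) and the partition are assumptions.
import numpy as np
from scipy.stats import norm

cells = np.linspace(0.0, 1.0, 11)          # partition of [0, 1] into 10 cells
reps = 0.5 * (cells[:-1] + cells[1:])      # representative point of each cell
inputs = [0.0, 0.1]                        # finite input space U_p (assumed)

def kernel_cdf(x_next, x, u):
    """CDF of the assumed Gaussian transition kernel T_s(. | x, u)."""
    return norm.cdf(x_next, loc=0.8 * x + u, scale=0.1)

# T_p[a, i, j] = probability of jumping from cell i to cell j under input a.
T_p = np.zeros((len(inputs), len(reps), len(reps)))
for a, u in enumerate(inputs):
    for i, z in enumerate(reps):
        T_p[a, i, :] = kernel_cdf(cells[1:], z, u) - kernel_cdf(cells[:-1], z, u)

# Rows need not sum to 1 because probability mass can leave [0, 1]; FAUST^2
# quantifies the resulting abstraction error, which this sketch does not.
print(T_p[0].sum(axis=1))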

Probabilistic Model Checking of Labelled Markov Processes via Finite Approximate Bisimulations [chapter]

Alessandro Abate, Marta Kwiatkowska, Gethin Norman, David Parker
2014 Lecture Notes in Computer Science  
This paper concerns labelled Markov processes (LMPs), probabilistic models over uncountable state spaces originally introduced by Prakash Panangaden and colleagues.  ...  labelled Markov chain (LMC).  ...  The authors are supported in part by the ERC Advanced Grant VERIWARE, the EU FP7 project HIERATIC, the EU FP7 project MoVeS, the EU FP7 Marie Curie grant MANTRAS, the EU FP7 project AMBI, and by the NWO  ... 
doi:10.1007/978-3-319-06880-0_2 fatcat:4q4rqu3rszdi3ae7s2fk2vyua4

Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction [article]

Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane
2021 arXiv   pre-print
In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces utilizing a linear nominal model of the controlled object.  ...  the Gaussian policy for exploration.  ...  In this section, we also show one concrete example of safe RL algorithms with our exploration process adjustment method. In addition, we compare our study with some related work for safe RL.  ... 
arXiv:2103.03656v1 fatcat:a7olmyxkxrdk3kcswjoqcgn534
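Editorial note: the entry above adjusts a Gaussian exploration policy so that a joint chance constraint is satisfied. The sketch below is a much-simplified, single-step illustration of the idea: shrink the exploration standard deviation so the sampled action stays inside an assumed safe interval with probability at least 1 - delta. The interval, nominal action, and delta are assumptions, not the paper's formulation.

# Toy, single-step illustration: cap the standard deviation of a Gaussian
# exploration policy so that P(action outside the safe interval) <= delta.
import numpy as np
from scipy.stats import norm

a_nominal = 0.2            # action proposed by the nominal controller (assumed)
a_lo, a_hi = -0.5, 0.5     # safe action interval (assumed)
delta = 0.05               # allowed violation probability per step (assumed)

# Largest sigma such that both tails outside [a_lo, a_hi] stay within delta.
margin = min(a_nominal - a_lo, a_hi - a_nominal)
sigma_max = margin / norm.ppf(1.0 - delta / 2.0)

rng = np.random.default_rng(0)
action = rng.normal(a_nominal, sigma_max)
print(f"sigma_max = {sigma_max:.3f}, sampled exploratory action = {action:.3f}")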

Safe Reinforcement Learning in Constrained Markov Decision Processes [article]

Akifumi Wachi, Yanan Sui
2020 arXiv   pre-print
In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints.  ...  In our method, the agent first learns safety constraints by expanding the safe region, and then optimizes the cumulative reward in the certified safe region.  ...  Representatives of such work are probably approximately correct Markov decision process (PAC-MDP) algorithms (Brafman & Tennenholtz, 2002; Kearns & Singh, 2002; Strehl et al., 2006) .  ... 
arXiv:2008.06626v1 fatcat:fpblmiihknckfn6fghde23mc2m

FAUST^2: Formal Abstractions of Uncountable-STate STochastic Processes [chapter]

Sadegh Esmaeil Zadeh Soudjani, Caspar Gevaerts, Alessandro Abate
2015 Lecture Notes in Computer Science  
A dtMP model is specified in MATLAB and abstracted as a finite-state Markov chain or a Markov decision process.  ...  The toolbox is available at http://sourceforge.net/projects/faust2/. Models: Discrete-Time Markov Processes. We consider a discrete-time Markov process (dtMP) s(k), k ∈ N ∪ {0}, defined over a general state  ...  The dtMP S is abstracted as a Markov decision process (MDP) P = (P, U_p, T_p), where now the finite input space is U_p = {u_1, u_2, . . . , u_q}, and T_p(u, z, z') = T_s(Ξ(z') | z, u) for all z, z'  ... 
doi:10.1007/978-3-662-46681-0_23 fatcat:qqltkzrnkvhdvkq6m7ld7ur6dq

The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes

Nikolaos Kariotoglou, Maryam Kamgarpour, Tyler H. Summers, John Lygeros
2017 The Journal of Artificial Intelligence Research  
For a large class of Markov decision processes modeled by Gaussian mixture kernels, we show that through a proper selection of the finite dimensional space, one can further reduce the computational complexity  ...  One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications.  ...  ACKNOWLEDGEMENTS. The authors would like to thank Alexander Liniger from the Automatic Control Laboratory in ETH Zürich for his help in testing the algorithm on the ORCA platform  ... 
doi:10.1613/jair.5500 fatcat:lgtdmkwserd3hd3b3goynic4u4

Steady-state distributions for human decisions in two-alternative choice tasks

Andrew Stewart, Ming Cao, Naomi Ehrich Leonard
2010 Proceedings of the 2010 American Control Conference  
In this section we find conditions under which the decision making can be modeled as a Markov process.  ...  MARKOV MODEL OF DECISION MAKING: Consider the DDM decision maker faced with the two-alternative, forced-choice task.  ... 
doi:10.1109/acc.2010.5530563 fatcat:ovdplzecw5b6bi4qjsk75w3d4y
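Editorial note: the excerpt above refers to the drift-diffusion-model (DDM) decision maker in a two-alternative forced-choice task. A minimal simulation of the underlying process is sketched below; the drift, noise, and threshold values are assumptions chosen only for illustration.

# Minimal drift-diffusion model (DDM) simulation for a two-alternative
# forced-choice task: evidence drifts toward one alternative and a decision is
# made when it crosses +threshold or -threshold.
import numpy as np

def simulate_ddm(drift=0.3, noise=1.0, threshold=1.0, dt=0.001, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ("alternative A" if x > 0 else "alternative B"), t

rng = np.random.default_rng(1)
for _ in range(5):
    choice, rt = simulate_ddm(rng=rng)
    print(f"decision: {choice}, reaction time: {rt:.3f} s")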

Safe Reinforcement Learning with Mixture Density Network: A Case Study in Autonomous Highway Driving [article]

Ali Baheri
2020 arXiv   pre-print
This paper presents a safe reinforcement learning system for automated driving that benefits from multimodal future trajectory predictions.  ...  Our simulation results demonstrate that the proposed safety system outperforms previously reported results in terms of average reward and number of collisions.  ...  We formalize the problem as a Markov decision process (MDP) where at each time-step t, the agent interacts with the environment, receives the state s_t ∈ S, and performs an action a_t ∈ A.  ... 
arXiv:2007.01698v3 fatcat:g6zxygzyenf7rjdlbr26x2fnbu
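Editorial note: the entry above uses a mixture density network so the safety layer can reason over multimodal trajectory predictions. Below is a toy sketch of drawing samples from such a predicted mixture; the weights, means, and standard deviations are invented placeholders for what a trained network would emit.

# Toy sketch: sampling predicted future positions from mixture-density-network
# style outputs (mixture weights, means, per-mode standard deviations).
import numpy as np

rng = np.random.default_rng(0)

weights = np.array([0.6, 0.3, 0.1])                         # mixture weights (sum to 1)
means = np.array([[10.0, 0.0], [9.0, 3.0], [8.0, -3.0]])    # (x, y) modes (assumed)
stds = np.array([[0.5, 0.2], [0.7, 0.4], [0.7, 0.4]])       # per-mode std devs (assumed)

def sample_prediction():
    k = rng.choice(len(weights), p=weights)   # pick one mode of the mixture
    return rng.normal(means[k], stds[k])      # sample a position within that mode

samples = np.array([sample_prediction() for _ in range(3)])
print("sampled future positions:\n", samples)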

Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models [article]

Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh
2019 arXiv   pre-print
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models.  ...  We demonstrate the performance of our algorithm in comparison with baseline methods in simulation on navigation tasks.  ...  The authors also gratefully acknowledge funding from Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration in support of  ... 
arXiv:1904.01068v1 fatcat:kdbirznosngcpdioji3odsideq

Bayesian Deep Reinforcement Learning via Deep Kernel Learning

Junyu Xuan, Jie Lu, Zheng Yan, Guangquan Zhang
2018 International Journal of Computational Intelligence Systems  
(i.e., a Gaussian process with deep kernel) is adopted to learn the hidden complex action-value function instead of classical deep learning models, which could encode more uncertainty and fully take advantage  ...  Reinforcement learning (RL) aims to resolve the sequential decision-making under uncertainty problem where an agent needs to interact with an unknown environment with the expectation of optimising the  ...  the weights for interactions in replay memory; and the last one is to apply fuzzy systems techniques to the Q-learning process to deal with decision processes  ... 
doi:10.2991/ijcis.2018.25905189 fatcat:ewcbyl27u5gippxdezgf2yamni
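Editorial note: the entry above replaces a deep Q-network with a Gaussian process (using a deep kernel) so the action-value estimate carries uncertainty. The sketch below substitutes a plain RBF kernel for the deep kernel and uses invented transition data, just to show Q-value regression with uncertainty-aware action selection; it is not the paper's architecture.

# Toy sketch: GP regression on (state, action) pairs as an uncertainty-aware
# stand-in for a Q-network. A plain RBF kernel replaces the paper's deep kernel,
# and the replayed transitions below are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# (state, action) -> observed Q-learning target (all values assumed).
X = np.array([[0.0, 0], [0.0, 1], [0.5, 0], [0.5, 1], [1.0, 0]])
q_targets = np.array([0.2, 0.5, 0.4, 0.9, 0.3])

gp_q = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
gp_q.fit(X, q_targets)

# Act greedily on an optimistic (UCB-style) Q estimate for a new state.
state = 0.8
candidates = np.array([[state, a] for a in (0, 1)])
mu, sd = gp_q.predict(candidates, return_std=True)
best_action = int(np.argmax(mu + 1.0 * sd))
print("Q estimates:", mu, "chosen action:", best_action)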
Showing results 1 — 15 out of 3,617 results