
Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models [article]

Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh
2019 arXiv   pre-print
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models.  ...  We demonstrate the performance of our algorithm in comparison with baseline methods in simulation on navigation tasks.  ...  The authors also gratefully acknowledge funding from Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration in support of  ... 
arXiv:1904.01068v1 fatcat:kdbirznosngcpdioji3odsideq

Markov Decision Processes with Unknown State Feature Values for Safe Exploration using Gaussian Processes

Matthew Budd, Bruno Lacerda, Paul Duckworth, Andrew West, Barry Lennox, Nick Hawes
2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
We use a Gaussian process to predict the value of the environment feature in unvisited regions, and propose an estimated Markov decision process, a model that integrates the Gaussian process predictions  ...  More specifically, we consider a setting where a robot explores an environment modelled with a Markov decision process, subject to bounds on the values of one or more environment features which can only  ...  ACKNOWLEDGMENTS This work has been funded by UK Research and Innovation and EPSRC through the Robotics and Artificial Intelligence for Nuclear (RAIN), and Offshore Robotics for Certification of Assets  ... 
doi:10.1109/iros45743.2020.9341589 fatcat:ejpw7yp6qnbfdnlis7l2mgkpni

Safety-Constrained Reinforcement Learning for MDPs [article]

Sebastian Junges, Nils Jansen, Christian Dehnert, Ufuk Topcu, Joost-Pieter Katoen
2015 arXiv   pre-print
Specifically, we abstract the problem as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space.  ...  We consider controller synthesis for stochastic and partially unknown environments in which safety is essential.  ...  A Markov decision process (MDP) M = (S, s_I, Act, P) is a tuple with a finite set S of states, a unique initial state s_I ∈ S, a finite set Act of actions, and a (partial) probabilistic transition  ...
arXiv:1510.05880v1 fatcat:bqjtjzv7kngkxm2z4c7zgxzji4
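The MDP tuple quoted in the snippet above, M = (S, s_I, Act, P) with a partial probabilistic transition function, can be sketched as a minimal data structure. This is an illustrative Python sketch only, not code from any of the listed papers; the class and field names are my own.

```python
from dataclasses import dataclass, field

@dataclass
class MDP:
    """M = (S, s_I, Act, P), following the definition in the snippet:
    finite states S, unique initial state s_I, finite actions Act,
    and a (partial) probabilistic transition function P."""
    states: frozenset                 # finite set S
    initial: str                      # unique initial state s_I in S
    actions: frozenset                # finite set Act
    # P[(s, a)] maps successor states to probabilities (summing to 1);
    # pairs absent from the dict are where the partial P is undefined.
    transitions: dict = field(default_factory=dict)

    def successors(self, state, action):
        """Successor distribution for (state, action), or None
        where P is undefined (P is partial)."""
        return self.transitions.get((state, action))

# Toy example: action 'go' moves s0 -> s1 with probability 1.
mdp = MDP(states=frozenset({"s0", "s1"}),
          initial="s0",
          actions=frozenset({"go"}),
          transitions={("s0", "go"): {"s1": 1.0}})
print(mdp.successors("s0", "go"))  # {'s1': 1.0}
```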

Safe Exploration in Markov Decision Processes [article]

Teodor Mihai Moldovan, Pieter Abbeel
2012 arXiv   pre-print
In this paper we address the need for safe exploration methods in Markov decision processes. We first propose a general formulation of safety through ergodicity.  ...  In environments with uncertain dynamics exploration is necessary to learn how to perform well.  ...  Army Research Laboratory and the U. S. Army Research Office under contract/grant number W911NF-11-1-0391.  ... 
arXiv:1205.4810v3 fatcat:nyyddzytg5hkhombmdpggh6u44

Safety-Critical Learning of Robot Control with Temporal Logic Specifications [article]

Mingyu Cai, Cristian-Ioan Vasile
2022 arXiv   pre-print
However, success is limited in real-world applications, because ensuring safe exploration and facilitating adequate exploitation is a challenge for controlling robotic systems with unknown models and measurement  ...  of learned controllers; (3) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamic systems, we synthesize a model-based safe exploration during the learning process using Exponential  ...  Markov decision processes (MDP) are often employed to model the dynamics of robots and their interaction with environments.  ...
arXiv:2109.02791v6 fatcat:foaehouwtzfndgl57n44rdbafi

Reliability modeling of life-critical, real-time systems

L. Tomek, V. Mainkar, R.M. Geist, K.S. Trivedi
1994 Proceedings of the IEEE  
In this paper, we discuss the role of modeling in the design and validation of life-critical, real-time systems. The basics of Markov, Markov reward, and stochastic reward net models are covered.  ...  Multilevel models, model calibration, and model validation are also discussed.  ...  reliable to effectively preclude the loss of human life carries with it an important obstruction to that very design: the  ...  Models allow designers to interactively estimate the effects of major design decisions and explore the sensitivity of model outputs, such as estimated reliability, to changes or inaccuracies in model inputs  ...
doi:10.1109/5.259430 fatcat:jlatdxmao5gipffqswkjwftyzq

Safe Exploration for Interactive Machine Learning [article]

Matteo Turchetta, Felix Berkenkamp, Andreas Krause
2019 arXiv   pre-print
We apply our framework to safe Bayesian optimization and to safe exploration in deterministic Markov Decision Processes (MDP), which have been analyzed separately before.  ...  Our method works as an add-on that takes suggested decisions as input and exploits regularity assumptions in terms of a Gaussian process prior in order to efficiently learn about their safety.  ...  Safe shortest path in deterministic MDPs: The graph that we introduced in Sec. 2 can model states (nodes) and state transitions (edges) in deterministic, discrete MDPs.  ...
arXiv:1910.13726v1 fatcat:xw6z3cf5t5cyfpcqfyluubud4a
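The last snippet above observes that a deterministic, discrete MDP is just a directed graph (states are nodes, transitions are edges), so a safe shortest path reduces to graph search over states deemed safe. Below is a minimal BFS sketch of that reduction; it is my own illustration, and it takes the safe set as given, whereas the paper learns safety online from a Gaussian process prior.

```python
from collections import deque

def shortest_safe_path(transitions, start, goal, safe):
    """BFS over a deterministic MDP viewed as a graph.
    transitions: dict state -> {action: unique successor state}.
    safe: set of states the search is allowed to enter.
    Returns a shortest state sequence from start to goal, or None."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for succ in transitions.get(state, {}).values():
            if succ in safe and succ not in visited:
                visited.add(succ)
                queue.append((succ, path + [succ]))
    return None  # goal unreachable through safe states

# Toy graph: 'd' is reachable but excluded from the safe set.
grid = {"a": {"right": "b"}, "b": {"right": "c", "down": "d"}, "d": {}}
print(shortest_safe_path(grid, "a", "c", safe={"a", "b", "c"}))
# ['a', 'b', 'c']
```

BFS gives shortest paths in edge count; a weighted variant would use Dijkstra instead.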

Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving [article]

Maxime Bouton, Jesper Karlsson, Alireza Nakhaei, Kikuo Fujimura, Mykel J. Kochenderfer, Jana Tumova
2019 arXiv   pre-print
The resulting policy outperforms a rule-based heuristic approach in terms of efficiency while exhibiting strong guarantees on safety.  ...  An exploration strategy is derived prior to training that constrains the agent to choose among actions that satisfy a desired probabilistic specification expressed with linear temporal logic (LTL).  ...  Acknowledgement This work has been supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) and the Swedish Research Council (VR).  ... 
arXiv:1904.07189v2 fatcat:fxsjjitdeven3a56jn6oxxohbm

Dynamical systems as a level of cognitive analysis of multi-agent learning

Wolfram Barfuss
2021 Neural Computing & Applications (Print)
I find that its deterministic dynamical systems description follows a minimum free-energy principle and unifies a boundedly rational account of game theory with decision-making under uncertainty.  ...  I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning.  ...  Acknowledgements This work was supported by UK Research and Innovation Future Leaders Fellowship MR/S032525/1. It is based on a previously published extended abstract [5] . I thank Richard P.  ... 
doi:10.1007/s00521-021-06117-0 pmid:35221541 pmcid:PMC8827307 fatcat:y3oxx2kglvfpjd5onhriky5g44

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Matteo Turchetta, Felix Berkenkamp, Andreas Krause
2016 arXiv   pre-print
In this paper, we address the problem of safely exploring finite Markov decision processes (MDP).  ...  We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover.  ...  This research was partially supported by the Max Planck ETH Center for Learning Systems and SNSF grant 200020_159557.  ... 
arXiv:1606.04753v2 fatcat:rbi4n4eruva6le7j42ubzob54q

PASS: Abstraction Refinement for Infinite Probabilistic Models [chapter]

Ernst Moritz Hahn, Holger Hermanns, Björn Wachter, Lijun Zhang
2010 Lecture Notes in Computer Science  
We present PASS, a tool that analyzes concurrent probabilistic programs, which map to potentially infinite Markov decision processes.  ...  PASS is based on predicate abstraction and abstraction refinement and scales to programs far beyond the reach of numerical methods which operate on the full state space of the model.  ...  To account for both randomness and concurrency, Markov decision processes (MDPs) are used as a semantic foundation as they feature both non-deterministic and probabilistic choice.  ... 
doi:10.1007/978-3-642-12002-2_30 fatcat:i6lthhwblzchtgls3oqorsabqu

Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art [article]

Youngmin Kim, Richard Allmendinger, Manuel López-Ibáñez
2021 arXiv   pre-print
Safe learning and optimization deals with learning and optimization problems that avoid, as much as possible, the evaluation of non-safe input points, which are solutions, policies, or strategies that  ...  Although a comprehensive survey of safe reinforcement learning algorithms was published in 2015, a number of new algorithms have been proposed thereafter, and related works in active learning and in optimization  ...  SEOFUR [8] is an algorithm for safe exploration of deterministic MDPs with unknown transition functions.  ... 
arXiv:2101.09505v2 fatcat:4umo7ufk5vcqtgieeix5lf72wa

Cautious Reinforcement Learning with Logical Constraints [article]

Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening
2020 arXiv   pre-print
Enforcing the RL agent to stay safe during learning might limit the exploration; however, we show that the proposed architecture is able to automatically handle the trade-off between efficient progress in exploration (towards goal satisfaction) and ensuring safety.  ...  This work is in part supported by the HiClass project (113213), a partnership between the Aerospace Technology Institute (ATI), Department for Business, Energy & Industrial Strategy (BEIS) and Innovate  ...
arXiv:2002.12156v2 fatcat:adhiibs6qfagjn4rpvyypwupxa

Observation Planning for Object Search by a Mobile Robot with Uncertain Recognition [chapter]

Matthieu Boussard, Jun Miura
2013 Advances in Intelligent Systems and Computing  
In order to handle complex tasks in an unknown environment, a robot has to build a map with both free-space information and object types and locations.  ...  Having multiple candidates and uncertain algorithm outcomes, we cast the problem as a Markov Decision Process.  ...  A Markov Decision Process [9] formalizes a sequential decision problem under uncertainty.  ...
doi:10.1007/978-3-642-33932-5_10 fatcat:yu4qqzpytjbmfkgaodey2jeo24

Omega-Regular Objectives in Model-Free Reinforcement Learning [chapter]

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
2019 Lecture Notes in Computer Science  
We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs).  ...  We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model  ...  In Sect. 4 we prove the main results. Finally, Sect. 5 discusses our experiments.  ...  Preliminaries (Markov Decision Processes): Let D(S) be the set of distributions over S.  ...
doi:10.1007/978-3-030-17462-0_27 fatcat:rfxrmhlb3ne7tbfbcch64ucz34
Showing results 1 — 15 out of 3,590 results