On the Expressivity of Markov Reward [article]

David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh
2021 arXiv   pre-print
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform.  ...  Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture.  ...
arXiv:2111.00876v1 fatcat:w47rg774jrgnfh4imgzzexdujm
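
For orientation, "Markov reward" here is the standard notion: a reward that depends only on the current transition, with behaviour evaluated by expected discounted return. In standard RL notation (a generic formalization, not the paper's own development):

```latex
R : S \times A \times S \to \mathbb{R},
\qquad
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t, s_{t+1}) \,\middle|\, s_0 = s\right].
```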

Exact distributions for reward functions on semi-Markov and Markov additive processes

Valeri T. Stefanov
2006 Journal of Applied Probability  
The present paper provides neat general results which lead to explicit closed-form expressions for the relevant Laplace transforms of general reward functions on semi-Markov and Markov additive processes  ...  The distribution theory for reward functions on semi-Markov processes has been of interest since the early 1960s. The relevant asymptotic distribution theory has been satisfactorily developed.  ...  The asymptotic theory for reward functions on Markov processes is very well developed and is therefore not a subject of this paper. The exact distributions of reward functions are of interest here.  ... 
doi:10.1017/s0021900200002424 doi:10.1239/jap/1165505207 fatcat:yj7qoxq4kvb2hov54bczbcsmsm fatcat:upi5z5gheraozbqflmqe2zl7gi
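
As a rough sketch of the objects involved (generic notation, not the paper's): for a semi-Markov process $(X_s)$ with a state-dependent reward rate $r(\cdot)$, the accumulated reward up to time $t$ and its Laplace transform are

```latex
R(t) \;=\; \int_{0}^{t} r(X_s)\,\mathrm{d}s,
\qquad
\phi_t(\lambda) \;=\; \mathbb{E}\!\left[e^{-\lambda R(t)}\right],
```

and an explicit closed form for $\phi_t$ determines the exact distribution of $R(t)$ by Laplace inversion.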

A Markov reward model checker

J.-P. Katoen, M. Khattri, I.S. Zapreev
2005 Second International Conference on the Quantitative Evaluation of Systems (QEST'05)  
It supports reward extensions of PCTL and CSL, and allows for the automated verification of properties concerning long-run and instantaneous rewards as well as cumulative rewards.  ...  In particular, it supports checking the reachability of a set of goal states (visiting only legal states along the way) under a time constraint and an accumulated-reward constraint.  ...  The tool presented in this paper supports the verification of Markov reward models, in particular DTMCs and CTMCs equipped with rewards.  ...
doi:10.1109/qest.2005.2 dblp:conf/qest/KatoenKZ05 fatcat:wrdlylvj65fbbbp2u3hlgrsruq
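
To make the three reward measures concrete, here is an illustrative computation for a small DTMC with state rewards (a generic numpy sketch with made-up toy numbers, not MRMC's algorithms):

```python
import numpy as np

# Toy DTMC with state rewards: P is row-stochastic, r assigns a reward to each state.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])
p0 = np.array([1.0, 0.0, 0.0])  # initial distribution

# Instantaneous expected reward at step n: E[r(X_n)] = p0 P^n r.
n = 4
instantaneous = p0 @ np.linalg.matrix_power(P, n) @ r

# Cumulative expected reward over steps 0..n-1.
cumulative = sum(p0 @ np.linalg.matrix_power(P, k) @ r for k in range(n))

# Long-run average reward: pi . r, with pi the stationary distribution (pi P = pi).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
long_run = pi @ r

print(instantaneous, cumulative, long_run)
```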

Modelling Participation in Small Group Social Sequences with Markov Rewards Analysis

Gabriel Murray
2017 Proceedings of the Second Workshop on NLP and Computational Social Science  
Using a Markov Rewards framework, we associate particular states with immediate positive and negative rewards, and employ a Value Iteration algorithm to calculate the expected value of all states.  ...  In our findings, we focus on discourse states belonging to team leaders and project managers that are either very likely or very unlikely to lead to participation from the rest of the group members.  ...  The "Rewards" aspect of the Markov Rewards model is that certain states are associated with immediate rewards.  ...
doi:10.18653/v1/w17-2910 dblp:conf/acl-nlpcss/Murray17 fatcat:3y64t3572bao7hahzfdlm5yuie
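
A minimal sketch of the Value Iteration computation described above, for a Markov chain with immediate state rewards (the transition matrix, rewards, and discount factor are made-up toy values, not the paper's data):

```python
import numpy as np

# Toy Markov Rewards model: transitions P over discourse states,
# immediate reward r per state, discount factor gamma.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
r = np.array([0.0, 1.0])  # e.g. reward when the state marks group participation
gamma = 0.9

# Value Iteration: repeatedly apply V <- r + gamma * P V until convergence.
V = np.zeros_like(r)
for _ in range(10_000):
    V_new = r + gamma * (P @ V)
    converged = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if converged:
        break

print(V)  # expected discounted cumulative reward from each state
```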

PRISM

Marta Kwiatkowska, Gethin Norman, David Parker
2009 Performance Evaluation Review  
In this paper, we give an overview of the probabilistic model checking tool PRISM, focusing in particular on its support for continuous-time Markov chains and Markov reward models, and how these can be  ...  Probabilistic model checking is a formal verification technique for the modelling and analysis of stochastic systems.  ...
doi:10.1145/1530873.1530882 fatcat:u6miqjhpa5fpjpktd3wk2b2vya
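
As a rough illustration of the kind of Markov reward analysis such tools automate: for a CTMC with generator matrix Q and per-state reward rates r, the long-run reward rate is pi . r, where pi solves pi Q = 0. A toy numpy computation (not PRISM's engine; the chain and rewards are invented):

```python
import numpy as np

# Toy CTMC: generator matrix Q (off-diagonal transition rates, rows sum to zero)
# and a reward rate attached to each state.
Q = np.array([[-2.0,  2.0,  0.0],
              [ 1.0, -3.0,  2.0],
              [ 0.0,  1.0, -1.0]])
r = np.array([0.0, 1.0, 3.0])

# Stationary distribution: solve pi Q = 0 subject to sum(pi) = 1.
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi @ r)  # long-run expected reward rate
```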

Performance evaluation with temporal rewards

Jeroen P.M. Voeten
2002 Performance evaluation (Print)  
Basically, they only support the analysis of relatively simple performance metrics that can be expressed as long-run averages of atomic rewards, i.e. rewards that are deducible directly from the individual states of the initial Markov chain specification.  ...
doi:10.1016/s0166-5316(02)00105-0 fatcat:apr5pin5xffkhmnkq4nam6ekpq

Interpreting Models of Social Group Interactions in Meetings with Probabilistic Model Checking

Oana Andrei, Gabriel Murray
2018 Proceedings of the Group Interaction Frontiers in Technology (GIFT'18)
Previous work showed how Markov rewards models can be used to analyse group interaction in meetings.  ...  For this study, we analyse a dataset taken from a standard corpus of scenario and non-scenario meetings and demonstrate the expressiveness of our approach to validate expected interactions and identify  ...
doi:10.1145/3279981.3279988 dblp:conf/icmi/AndreiM18 fatcat:r3ooeywpyzbgxkx266d4qkkzza

Discrete Time Non-Homogeneous Semi-Markov Processes Applied to Models for Disability Insurance

Guglielmo D'Amico, Montserrat Guillén, Raimondo Manca
2012 Social Science Research Network  
The use of semi-Markov reward processes facilitates the possibility of deriving equations of the prospective and retrospective mathematical reserves.  ...  The model is based on a discrete time non-homogeneous semi-Markov process (DTNHSMP) to which the backward recurrence time process is introduced.  ...
doi:10.2139/ssrn.2030350 fatcat:3cwqct34jnaltoitzfh73xyrx4

Page 820 of Mathematical Reviews Vol. , Issue 2000a [page]

2000 Mathematical Reviews  
be expressed as the sum of the immediate expected reward and the discounted expected reward over the remaining time.  ...  One of the algorithms updates the policy whenever the Markov chain reaches a particular state.  ...
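
The quoted sentence paraphrases the Bellman equation for discounted Markov reward processes; in standard notation:

```latex
V(s) \;=\; R(s) + \gamma \sum_{s'} P(s' \mid s)\, V(s').
```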

Greedy confidence bound techniques for restless multi-armed bandit based Cognitive Radio

Shuyan Dong, Jungwoo Lee
2013 2013 47th Annual Conference on Information Sciences and Systems (CISS)  
If the primary user occupancy on each channel is modeled as identical but independent Markov chains with unknown parameters, we obtain a non-Bayesian RMAB.  ...  The cognitive radio, built on a software-defined radio, is defined as an intelligent wireless communication system that is aware of its environment and uses the methodology of understanding-by-building  ...  The objective is to maximize the sum of the collected rewards. The bandit problem is formally equivalent to a one-state Markov decision process.  ...
doi:10.1109/ciss.2013.6552267 dblp:conf/ciss/DongL13 fatcat:s3vrcqx7pzgvtp5sz2xytifmnm
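
As a generic illustration of confidence-bound channel selection in this setting, here is a plain UCB1 index (a standard choice, not necessarily the exact greedy index proposed in the paper):

```python
import math

def ucb1_index(mean_reward: float, n_plays: int, t: int) -> float:
    """UCB1-style index: empirical mean plus an exploration bonus
    that shrinks the more often a channel has been sensed."""
    if n_plays == 0:
        return float("inf")  # sense every channel at least once
    return mean_reward + math.sqrt(2.0 * math.log(t) / n_plays)

# Greedily sense the channel with the largest index at time t.
means, plays, t = [0.6, 0.4, 0.7], [10, 3, 8], 21
best = max(range(3), key=lambda i: ucb1_index(means[i], plays[i], t))
print(best)
```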

Page 4575 of Mathematical Reviews Vol. , Issue 89H [page]

1989 Mathematical Reviews  
Summary: “Consider a perturbation in the one-step transition probabilities and rewards of a discrete-time Markov reward process with an unbounded one-step reward function.  ...  The authors study discrete-time Markov reward processes with perturbations in the one-step transition probabilities and rewards.  ...
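
For intuition about such perturbation results, a standard identity in the simpler discounted, bounded-reward case (not the reviewed paper's unbounded setting): writing $V = (I - \gamma P)^{-1} R$ for the value of the original process and $\tilde{V}$ for the perturbed one,

```latex
V - \tilde{V} \;=\; (I - \gamma P)^{-1}\!\left[(R - \tilde{R}) + \gamma\,(P - \tilde{P})\,\tilde{V}\right],
```

so the change in value is controlled by the sizes of the perturbations in rewards and transition probabilities.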

Reward distributions associated with some block tridiagonal transition matrices with applications to identity by descent

Valeri T. Stefanov, Frank Ball
2009 Advances in Applied Probability  
A method is provided for calculating explicit, closed-form expressions for Laplace transforms of general reward functions for such Markov chains.  ...  Such statistics are the amount of genome shared IBD by the two related individuals on a chromosomal segment and the number of IBD pieces on such a segment.  ...  In order to keep the expressions neater and consistent with related expressions on reward functions for semi-Markov processes, we convert our two-dimensional processes to one-dimensional processes.  ...
doi:10.1239/aap/1246886622 doi:10.1017/s0001867800003402 fatcat:x5nlfgtm7zhg5f4xabkgzi3pcq fatcat:gj446rvombfvzfjrajaalgmnnu

Reversible Markov Decision Processes with an Average-Reward Criterion

Randy Cogill, Cheng Peng
2013 SIAM Journal on Control and Optimization
In this paper we study the structure of optimal control policies for Markov decision processes with reversible dynamics.  ...  The analysis of reversible Markov chains is often significantly simpler than analysis of general Markov chains, particularly since there are often simple closed-form expressions for their invariant distributions  ...
doi:10.1137/110844957 fatcat:v67ii46zsnedrjfbtxefxwjdbi
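
The closed forms alluded to come from detailed balance: a chain with stationary distribution $\pi$ is reversible when

```latex
\pi_i P_{ij} \;=\; \pi_j P_{ji} \quad \text{for all } i, j,
```

which for, e.g., a birth-death chain with up-rates $p_i$ and down-rates $q_{i+1}$ gives $\pi_{i+1} = \pi_i\, p_i / q_{i+1}$ up to normalization.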