Partially observable Markov decision processes for artificial intelligence [chapter]

Leslie Pack Kaelbling, Michael L. Littman, Anthony R. Cassandra
1996 Lecture Notes in Computer Science
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. In many cases, we have developed new ways of viewing the problem that are, perhaps, more consistent with the AI perspective. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). We then outline a novel algorithm for solving POMDPs off line and show how, in many
... a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a simple example.

Consider the problem of a robot navigating in a complex office building. The robot can move from hallway intersection to intersection and can make local observations of its world. Its actions are not completely reliable, however. Sometimes, when it intends to move, it stays where it is or goes too far; sometimes, when it intends to turn, it overshoots. It has similar problems with observation. Sometimes a corridor looks like a corner; sometimes a T-junction looks like an L-junction. How can such an error-plagued robot navigate, even given a map of the corridors?

In general, the robot will have to remember something about its history of actions and observations and use this information, together with its knowledge of the underlying dynamics of the world (the map and other information), to maintain an estimate of its location. Many engineering applications follow this approach, using methods like the Kalman filter (Kalman 1960) to maintain a running estimate of the robot's spatial uncertainty, expressed as an ellipsoid in Cartesian space. This approach won't do for our robot, though. Its uncertainty may be discrete: it might be almost certain that it's in the north-east corner of either the fourth or the seventh floor, though it admits a chance that it's on the fifth floor, as well.

Then, given an uncertain estimate of its location, the robot has to decide what actions to take. In some cases, it might be sufficient to ignore its uncertainty and take actions that would be appropriate for the most likely location. In other cases, it might be better for the robot to take actions for the purpose of gathering information, such as searching for a landmark or reading signs on the wall.
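To make the idea of a discrete location estimate concrete, the following is a minimal sketch of one step of a standard discrete Bayes filter, which is one common way to maintain such an estimate from unreliable actions and observations. It is an illustration only, not the algorithm outlined in the paper. The four-cell corridor, the probabilities, and the names `belief_update`, `transition`, and `observe` are all assumptions introduced for this example.

```python
# Hypothetical toy world: a corridor of N cells; the robot tries to move forward.
N = 4


def transition(state, action):
    """Assumed motion model: usually advance one cell, sometimes stay or go too far."""
    out = {}
    for step, prob in ((1, 0.8), (0, 0.1), (2, 0.1)):
        nxt = min(state + step, N - 1)  # the last cell is a dead end
        out[nxt] = out.get(nxt, 0.0) + prob
    return out


def observe(state, action):
    """Assumed sensor model: the robot reports 'end' or 'hall', correct 90% of the time."""
    at_end = state == N - 1
    return {"end": 0.9 if at_end else 0.1, "hall": 0.1 if at_end else 0.9}


def belief_update(belief, action, observation, transition, observe):
    """One Bayes-filter step: predict with the motion model, then weight by the observation."""
    predicted = {}
    for s, p in belief.items():
        for s_next, t in transition(s, action).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * t
    weighted = {
        s: p * observe(s, action).get(observation, 0.0) for s, p in predicted.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in weighted.items()}


# Start fully uncertain, attempt to move forward, then see what looks like a hallway.
belief = {s: 1.0 / N for s in range(N)}
belief = belief_update(belief, "forward", "hall", transition, observe)
most_likely = max(belief, key=belief.get)
```

After the update, `belief` is still a probability distribution over all four cells rather than a single point estimate. Acting on `most_likely` alone corresponds to the "ignore its uncertainty" strategy described above; an information-gathering strategy would instead use the whole distribution when choosing an action.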