A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018; you can also visit the original URL.
k‐確実探査法と動的計画法を用いたMDPs環境の効率的探索法

An Efficient Exploration Method Using k-Certainty Exploration Method and Dynamic Programming under Markov Decision Processes

2001 · Transactions of the Japanese Society for Artificial Intelligence


One of the most common problems in reinforcement learning systems (e.g. Q-learning) is reducing the number of trials needed to converge to an optimal policy. As one solution to this problem, the k-certainty exploration method was proposed. Miyazaki reported that this method can determine an optimal policy faster than Q-learning in Markov decision processes (MDPs). Although the method is already an efficient learning method, we propose an improvement that makes it more efficient still. In k-certainty …
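The idea behind k-certainty exploration, as described above, is to first visit every state-action pair at least k times (making each transition "k-certain"), and then apply dynamic programming to the learned model instead of slowly propagating values as Q-learning does. The following is a minimal sketch of that two-phase scheme on a hypothetical toy MDP; the environment, the deterministic-model assumption, and all names here are illustrative, not the paper's actual algorithm or benchmark.

```python
import random

# Hypothetical 4-state chain MDP: action 0 moves right, action 1 moves left.
# Reward 1.0 whenever the move lands on the last state, else 0.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(s, a):
    ns = min(s + 1, GOAL) if a == 0 else max(s - 1, 0)
    return ns, (1.0 if ns == GOAL else 0.0)

def k_certainty_exploration(k, seed=0):
    """Phase 1: act to make every (state, action) pair k-certain,
    recording the observed transitions and rewards as a model."""
    rng = random.Random(seed)
    counts = [[0] * N_ACTIONS for _ in range(N_STATES)]
    trans, rew = {}, {}  # learned model (deterministic here)
    s = 0
    while any(c < k for row in counts for c in row):
        # Prefer an action whose count is still below k (not yet certain).
        uncertain = [a for a in range(N_ACTIONS) if counts[s][a] < k]
        a = rng.choice(uncertain) if uncertain else rng.randrange(N_ACTIONS)
        ns, r = step(s, a)
        counts[s][a] += 1
        trans[(s, a)], rew[(s, a)] = ns, r
        s = ns
    return trans, rew

def value_iteration(trans, rew, gamma=0.9, iters=100):
    """Phase 2: dynamic programming (value iteration) on the learned model."""
    V = [0.0] * N_STATES
    for _ in range(iters):
        V = [max(rew[(s, a)] + gamma * V[trans[(s, a)]]
                 for a in range(N_ACTIONS)) for s in range(N_STATES)]
    policy = [max(range(N_ACTIONS),
                  key=lambda a: rew[(s, a)] + gamma * V[trans[(s, a)]])
              for s in range(N_STATES)]
    return policy, V

trans, rew = k_certainty_exploration(k=2)
policy, V = value_iteration(trans, rew)
print(policy)  # every state's optimal action is 0 (move right toward the goal)
```

The key contrast with Q-learning is that exploration is driven purely by the visit counts, so no trial is spent re-sampling transitions that are already k-certain; once the model is complete, the optimal policy falls out of a single dynamic-programming pass.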

doi:10.1527/tjsai.16.11
fatcat:axcg4qdvtvbknlxic2e6xssml4