File type: application/pdf
Autonomous adjustment of exploration in weakly supervised reinforcement learning
弱教示的強化学習における探索性の自律調整
2020
Optimization in vast search spaces can be intractable, especially in reinforcement learning and when the environment is the real world. Humans, on the other hand, seem to balance exploration and exploitation quite well in many tasks, in part because they satisfice rather than optimize: they stop exploring once a certain (aspiration) level is satisfied. Takahashi et al. have introduced the risk-sensitive satisficing (RS) model, which realizes efficient satisficing in the bandit problem.
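As an illustration of the mechanism the abstract describes, below is a minimal sketch of RS-style action selection for a K-armed bandit, assuming the commonly cited form RS_i = (n_i / N)(v_i − ℵ), where v_i is the empirical mean reward of arm i, n_i its pull count, N the total number of pulls, and ℵ (aleph) the aspiration level. The function names, reward model, and parameters are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def rs_select(counts: np.ndarray, means: np.ndarray, aleph: float) -> int:
    """Pick the arm maximizing RS_i = (n_i / N) * (v_i - aleph).

    Assumed form of the RS value; a sketch, not the paper's code.
    """
    total = counts.sum()
    if total == 0:
        return 0  # no data yet: pull the first arm
    rs_values = (counts / total) * (means - aleph)
    return int(np.argmax(rs_values))

def run_bandit(probs, aleph, steps=10_000, seed=0):
    """Run RS selection on a Bernoulli bandit with the given success probs."""
    rng = np.random.default_rng(seed)
    k = len(probs)
    counts = np.zeros(k)
    sums = np.zeros(k)
    means = np.zeros(k)
    for _ in range(steps):
        arm = rs_select(counts, means, aleph)
        reward = float(rng.random() < probs[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        means[arm] = sums[arm] / counts[arm]
    return counts, means

if __name__ == "__main__":
    # Aspiration level set between the best and second-best arm, so only
    # the best arm can satisfy it.
    counts, means = run_bandit(probs=[0.3, 0.5, 0.7], aleph=0.6)
    print("pull counts:", counts, "estimated means:", means.round(3))
```

Under this assumed form, the exploration adjustment is autonomous: while every estimated value sits below ℵ, all RS values are negative and the argmax favors under-pulled arms (exploration); once some arm's value exceeds ℵ, its RS value turns positive and grows with its pull count, so the same rule locks onto it (exploitation).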
doi:10.11517/pjsai.jsai2020.0_4g2gs703
fatcat:vmygkgkxuffqffk3lqcrhjeekm