Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits
[article]
2017
bioRxiv
pre-print
Fast adaptation to changes in the environment requires both natural and artificial agents to dynamically tune an exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e., choosing the action that subjectively appears best at a given moment) relative to exploratory choices (i.e., testing other actions that currently appear worse but may turn out to be promising later). The problem of finding an efficient
doi:10.1101/117598
fatcat:qb6qicn46ffuhf7dp42kirhqd4