Risk Aversion In Learning Algorithms and an Application To Recommendation Systems
arXiv pre-print [article], 2022
Consider a bandit learning environment. We demonstrate that popular learning algorithms such as Upper Confidence Bound (UCB) and ε-Greedy exhibit risk aversion: when presented with two arms of the same expectation but different variance, the algorithms tend not to choose the riskier, i.e. higher-variance, arm. We prove that ε-Greedy chooses the risky arm with probability tending to 0 when faced with a deterministic arm and a Rademacher-distributed arm. We show experimentally that UCB also shows …
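The ε-Greedy claim in the abstract is easy to probe numerically. Below is a minimal simulation sketch, not the paper's code: the horizon, the fixed ε value, and the tie-breaking rule are assumptions. It pits a deterministic zero-reward arm against a Rademacher arm (±1 with equal probability, also mean zero) and reports how often ε-Greedy pulls the risky arm.

```python
# Minimal illustration (assumed parameters, not the paper's experiment):
# epsilon-Greedy with a safe arm paying exactly 0 and a risky Rademacher arm
# with the same mean. With a fixed epsilon, pure exploration keeps a floor of
# eps/2 risky pulls; the qualitative risk-averse effect is that exploitation
# increasingly favors the safe arm, so the overall fraction sits well below 1/2.
import random

def epsilon_greedy_run(horizon=10_000, eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0, 0]     # pulls of arm 0 (safe) and arm 1 (risky)
    means = [0.0, 0.0]  # empirical mean reward of each arm

    for _ in range(horizon):
        if rng.random() < eps or counts[0] == 0 or counts[1] == 0:
            arm = rng.randrange(2)                   # explore uniformly
        else:
            arm = 0 if means[0] >= means[1] else 1   # exploit (ties go to safe arm)
        # Safe arm pays exactly 0; risky arm is Rademacher (+1 or -1).
        reward = 0.0 if arm == 0 else rng.choice([-1.0, 1.0])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean

    return counts[1] / horizon  # fraction of pulls that went to the risky arm

if __name__ == "__main__":
    fractions = [epsilon_greedy_run(seed=s) for s in range(20)]
    print("avg fraction of risky-arm pulls:", sum(fractions) / len(fractions))
```

The intuition the sketch makes visible: once the risky arm's empirical mean dips below zero, exploitation stops pulling it, so the negative estimate persists until a rare exploration step, whereas a positive estimate keeps being tested and eventually falls back below zero.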
arXiv:2205.04619v1