Risk Aversion In Learning Algorithms and an Application To Recommendation Systems

Andreas Haupt, Aroon Narayanan
2022 arXiv preprint
Consider a bandit learning environment. We demonstrate that popular learning algorithms such as Upper Confidence Bound (UCB) and ε-Greedy exhibit risk aversion: when presented with two arms with the same expectation but different variance, the algorithms tend not to choose the riskier, i.e. higher-variance, arm. We prove that ε-Greedy chooses the risky arm with probability tending to 0 when faced with a deterministic and a Rademacher-distributed arm. We show experimentally that UCB also exhibits risk-averse behavior, and that risk aversion persists in early rounds of learning even if the riskier arm has a slightly higher expectation. We calibrate our model to a recommendation system and show that algorithmic risk aversion can decrease consumer surplus and increase homogeneity. We discuss several extensions to other bandit algorithms and to reinforcement learning, and investigate the implications of algorithmic risk aversion for decision theory.
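The two-arm setting in the abstract can be explored with a small simulation. This is a minimal sketch, not the authors' code, and it rests on several assumptions: a safe arm that always pays 0, a Rademacher arm paying +1 or -1 with probability 1/2 (so both arms have mean 0), one initial pull per arm, uniform random tie-breaking, a hypothetical decaying schedule ε_t = min(1, 1/√t), and the textbook UCB1 index. The paper's exact algorithm variants and schedules may differ. The helper names (`run_bandit`, `epsilon_greedy`, `ucb1`) are illustrative only.

```python
import math
import random

SAFE, RISKY = 0, 1


def draw(arm, rng):
    """Safe arm always pays 0; risky arm pays +1 or -1 with probability 1/2."""
    return 0.0 if arm == SAFE else rng.choice((-1.0, 1.0))


def argmax_random_ties(values, rng):
    """Index of the largest value, breaking ties uniformly at random."""
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def run_bandit(policy, horizon, seed=0):
    """Fraction of the `horizon` rounds in which `policy` pulls the risky arm.

    Assumption: each arm is pulled once before the policy takes over,
    so both sample means are defined from the first round.
    """
    rng = random.Random(seed)
    counts = [1, 1]
    sums = [draw(SAFE, rng), draw(RISKY, rng)]
    risky_pulls = 0
    for t in range(1, horizon + 1):
        arm = policy(t, counts, sums, rng)
        counts[arm] += 1
        sums[arm] += draw(arm, rng)
        risky_pulls += arm  # RISKY == 1, SAFE == 0
    return risky_pulls / horizon


def epsilon_greedy(epsilon_schedule):
    """epsilon-Greedy: explore uniformly with prob. epsilon_t, else pick the best sample mean."""
    def policy(t, counts, sums, rng):
        if rng.random() < epsilon_schedule(t):
            return rng.randrange(2)
        means = [sums[a] / counts[a] for a in (SAFE, RISKY)]
        return argmax_random_ties(means, rng)
    return policy


def ucb1(t, counts, sums, rng):
    """Textbook UCB1 index: sample mean + sqrt(2 ln n / n_a), n = total pulls."""
    n = counts[SAFE] + counts[RISKY]
    index = [sums[a] / counts[a] + math.sqrt(2 * math.log(n) / counts[a])
             for a in (SAFE, RISKY)]
    return argmax_random_ties(index, rng)


if __name__ == "__main__":
    # Hypothetical decaying exploration schedule, chosen only for illustration.
    decaying = epsilon_greedy(lambda t: min(1.0, 1.0 / math.sqrt(t)))
    for horizon in (1_000, 10_000):
        for name, policy in (("eps-greedy", decaying), ("UCB1", ucb1)):
            shares = [run_bandit(policy, horizon, seed=s) for s in range(10)]
            print(name, horizon, round(sum(shares) / len(shares), 3))
```

One intuition consistent with the abstract's claim: once the risky arm's sample mean dips below 0, the greedy step stops pulling it, so the negative estimate is corrected only through rare exploration, whereas a positive estimate gets pulled often and quickly regresses toward 0. Printing the average risky-arm share across horizons is a simple way to look for that asymmetry under the assumptions above.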
arXiv:2205.04619v1