AI Alignment and Human Reward

Patrick Butlin
2021. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
According to a prominent approach to AI alignment, AI agents should be built to learn and promote human values. However, humans value things in several different ways: we have desires and preferences of various kinds, and if we engage in reinforcement learning, we also have reward functions. One research project to which this approach gives rise is therefore to say which of these various classes of human values should be promoted. This paper takes on part of this project by assessing the proposal that human reward functions should be the target for AI alignment. There is some reason to believe that powerful AI agents aligned to values of this form would help us to lead good lives, but there is also considerable uncertainty about this claim, arising from unresolved empirical and conceptual issues in human psychology.
doi:10.1145/3461702.3462570