Seeking entropy: complex behavior from intrinsic motivation to occupy action-state path space [article]

Jorge Ramírez-Ruiz, Dmytro Grytskyy, Rubén Moreno-Bote
2022, arXiv pre-print
Intrinsic motivation generates behaviors that do not necessarily lead to immediate reward but help exploration and learning. Here we show that agents whose sole goal is to maximize the occupancy of future actions and states, that is, to move and explore in the long term, are capable of complex behavior without any reference to external rewards. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state …th occupancy. We provide analytical expressions that relate the optimal policy to the optimal state-value function, from which we prove the uniqueness of the solution of the associated Bellman equation and the convergence of our algorithm to the optimal state-value function. Using tasks with discrete and continuous states, we show that 'dancing', hide-and-seek and a basic form of altruistic behavior emerge naturally from entropy seeking without external rewards. Intrinsically motivated agents can objectively determine which states constitute rewards and exploit them to ultimately maximize action-state path entropy.
arXiv:2205.10316v1 fatcat:xixzn53fvrc2rhcae6dkulnupq
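The abstract relates the optimal policy to the optimal state-value function and describes an algorithm that converges to it. The sketch below is not the authors' code. It assumes a finite state and action space, a discounted objective E[Σ γ^t (α·H(π(·|s_t)) + β·H(P(·|s_t, a_t)))], and a per-state mask of available actions; the function name `entropy_seeking_value_iteration`, the parameters `alpha`, `beta`, `gamma`, `available`, and the corridor example are all hypothetical. It solves that objective with standard entropy-regularized ("soft") value iteration: the log-sum-exp Bellman operator is a γ-contraction for γ < 1, so the iteration converges to a unique fixed point, and the policy is a softmax over Q/α.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a probability vector; zero entries contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def entropy_seeking_value_iteration(P, available, alpha=1.0, beta=0.0,
                                    gamma=0.9, tol=1e-10, max_iter=10_000):
    """Soft (log-sum-exp) value iteration for an ASSUMED action-state entropy objective.

    P[s, a, s'] is the transition probability; available[s, a] marks allowed
    actions (every state needs at least one). Returns V[s] and policy pi[s, a].
    """
    n_states, n_actions, _ = P.shape
    # Intrinsic per-(s, a) term: entropy of the next-state distribution.
    h_next = np.array([[entropy(P[s, a]) for a in range(n_actions)]
                       for s in range(n_states)])

    def scaled_q(V):
        # Q[s, a] = beta * H(S'|s, a) + gamma * E[V(s') | s, a], divided by alpha;
        # unavailable actions get -inf so they receive zero probability.
        Q = beta * h_next + gamma * np.einsum("sat,t->sa", P, V)
        return np.where(available, Q / alpha, -np.inf)

    V = np.zeros(n_states)
    for _ in range(max_iter):
        Z = scaled_q(V)
        m = Z.max(axis=1)
        # Numerically stable V(s) = alpha * log sum_a exp(Q(s, a) / alpha).
        V_new = alpha * (m + np.log(np.exp(Z - m[:, None]).sum(axis=1)))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Policy: softmax (Boltzmann) over Q / alpha on the available actions.
    Z = scaled_q(V)
    w = np.exp(Z - Z.max(axis=1, keepdims=True))
    pi = w / w.sum(axis=1, keepdims=True)
    return V, pi


# Hypothetical 4-state corridor: actions 0=left, 1=stay, 2=right.
# State 0 is absorbing with a single allowed action; walls clamp the position.
n_states, n_actions = 4, 3
P = np.zeros((n_states, n_actions, n_states))
available = np.ones((n_states, n_actions), dtype=bool)
for s in range(n_states):
    for a, step in enumerate((-1, 0, 1)):
        P[s, a, min(max(s + step, 0), n_states - 1)] = 1.0
P[0] = 0.0
P[0, :, 0] = 1.0
available[0] = [False, True, False]

V, pi = entropy_seeking_value_iteration(P, available, alpha=1.0, beta=0.0, gamma=0.9)
print(np.round(V, 3))
print(np.round(pi, 3))
```

In this toy corridor the absorbing state offers only one action, so its value under the assumed objective is 0, and the agent in state 1 puts less probability on stepping into it than on staying or moving right. Avoiding the trap falls out of maximizing future action entropy alone, with no external reward. With every state offering the same three actions, the same sketch returns the uniform policy and V(s) = α·log 3 / (1 − γ).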