Dynamics-aware novelty search with behavior repulsion

Kang Xu, Yan Ma, Wei Li
2022 Proceedings of the Genetic and Evolutionary Computation Conference  
Searching for solutions to tasks with sparse or deceptive rewards is a fundamental problem in Evolutionary Algorithms (EA) and Reinforcement Learning (RL). Existing RL methods enhance exploration by encouraging agents to reach novel states. However, seeking a single locally optimal solution can be insufficient for tasks with deceptive local optima. Novelty-Search (NS) and Quality-Diversity (QD) have shown promising results for finding diverse solutions with different behavioral characteristics. However, manually defining a task-specific behavior description limits these methods to low-dimensional tasks. This paper presents Dynamics-aware Novelty Search with Behavior Repulsion (DANSBR), a hybrid algorithm that evolves high-performing solutions by introducing a generalized novelty measurement and a bidirectional gradient-based mutation operator based on the Quality-Diversity paradigm. The novelty of a single solution is defined as the prediction error of an approximate dynamics model in a task-agnostic behavior space. The mutation operator drives a solution either to behave differently or to achieve better performance in a sample-efficient manner. As a result of better exploration, our approach outperforms several baselines on high-dimensional continuous control tasks with sparse rewards. Empirical results also demonstrate that DANSBR improves performance on tasks with deceptive rewards.
CCS CONCEPTS • Computing methodologies → Search methodologies.
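The novelty definition in the abstract (prediction error of an approximate dynamics model) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `MeanDeltaModel` class is a toy stand-in for a learned dynamics model, raw 2-D states stand in for the behavior space, and the `novelty` function uses mean squared error.

```python
from typing import List, Sequence, Tuple

State = Sequence[float]
Transition = Tuple[State, State]  # (state, next_state)


class MeanDeltaModel:
    """Toy stand-in for a learned dynamics model (hypothetical, not the paper's).

    It predicts next_state = state + mean state change seen during fitting.
    """

    def __init__(self, dim: int) -> None:
        self.mean_delta = [0.0] * dim

    def fit(self, transitions: List[Transition]) -> None:
        n = len(transitions)
        dim = len(self.mean_delta)
        # Average per-dimension change over all observed transitions.
        self.mean_delta = [
            sum(nxt[i] - cur[i] for cur, nxt in transitions) / n
            for i in range(dim)
        ]

    def predict(self, state: State) -> List[float]:
        return [s + d for s, d in zip(state, self.mean_delta)]


def novelty(transitions: List[Transition], model: MeanDeltaModel) -> float:
    """Mean squared prediction error of the model over one solution's transitions."""
    total = 0.0
    for cur, nxt in transitions:
        pred = model.predict(cur)
        total += sum((p - x) ** 2 for p, x in zip(pred, nxt))
    return total / len(transitions)


# Transitions from previously seen behaviors: the agent always moves right by 1.
seen = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (2.0, 0.0))]
model = MeanDeltaModel(dim=2)
model.fit(seen)

familiar = [((5.0, 0.0), (6.0, 0.0))]  # moves right again: well predicted
unusual = [((5.0, 0.0), (5.0, 1.0))]   # moves up instead: poorly predicted

print(novelty(familiar, model))
print(novelty(unusual, model))
```

In this toy setting the transition that the model predicts poorly receives the higher novelty score, which is the property a prediction-error novelty measure relies on; a real system would presumably use a trained function approximator rather than a mean offset.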
doi:10.1145/3512290.3528761 fatcat:ocwbnxzqana4xazfctsr6gezbm