The strategic student approach for life-long exploration and learning

Manuel Lopes, Pierre-Yves Oudeyer
2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL), 2012
This article introduces the strategic student metaphor: a student has to learn a number of topics (or tasks) to maximize its mean score, and has to choose strategically how to allocate its time among the topics and/or which learning method to use for a given topic. We show under which conditions a strategy in which time allocation or learning method is chosen from the easier to the more complex topic is optimal. Then, we show an algorithm, based on multi-armed bandit techniques, that allows empirical online evaluation of learning progress and approximates the optimal solution under more general conditions. Finally, we show that the strategic student problem formulation makes it possible to view many previous approaches to active and developmental learning in a common framework.

Formal study of active learning in general is also quite recent [15], [16]. The main problem is that most learning theory relies on the assumption that the learning data is acquired randomly, i.e. with the same distribution as future encounters, whereas in active learning the agent itself chooses which data to sample. Recent developments in machine learning, mainly from active learning and multi-armed bandits, have started to contribute to a formal view of the complexity of learning agents that choose their own samples: optimal experimental design and active learning [8], [15], [26]-[28], n-armed bandits [29] and the general exploration-exploitation dilemma in RL [18]
doi:10.1109/devlrn.2012.6400807 dblp:conf/icdl-epirob/LopesO12 fatcat:tftcf42uqnhnfc4w6oh2jm33iq
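
The time-allocation idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a minimal epsilon-greedy bandit, written under the assumption that each topic has a hypothetical saturating learning curve `1 - exp(-rate * t)` and that "learning progress" is the score gain observed the last time a topic was studied. The names `make_topic`, `strategic_student`, `rate`, `budget`, and `epsilon` are all illustrative.

```python
import math
import random


def make_topic(rate):
    """Hypothetical learning curve: score after t units of study time."""
    return lambda t: 1.0 - math.exp(-rate * t)


def strategic_student(topics, budget, epsilon=0.1, seed=0):
    """Allocate `budget` time units across topics, one unit at a time.

    With probability `epsilon` a random topic is studied (exploration);
    otherwise the topic with the largest last observed score gain is studied.
    Returns the time spent per topic and the final mean score.
    """
    rng = random.Random(seed)
    n = len(topics)
    time_spent = [0] * n
    scores = [curve(0) for curve in topics]
    # Optimistic initial progress so that untried topics are preferred.
    progress = [float("inf")] * n

    for _ in range(budget):
        if rng.random() < epsilon:
            i = rng.randrange(n)
        else:
            i = max(range(n), key=lambda k: progress[k])
        time_spent[i] += 1
        new_score = topics[i](time_spent[i])
        progress[i] = new_score - scores[i]  # empirical learning progress
        scores[i] = new_score

    return time_spent, sum(scores) / n


if __name__ == "__main__":
    # Three topics of increasing difficulty (assumed rates).
    topics = [make_topic(r) for r in (0.5, 0.1, 0.02)]
    allocation, mean_score = strategic_student(topics, budget=60)
    print(allocation, round(mean_score, 3))
```

Because the assumed curves have diminishing returns, following the largest recent gain tends to move study time away from topics that have saturated and toward those still improving; the exploration step is there only to re-measure progress on topics that have not been visited recently.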