Continual learning with hypernetworks

Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe
2020
Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key feature: instead of recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing task-specific weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving state-of-the-art performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display a very large capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable hypernetwork weights is comparable to or smaller than the target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
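The abstract describes the core mechanism: a hypernetwork maps a learned, low-dimensional task embedding to the weights of a target network, and forgetting is countered by a regularizer that keeps the hypernetwork's outputs for previously learned task embeddings close to stored snapshots. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, assuming the abstract's description only; the class and function names (`TaskConditionedHypernet`, `hypernet_regularizer`, `beta`) and all architectural details are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of a task-conditioned
# hypernetwork with an output-preserving regularizer for continual learning.
import torch
import torch.nn as nn


class TaskConditionedHypernet(nn.Module):
    """Maps a learned task embedding to a flat vector of target-network weights."""

    def __init__(self, embedding_dim: int, target_param_count: int, hidden_dim: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, target_param_count),
        )

    def forward(self, task_embedding: torch.Tensor) -> torch.Tensor:
        # The flat output would be reshaped into the target network's layers.
        return self.body(task_embedding)


def hypernet_regularizer(hypernet: nn.Module,
                         task_embeddings: list,
                         stored_outputs: list) -> torch.Tensor:
    """Penalize drift of generated weights for previously learned tasks.

    stored_outputs[t] is a frozen snapshot of the weight vector the hypernetwork
    produced for task t right after that task was learned.
    """
    loss = torch.tensor(0.0)
    for emb, snapshot in zip(task_embeddings, stored_outputs):
        loss = loss + ((hypernet(emb) - snapshot.detach()) ** 2).sum()
    return loss / max(len(stored_outputs), 1)


# Usage sketch: while training task T, combine the current task loss with the
# regularizer over the embeddings of tasks 0..T-1 (beta is a tunable weight):
#   total_loss = task_loss + beta * hypernet_regularizer(hnet, past_embeddings, past_snapshots)
```

This reflects the abstract's claim that only task-specific weight realizations, rather than all previously seen data, need to be "rehearsed": the memory cost per task is one embedding plus one generated weight snapshot (or, in practice, a frozen copy of the hypernetwork used to recompute it).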
doi:10.5167/uzh-200390