Learning from the memory of Atari 2600
[article]
2016
arXiv
pre-print
We train a number of neural networks to play the games Bowling, Breakout and Seaquest using information stored in the memory of an Atari 2600 video game console. We consider four models of neural networks which differ in size and architecture: two networks which use only information contained in the RAM, and two mixed networks which use both the information in the RAM and the information from the screen. As the benchmark we used the convolutional model proposed at NIPS and obtained comparable results in all considered games. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which behave better than the benchmark screen-only agent. Mixing screen and RAM did not lead to improved performance compared to the screen-only and RAM-only agents.
arXiv:1605.01335v1
fatcat:l7eyhl3hsvg3ldkbyb455g25fe
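The RAM-only setup lends itself to a very small model; below is a minimal sketch (not the paper's actual architecture) of a Q-network that maps the console's 128-byte RAM directly to action values. Layer sizes and initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

RAM_SIZE = 128      # the Atari 2600 has 128 bytes of RAM
N_ACTIONS = 18      # full Atari action set
HIDDEN = 128        # assumed hidden width, chosen for illustration

W1 = rng.normal(0, 0.1, (RAM_SIZE, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(ram_bytes):
    """Map a 128-byte RAM state to one Q-value per action."""
    x = ram_bytes.astype(np.float32) / 255.0   # scale bytes to [0, 1]
    h = np.maximum(0.0, x @ W1 + b1)           # ReLU hidden layer
    return h @ W2 + b2

ram = rng.integers(0, 256, RAM_SIZE)           # stand-in for console RAM
action = int(np.argmax(q_values(ram)))         # greedy action choice
```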
Sequential Changepoint Detection in Neural Networks with Checkpoints
[article]
2020
arXiv
pre-print
We introduce a framework for online changepoint detection and simultaneous model learning which is applicable to highly parametrized models, such as deep neural networks. It is based on detecting changepoints across time by sequentially performing generalized likelihood ratio tests that require only evaluations of simple prediction score functions. This procedure makes use of checkpoints, consisting of early versions of the actual model parameters, that allow detecting distributional changes by performing predictions on future data. We define an algorithm that bounds the Type I error in the sequential testing procedure. We demonstrate the efficiency of our method in challenging continual learning applications with unknown task changepoints, and show improved performance compared to online Bayesian changepoint detection.
arXiv:2010.03053v1
fatcat:wdzbjqnysfeqjm7dkspvducbjy
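A hedged sketch of the core idea, with a toy linear model standing in for a deep network: compare a frozen checkpoint's prediction score against the current model's on incoming data, and declare a change when a CUSUM-style accumulation of the gap crosses a threshold. The score function, threshold, and statistic below are simplifications; the paper's GLR test and Type I error calibration are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
params_checkpoint = np.zeros(d)          # frozen early copy of the model
params_current = np.ones(d)              # model kept up to date online

def score(params, x, y):
    # stand-in prediction score (negative squared error); the method
    # only requires evaluations of some simple score function like this
    return -float((x @ params - y) ** 2)

def detect_change(stream, threshold=25.0):
    cumulative = 0.0
    for t, (x, y) in enumerate(stream):
        # accumulate a GLR-like statistic: how much better the current
        # model explains new data than the pre-change checkpoint does
        cumulative = max(0.0, cumulative
                         + score(params_current, x, y)
                         - score(params_checkpoint, x, y))
        if cumulative > threshold:
            return t                     # changepoint declared here
    return None

# data drawn from the current model's regime triggers detection
stream = [(x, x @ params_current) for x in rng.normal(size=(100, d))]
print(detect_change(stream))
```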
Importance Weighted Policy Learning and Adaptation
[article]
2021
arXiv
pre-print
The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones. In the meta reinforcement learning literature much recent work has focused on the problem of optimizing the learning process itself. In this paper we study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning. The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior, or default behavior, that constrains the space of solutions and serves as a bias for exploration, as well as a representation for the value function, both of which are easily learned from a number of training tasks in a multi-task scenario. Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv:2009.04875v2
fatcat:7376gfqvqfa4xfq3di35imcqgu
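For intuition, the behavior-prior idea has a simple closed form in the discrete-action case: the improved policy reweights the prior by exponentiated Q-values. The sketch below shows this generic KL-regularized update; the temperature eta and all names are illustrative, not the paper's algorithm.

```python
import numpy as np

def kl_regularized_policy(q_values, prior_probs, eta=1.0):
    # closed-form solution of: maximize E_pi[Q] - eta * KL(pi || prior)
    #   pi(a|s) proportional to prior(a|s) * exp(Q(s, a) / eta)
    logits = np.log(prior_probs) + q_values / eta
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

q = np.array([1.0, 2.0, 0.5])              # Q(s, a) for 3 actions
prior = np.array([0.5, 0.25, 0.25])        # behavior prior pi_0(a|s)
print(kl_regularized_policy(q, prior))     # shifted toward high-Q actions
```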
Meta-Learning with Latent Embedding Optimization
[article]
2019
arXiv
pre-print
Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. The resulting approach, latent embedding optimization (LEO), decouples the gradient-based adaptation procedure from the underlying high-dimensional space of model parameters. Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks. Further analysis indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space.
arXiv:1807.05960v3
fatcat:mzhtivi6s5cjjjifiznmvkac5y
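A minimal sketch of the latent-space adaptation idea, assuming a toy linear decoder and a stand-in loss: the inner loop takes gradient steps on the low-dimensional code z, and the high-dimensional parameters are only ever produced by decoding. Finite differences replace backpropagation here purely to keep the example dependency-free; LEO itself uses learned encoder/decoder networks and exact gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, PARAMS = 8, 1000                  # z is far smaller than theta
decoder = rng.normal(0, 0.01, (LATENT, PARAMS))

def decode(z):
    return z @ decoder                    # latent code -> model parameters

def task_loss(theta, task):
    return float(((theta - task) ** 2).mean())   # toy stand-in loss

def inner_adapt(z, task, steps=5, lr=0.1, eps=1e-4):
    for _ in range(steps):
        # finite-difference gradient w.r.t. z only: adaptation never
        # touches the high-dimensional theta directly
        base = task_loss(decode(z), task)
        grad = np.array([(task_loss(decode(z + eps * np.eye(LATENT)[i]),
                                    task) - base) / eps
                         for i in range(LATENT)])
        z = z - lr * grad
    return z

z0 = rng.normal(size=LATENT)              # an encoder's output would go here
task = rng.normal(size=PARAMS)            # stand-in few-shot task target
theta_star = decode(inner_adapt(z0, task))   # task-specific parameters
```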
An Empirical Study of Implicit Regularization in Deep Offline RL
[article]
2022
arXiv
pre-print
Deep neural networks are the most commonly used function approximators in offline reinforcement learning. Prior works have shown that neural nets trained with TD-learning and gradient descent can exhibit implicit regularization that can be characterized by under-parameterization of these networks. Specifically, the rank of the penultimate feature layer, also called the effective rank, has been observed to collapse drastically during training. In turn, this collapse has been argued to reduce the model's ability to further adapt in later stages of learning, leading to diminished final performance. Such an association between effective rank and performance makes effective rank compelling for offline RL, primarily for offline policy evaluation. In this work, we conduct a careful empirical study of the relation between effective rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind Lab. We observe that a direct association exists only in restricted settings and disappears in more extensive hyperparameter sweeps. We also empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics, and find that bootstrapping alone is insufficient to explain the collapse of the effective rank. Further, we show that several other factors could confound the relationship between effective rank and performance, and conclude that studying this association under simplistic assumptions could be highly misleading.
arXiv:2207.02099v2
fatcat:23kvutruqvahzdoodbas7eyeiu
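The effective-rank statistic this line of work tracks is easy to compute. A common definition (following Kumar et al.'s srank) is the smallest number of singular values needed to capture a 1 - delta fraction of the feature matrix's spectrum, with delta = 0.01. A minimal implementation:

```python
import numpy as np

def effective_rank(features, delta=0.01):
    """features: (batch, dim) matrix of penultimate-layer activations.

    Returns the smallest k such that the top-k singular values account
    for at least a 1 - delta fraction of the total spectrum mass.
    """
    sv = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(sv) / sv.sum()
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

phi = np.random.default_rng(0).normal(size=(256, 64))
print(effective_rank(phi))   # near 64 for random features; far lower
                             # when the rank collapse occurs in training
```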
Open-Ended Learning Leads to Generally Capable Agents
[article]
2021
arXiv
pre-print
Jakub Sygnowski: Infrastructure development, agent analysis, and research investigations. Maja Trebacz: Research investigations. ...
arXiv:2107.12808v2
fatcat:wp5lbeezmrb6rdsbqf6etgurtq
Meta-learning by the Baldwin effect
2018
Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '18)
The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan. To date there has been no demonstration of its necessity in empirically challenging tasks. Here we show that the Baldwin effect is capable of evolving few-shot supervised and reinforcement learning mechanisms, by shaping the hyperparameters and the initial parameters of deep learning algorithms. Furthermore, it can genetically accommodate strong learning biases on the same set of problems as a recent machine learning algorithm called MAML (Model-Agnostic Meta-Learning), which uses second-order gradients instead of evolution to learn a set of reference parameters (initial weights) that allow rapid adaptation to tasks sampled from a distribution. Whilst in simple cases MAML is more data efficient than the Baldwin effect, the Baldwin effect is more general in that it does not require gradients to be backpropagated to the reference parameters or hyperparameters, and permits effectively any number of gradient updates in the inner loop. The Baldwin effect learns strong learning-dependent biases, rather than purely genetically accommodating fixed behaviours in a learning-independent manner.
doi:10.1145/3205651.3205763
dblp:conf/gecco/FernandoSOWSTSP18
fatcat:gdtvorekirgsvhhs5acq3dufaq
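A toy sketch of the Baldwinian loop described here: evolution acts on initial parameters, fitness is measured after a few inner-loop gradient steps, and the learned weights are never written back into the genome (Baldwin, not Lamarck). The task, loss, and population sizes are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, task):
    return float(((w - task) ** 2).sum())

def lifetime_learning(w0, task, steps=5, lr=0.2):
    # inner loop: plain gradient descent from the inherited initial
    # weights; the genotype w0 itself is left untouched
    w = w0.copy()
    for _ in range(steps):
        w -= lr * 2 * (w - task)          # gradient of the quadratic loss
    return loss(w, task)                  # fitness = post-learning loss

pop = rng.normal(0, 1, (20, 4))           # population of initial weights
for gen in range(50):
    tasks = rng.normal(0, 1, (8, 4))      # sample a batch of tasks
    fitness = np.array([np.mean([lifetime_learning(w0, t) for t in tasks])
                        for w0 in pop])
    elite = pop[np.argsort(fitness)[:5]]  # keep the 5 best genotypes
    pop = np.concatenate([elite,          # refill with mutated copies
                          elite[rng.integers(0, 5, 15)]
                          + rng.normal(0, 0.1, (15, 4))])
```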
Theano: A Python framework for fast computation of mathematical expressions
[article]
2016
arXiv
pre-print
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been under active, continuous development since 2008; multiple frameworks have been built on top of it, and it has been used to produce many machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
arXiv:1605.02688v1
fatcat:2lcqwrk2zrbt5dyjmcofn6shhu
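The workflow the abstract summarizes is: declare symbolic variables, build an expression graph, and let Theano optimize and compile it (including symbolic gradients) into an ordinary callable. A minimal example:

```python
import numpy as np
import theano
import theano.tensor as T

x = T.dvector('x')                 # symbolic input vector
y = T.sum(x ** 2)                  # build the expression graph
gy = T.grad(y, x)                  # symbolic differentiation

f = theano.function([x], [y, gy])  # optimize and compile the graph
value, grad = f(np.array([1.0, 2.0, 3.0]))
print(value, grad)                 # 14.0 [2. 4. 6.]
```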
Dual Path Structural Contrastive Embeddings for Learning Novel Objects
[article]
2022
arXiv
pre-print
PMLR, 2020.
[50] Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. ...
PMLR, 13–18 Jul 2020.
[49] Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. ... [69] Imtiaz Ziko, Jose Dolz, Eric Granger, and Ismail Ben Ayed. ...
arXiv:2112.12359v3
fatcat:x3utnehujrbwrirkap7qvu3v4i
Adaptive Cross-Modal Few-Shot Learning
[article]
2020
arXiv
pre-print
IJCV, 2015.
[41] Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. ...
In CVPR, 2005.
[2] Matthias Bauer, Mateo Rojas-Carulla, Jakub Bartlomiej Swiatkowski, Bernhard Scholkopf, and Richard E Turner. ...
arXiv:1902.07104v3
fatcat:mmjrixwfkzg7fknqaxlgjggdki
CLTA: Contents and Length-based Temporal Attention for Few-shot Action Recognition
[article]
2021
arXiv
pre-print
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ... [29] Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. ...
arXiv:2103.10567v1
fatcat:yl7extxpbbgxpodxgnmtnfma2q
Vector Quantized Models for Planning
[article]
2021
arXiv
pre-print
We'd also like to thank Sander Dieleman, David Silver, Yoshua Bengio, Jakub Sygnowski, and Aravind Srinivas for useful discussions and suggestions. ...
arXiv:2106.04615v2
fatcat:kxuhklldsfcpretmxdgzafdxvu
Concept Learners for Few-Shot Learning
[article]
2021
arXiv
pre-print
Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. ...
arXiv:2007.07375v3
fatcat:4canqso2azhibnjootonpva6xa
Memory-Efficient Semi-Supervised Continual Learning: The World is its Own Replay Buffer
[article]
2021
arXiv
pre-print
Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A Rusu, Yee Whye Teh, and Razvan Pascanu. Task agnostic continual learning via meta learning. arXiv preprint arXiv:1906.05201, 2019. ...
arXiv:2101.09536v2
fatcat:inyfzbclgfbw7mrej4cxh5q5ze
NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-End Learning and Control
[article]
2021
arXiv
pre-print
Andrei A Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960, 2018. ...
arXiv:2006.11992v3
fatcat:r4tgjunt7nhznbpncjocvvxmk4
Showing results 1 — 15 out of 23 results