11 Hits in 0.32 sec

ReSet: Learning Recurrent Dynamic Routing in ResNet-like Neural Networks [article]

Iurii Kemaev, Daniil Polykovskiy, Dmitry Vetrov
2018 arXiv   pre-print
Neural Networks are powerful Machine Learning tools that show outstanding performance in Computer Vision, Natural Language Processing, and Artificial Intelligence. In particular, the recently proposed ResNet architecture and its modifications produce state-of-the-art results in image classification problems. ResNet and most previously proposed architectures have a fixed structure and apply the same transformation to all input images. In this work, we develop a ResNet-based model that dynamically selects Computational Units (CUs) for each input object from a learned set of transformations. Dynamic selection allows the network to learn a sequence of useful transformations and apply only the required units to predict the image label. We compare our model to the ResNet-38 architecture and achieve better results than the original ResNet on the CIFAR-10.1 test set. While examining the produced paths, we discovered that the network learned different routes for images from different classes, and similar routes for similar images.
arXiv:1811.04380v1 fatcat:m4uxdj6dyrfldnrejjfcauc3gu
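The per-input unit selection this abstract describes can be illustrated with a minimal sketch. This is not the authors' code: the linear units, the softmax-free argmax router, and all shapes below are illustrative assumptions, kept in NumPy for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# A learned set of Computational Units (CUs): here, simple linear maps.
num_units, dim = 4, 8
units = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(num_units)]

# A router scores every unit for a given input; the best-scoring unit is applied.
router = rng.standard_normal((dim, num_units)) * 0.1

def step(x):
    """Apply one dynamically selected unit to the input x."""
    scores = x @ router              # per-unit relevance scores for this input
    chosen = int(np.argmax(scores))  # hard selection of a single unit
    return np.tanh(x @ units[chosen]), chosen

# Roll out a short route of 3 steps and record which units were used;
# the recorded indices are this input's "route" through the learned set.
x = rng.standard_normal(dim)
route = []
for _ in range(3):
    x, chosen = step(x)
    route.append(chosen)

print(route)
```

In the paper's framing, similar inputs would tend to produce similar routes; here the route is just whatever the random router picks.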

Podracer architectures for scalable Reinforcement Learning [article]

Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
2021 arXiv   pre-print
Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems. Popular training pipelines that use these frameworks for deep learning typically focus on (un-)supervised learning. How to best train reinforcement learning (RL) agents at scale is still an active research area. In this report we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way. Specifically, we describe two architectures designed to make the best use of the resources available on a TPU Pod (a special configuration in a Google data center that features multiple TPU devices connected to each other by extremely low-latency communication channels).
arXiv:2104.06272v1 fatcat:b2r6vt6w6rdc3j43ldxqipkzgi
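The snippet above does not spell out the two architectures, but the core idea it gestures at is colocation: keeping acting and learning on the same device and batching environments so the accelerator stays busy, instead of shipping trajectories between actor and learner processes. A toy pure-Python sketch of that loop follows; the environment, the linear policy, and the update rule are all invented for illustration and are not from the report.

```python
import numpy as np

rng = np.random.default_rng(0)
num_envs, obs_dim, num_actions = 16, 4, 2

# Toy batch of environment observations (invented stand-in for a vectorised env).
states = rng.standard_normal((num_envs, obs_dim))
params = np.zeros((obs_dim, num_actions))  # linear policy parameters

def act(params, states):
    """Acting step: greedy actions for the whole batch at once."""
    logits = states @ params
    return logits.argmax(axis=1)

def learn(params, states, actions, rewards, lr=0.1):
    """Learning step: crude update pushing chosen-action scores toward reward."""
    grad = np.zeros_like(params)
    for s, a, r in zip(states, actions, rewards):
        grad[:, a] += r * s
    return params + lr * grad / len(states)

# Colocated loop: acting and learning alternate in one process, on one "device";
# no trajectory data crosses a network boundary between actors and a learner.
for _ in range(5):
    actions = act(params, states)
    rewards = rng.standard_normal(num_envs)  # stand-in reward signal
    params = learn(params, states, actions, rewards)

print(params.shape)
```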

Discovering a set of policies for the worst case reward [article]

Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh
2021 arXiv   pre-print
We study the problem of how to construct a set of policies that can be composed together to solve a collection of reinforcement learning tasks. Each task is a different reward function defined as a linear combination of known features. We consider a specific class of policy compositions which we call set improving policies (SIPs): given a set of policies and a set of tasks, a SIP is any composition of the former whose performance is at least as good as that of its constituents across all the tasks. We focus on the most conservative instantiation of SIPs, set-max policies (SMPs), so our analysis extends to any SIP. This includes known policy-composition operators like generalized policy improvement. Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks. The algorithm works by successively adding new policies to the set. We show that the worst-case performance of the resulting SMP strictly improves at each iteration, and the algorithm only stops when there does not exist a policy that leads to improved performance. We empirically evaluate our algorithm on a grid world and also on a set of domains from the DeepMind control suite. We confirm our theoretical results regarding the monotonically improving performance of our algorithm. Interestingly, we also show empirically that the sets of policies computed by the algorithm are diverse, leading to different trajectories in the grid world and very distinct locomotion skills in the control suite.
arXiv:2102.04323v2 fatcat:36aek7m7uffwdg7w5xoq4dpwaa
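In this setting, the set-max policy's value on a task is simply the best value any policy in the set achieves there, and the algorithm grows the set to raise the worst case over tasks. A minimal numeric sketch follows; the value matrix is made-up data, and a fixed candidate pool stands in for the paper's policy-improvement step, which computes genuinely new policies.

```python
import numpy as np

# values[i, j] = value of candidate policy i on task j (illustrative numbers).
values = np.array([
    [1.0, 0.1, 0.2],
    [0.1, 1.0, 0.2],
    [0.3, 0.2, 0.9],
])

def smp_worst_case(policy_set):
    """Worst-case (over tasks) performance of the set-max policy for a set."""
    per_task_best = values[list(policy_set)].max(axis=0)  # SMP value per task
    return per_task_best.min()                            # the worst task

# Greedily grow the set: start from the best single policy in the worst case,
# then add whichever candidate most improves the worst-case SMP value.
chosen = [int(values.min(axis=1).argmax())]
while True:
    best_gain, best_policy = 0.0, None
    for candidate in range(len(values)):
        if candidate in chosen:
            continue
        gain = smp_worst_case(chosen + [candidate]) - smp_worst_case(chosen)
        if gain > best_gain:
            best_gain, best_policy = gain, candidate
    if best_policy is None:
        break  # no candidate strictly improves the worst case: stop
    chosen.append(best_policy)

print(chosen, smp_worst_case(chosen))
```

As the abstract states for the real algorithm, the worst-case value is monotone in the loop: each accepted addition strictly improves it, and the loop halts when no candidate helps.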

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon
Acknowledgements EN thanks Iurii Kemaev and Clement Gehring for invaluable help with JAX; Tristan Deleu, Gauthier Gidel, Amy Zhang, Aravind Rajeswaran, Ilya Kostrikov, Brandon Amos, and Aaron Courville  ... 
doi:10.1609/aaai.v36i7.20758 fatcat:wzlxxkpo25hkpisv2xotwkgfiy

Automap: Towards Ergonomic Automated Parallelism for ML Models [article]

Michael Schaarschmidt and Dominik Grewe and Dimitrios Vytiniotis and Adam Paszke and Georg Stefan Schmid and Tamara Norman and James Molloy and Jonathan Godwin and Norman Alexander Rink and Vinod Nair and Dan Belov
2021 arXiv   pre-print
Kemaev, Michael King, Lena Martens, Vladimir Mikulik, Tamara Norman, John Quan, George Papamakarios, Roman Ring, Francisco Ruiz, Alvaro Sanchez, Rosalia Schneider, Eren Sezener, Stephen  ...  Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Claudio Fantacci, Jonathan Godwin, Chris Jones, Tom Hennigan, Matteo Hessel, Steven Kapturowski, Thomas Keck, Iurii  ... 
arXiv:2112.02958v1 fatcat:tlda37oxgjeezggohojvh4sdni

Meta-Gradient Reinforcement Learning with an Objective Discovered Online [article]

Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver
2020 arXiv   pre-print
Acknowledgments and Disclosure of Funding The authors would like to thank Manuel Kroiss, Iurii Kemaev and developers of JAX, Haiku, RLax for their kind engineering support; and thank Joseph Modayil, Doina  ... 
arXiv:2007.08433v1 fatcat:ljl2ig64rffmphbluh24zpceoq

A Combinatorial Perspective on Transfer Learning [article]

Jianan Wang, Eren Sezener, David Budden, Marcus Hutter, Joel Veness
2020 arXiv   pre-print
[BHK+20] David Budden, Matteo Hessel, Iurii Kemaev, Stephen Spencer, and Fabio Viola. Chex: Testing made fun, in JAX!, 2020.  ... 
arXiv:2010.12268v1 fatcat:krjfaqo4wfgbnozehd2fwwni5u

Self-Consistent Models and Values [article]

Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
2021 arXiv   pre-print
Acknowledgments and Disclosure of Funding We would like to thank Ivo Danihelka, Junhyuk Oh, Iurii Kemaev, and Thomas Hubert for valuable discussions and comments on the manuscript.  ... 
arXiv:2110.12840v1 fatcat:5ott7uqvavhodldt6nimv2ussu

The Phenomenon of Policy Churn [article]

Tom Schaul, André Barreto, John Quan, Georg Ostrovski
2022 arXiv   pre-print
Will Dabney, Joseph Modayil and Matteo Hessel helped improve the paper with detailed feedback, and we thank David Silver, Diana Borsa, Miruna Pîslar, Claudia Clopath, Vlad Mnih, Iurii Kemaev, Junhyuk Oh  ... 
arXiv:2206.00730v2 fatcat:7ot7gzwr2vcvbb3jef3ksdochq

Proper Value Equivalence [article]

Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh
2021 arXiv   pre-print
Muesli: Combining improvements in policy optimization. arXiv preprint arXiv:2104.06159, 2021. [19] Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola  ... 
arXiv:2106.10316v2 fatcat:gyej7fdyyngvldsd5tv5bvxs5m

Muesli: Combining Improvements in Policy Optimization [article]

Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
2022 arXiv   pre-print
Acknowledgements We would like to thank Manuel Kroiss and Iurii Kemaev for developing the research platform we use to run and distribute experiments at scale.  ... 
arXiv:2104.06159v2 fatcat:4jafvxdd55f4tdj2vgt647gsxe