Deep Sets for Generalization in RL [article]

Tristan Karch, Cédric Colas, Laetitia Teodorescu, Clément Moulin-Frier, Pierre-Yves Oudeyer
2020 arXiv   pre-print
Full details about the setup, architectures and training schedules are reported from Colas et al. (2020) in the Appendices. ... Further implementation details, training schedules and pseudo-code can be found in the companion paper (Colas et al., 2020). ...
arXiv:2003.09443v1 fatcat:ybbvjsajxzeulkfpdquko3pk4a

A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms [article]

Cédric Colas, Olivier Sigaud, Pierre-Yves Oudeyer
2019 arXiv   pre-print
Consistently checking the statistical significance of experimental results is the first mandatory step towards reproducible science. This paper presents a hitchhiker's guide to rigorous comparisons of reinforcement learning algorithms. After introducing the concepts of statistical testing, we review the relevant statistical tests and compare them empirically in terms of false positive rate and statistical power as a function of the sample size (number of seeds) and effect size. We further
more » ... igate the robustness of these tests to violations of the most common hypotheses (normal distributions, same distributions, equal variances). Besides simulations, we compare empirical distributions obtained by running Soft Actor-Critic and Twin-Delayed Deep Deterministic Policy Gradient on Half-Cheetah. We conclude by providing guidelines and code to perform rigorous comparisons of RL algorithm performances.
arXiv:1904.06979v1 fatcat:qhcqa5t4rjekphb3zadwhc6rva

Crossover Between Quantum and Classical Waves and High Frequency Localization Landscapes [article]

David Colas, Cédric Bellis, Bruno Lombard, Régis Cottereau
2022 arXiv   pre-print
Anderson localization is a universal interference phenomenon occurring when a wave evolves through a random medium and it has been observed in a great variety of physical systems, either quantum or classical. The recently developed localization landscape theory offers a computationally affordable way to obtain useful information on the localized modes, such as their location or size. Here we examine this theory in the context of classical waves exhibiting high frequency localization and for
more » ... h the original localization landscape approach is no longer informative. Using a Webster's transformation, we convert a classical wave equation into a Schrödinger equation with the same localization properties. We then compute an adapted localization landscape to retrieve information on the original classical system. This work offers an affordable way to access key information on high-frequency mode localization.
arXiv:2204.11632v1 fatcat:yssziilwazfnni3a2eapjqsxie

Help Me Explore: Minimal Social Interventions for Graph-Based Autotelic Agents [article]

Ahmed Akakzia, Olivier Serris, Olivier Sigaud, Cédric Colas
2022 arXiv   pre-print
., 2017; Colas et al., 2020b). ... The IMAGINE approach attempts to leverage this feature in IMRL by using language to imagine new goals (Colas et al., 2020a). ...
arXiv:2202.05129v1 fatcat:6f74scyoinfexjdzhhypy6exne

Compact Convolutional Neural Networks for Multi-Class, Personalised, Closed-Loop EEG-BCI [article]

Pablo Ortega and Cedric Colas and Aldo Faisal
2018 arXiv   pre-print
For many people suffering from motor disabilities, assistive devices controlled with only brain activity are the only way to interact with their environment. Natural tasks often require different kinds of interactions, involving different controllers the user should be able to select in a self-paced way. We developed a Brain-Computer Interface (BCI) allowing users to switch between four control modes in a self-paced way in real-time. Since the system is devised to be used in domestic
more » ... s in a user-friendly way, we selected non-invasive electroencephalographic (EEG) signals and convolutional neural networks (CNNs), known for their ability to find the optimal features in classification tasks. We tested our system using the Cybathlon BCI computer game, which embodies all the challenges inherent to real-time control. Our preliminary results show that an efficient architecture (SmallNet), with only one convolutional layer, can classify 4 mental activities chosen by the user. The BCI system is run and validated online. It is kept up-to-date through the use of signals newly collected during play, reaching an online accuracy of 47.6% where most approaches only report results obtained offline. We found that models trained with data collected online better predicted the behaviour of the system in real-time. This suggests that similar (CNN-based) offline classification methods found in the literature might experience a drop in performance when applied online. Compared to our previous decoder of physiological signals relying on blinks, we doubled the number of states among which the user can transition, bringing the opportunity for finer control of specific subtasks composing natural grasping in a self-paced way. Our results are comparable to those shown at the Cybathlon's BCI Race but further improvements in accuracy are required.
arXiv:1807.11752v1 fatcat:tr6bodnmnzc4bkqvqhhs2fmpte

Towards Teachable Autotelic Agents [article]

Olivier Sigaud and Ahmed Akakzia and Hugo Caselles-Dupré and Cédric Colas and Pierre-Yves Oudeyer and Mohamed Chetouani
2022 arXiv   pre-print
Colas et al. (2020a). ... More details about this work are presented in Colas et al. (2020a). ...
arXiv:2105.11977v2 fatcat:w37mjxeaafefnko3a64on7l3uu

Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments [article]

Rémy Portelas, Cédric Colas, Katja Hofmann, Pierre-Yves Oudeyer
2019 arXiv   pre-print
We consider the problem of how a teacher algorithm can enable an unknown Deep Reinforcement Learning (DRL) student to become good at a skill over a wide range of diverse environments. To do so, we study how a teacher algorithm can learn to generate a learning curriculum, whereby it sequentially samples parameters controlling a stochastic procedural generation of environments. Because it does not initially know the capacities of its student, a key challenge for the teacher is to discover which
more » ... vironments are easy, difficult or unlearnable, and in what order to propose them to maximize the efficiency of learning over the learnable ones. To achieve this, this problem is transformed into a surrogate continuous bandit problem where the teacher samples environments in order to maximize absolute learning progress of its student. We present a new algorithm modeling absolute learning progress with Gaussian mixture models (ALP-GMM). We also adapt existing algorithms and provide a complete study in the context of DRL. Using parameterized variants of the BipedalWalker environment, we study their efficiency to personalize a learning curriculum for different learners (embodiments), their robustness to the ratio of learnable/unlearnable environments, and their scalability to non-linear and high-dimensional parameter spaces. Videos and code are available at
arXiv:1910.07224v1 fatcat:2lfkj5aj7jewjiz6lugjzhhtyy

Vygotskian Autotelic Artificial Intelligence: Language and Culture Internalization for Human-Like AI [article]

Cédric Colas, Tristan Karch, Clément Moulin-Frier, Pierre-Yves Oudeyer
2022 arXiv   pre-print
Building autonomous artificial agents able to grow open-ended repertoires of skills is one of the fundamental goals of AI. To that end, a promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals - autotelic agents. However, existing algorithms still show serious limitations in terms of goal diversity, exploration, generalization or skill composition. This perspective calls for the immersion of
more » ... lic agents into rich socio-cultural worlds. We focus on language especially, and how its structure and content may support the development of new cognitive functions in artificial agents, just like it does in humans. Indeed, most of our skills could not be learned in isolation. Formal education teaches us to reason systematically, books teach us history, and YouTube might teach us how to cook. Crucially, our values, traditions, norms and most of our goals are cultural in essence. This knowledge, and some argue, some of our cognitive functions such as abstraction, compositional imagination or relational thinking, are formed through linguistic and cultural interactions. Inspired by the work of Vygotsky, we suggest the design of Vygotskian autotelic agents able to interact with others and, more importantly, able to internalize these interactions to transform them into cognitive tools supporting the development of new cognitive functions. This perspective paper proposes a new AI paradigm in the quest for artificial lifelong skill discovery. It justifies the approach by uncovering examples of new artificial cognitive functions emerging from interactions between language and embodiment in recent works at the intersection of deep reinforcement learning and natural language processing. Looking forward, it highlights future opportunities and challenges for Vygotskian Autotelic AI research.
arXiv:2206.01134v1 fatcat:nr7m5nsbijhmphfe6uuxxrosca

Language-Conditioned Goal Generation: a New Approach to Language Grounding for RL [article]

Cédric Colas, Ahmed Akakzia, Pierre-Yves Oudeyer, Mohamed Chetouani, Olivier Sigaud
2020 arXiv   pre-print
Acknowledgments Cédric Colas is partly funded by the French Ministère des Armées - Direction Générale de l'Armement. ... ., 2019; Colas et al., 2020b). ... Type 4, especially, involves recombinations of action verbs and attributes similar to Hill et al. (2019); Colas et al. (2020b). ...
arXiv:2006.07043v1 fatcat:yejngap235cchcrtrxycriiyni

Scaling MAP-Elites to Deep Neuroevolution [article]

Cédric Colas, Joost Huizinga, Vashisht Madhavan, Jeff Clune
2020 arXiv   pre-print
Quality-Diversity (QD) algorithms, and MAP-Elites (ME) in particular, have proven very useful for a broad range of applications including enabling real robots to recover quickly from joint damage, solving strongly deceptive maze tasks or evolving robot morphologies to discover new gaits. However, present implementations of MAP-Elites and other QD algorithms seem to be limited to low-dimensional controllers with far fewer parameters than modern deep neural network models. In this paper, we
more » ... e to leverage the efficiency of Evolution Strategies (ES) to scale MAP-Elites to high-dimensional controllers parameterized by large neural networks. We design and evaluate a new hybrid algorithm called MAP-Elites with Evolution Strategies (ME-ES) for post-damage recovery in a difficult high-dimensional control task where traditional ME fails. Additionally, we show that ME-ES performs efficient exploration, on par with state-of-the-art exploration algorithms in high-dimensional control tasks with strongly deceptive rewards.
arXiv:2003.01825v1 fatcat:owu3icwqxfhafhag3wclblaghy

Automatic Curriculum Learning For Deep RL: A Short Survey [article]

Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, Pierre-Yves Oudeyer
2020 arXiv   pre-print
; Colas et al., 2020]. ... ., 2016; Colas et al., 2020]. ...
arXiv:2003.04664v2 fatcat:lhire3htmnenfetx2ry4furgyy

EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models [article]

Cédric Colas, Boris Hejblum, Sébastien Rouillon, Rodolphe Thiébaut, Pierre-Yves Oudeyer, Clément Moulin-Frier, Mélanie Prague
2020 arXiv   pre-print
Cédric Colas is partly funded by the French Ministère des Armées - Direction Générale de l'Armement. ... For this reason, we integrate to our framework a library for statistical comparisons designed for RL experiments (Colas et al., 2019). ...
arXiv:2010.04452v1 fatcat:e26h3ifw4rhafifjepabjh2rlm

Convolutional neural network, personalised, closed-loop Brain-Computer Interfaces for multi-way control mode switching in real-time [article]

Pablo Ortega, Cedric Colas, Aldo Faisal
2018 bioRxiv   pre-print
Exoskeletons and robotic devices are for many motor disabled people the only way to interact with their environment. Our lab previously developed a gaze guided assistive robotic system for grasping. It is well known that the same natural task can require different interactions described by different dynamical systems that would require different robotic controllers and their selection by the user in a self paced way. Therefore, we investigated different ways to achieve transitions between
more » ... le states, finding that eye blinks were the most reliable way to transition from 'off' to 'control' modes (binary classification) compared to voice and electromyography. In this paper we expand on this work by investigating brain signals as sources for control mode switching. We developed a Brain Computer Interface (BCI) that allows users to switch between four control modes in a self-paced way in real time. Since the system is devised to be used in domestic environments in a user-friendly way, we selected non-invasive electroencephalographic (EEG) signals and convolutional neural networks (ConvNets), known for their capability to find the optimal features for a classification task, which we hypothesised would add flexibility to the system in terms of which mental activities the user could perform to control it. We tested our system using the Cybathlon BrainRunners computer game, which represents all the challenges inherent to real-time control. Our preliminary results show that an efficient architecture (SmallNet), composed of a convolutional layer, a fully connected layer and a sigmoid classification layer, is able to classify 4 mental activities that the user chose to perform. For the user's preferred mental activities, we ran and validated the system online and retrained it using EEG data collected online. We achieved 47.6% accuracy in online operation in the 4-way classification task. In particular, we found that models trained with data collected online better predicted the behaviour of the system in real time, suggesting, as a side note, that similar (ConvNet-based) offline classification methods present in the literature might suffer a drop in performance when applied online. To the best of our knowledge this is the first time such an architecture has been tested in an online operation task.
Compared to our previous method relying on blinks, this one reduced the accuracy by a factor of 1.6 (less than half) but doubled the number of states among which we can transition, bringing the opportunity for finer control of specific subtasks composing natural grasping in a self-paced way.
doi:10.1101/256701 fatcat:fj2m3gvedvcqlhjquicq4hpjiy

Learning to Identify Users and Predict Their Destination in a Robotic Guidance Application [chapter]

Xavier Perrin, Francis Colas, Cédric Pradalier, Roland Siegwart
2010 Springer Tracts in Advanced Robotics  
User guidance systems are relevant to various applications in the service robotics field, including smart GPS navigators, robotic guides for museums or shopping malls, and robotic wheelchairs for disabled persons. Such a system aims at helping its user reach their destination in a fairly complex environment. If we assume the system is used in a fixed environment by multiple users for multiple navigation tasks over the course of days or weeks, then it is possible to take advantage of the user
more » ... tine: from the initial navigational choice, users can be identified and their goal can be predicted. As a result of these predictions, the guidance system can bring its user to their destination while requiring less interaction. This property is particularly relevant for assisting disabled persons for whom interaction is a long and complex task. In this paper, we implement a user guidance system using a dynamic Bayesian model and a topological representation of the environment. This model is evaluated with respect to the quality of its action prediction in a scenario involving 4 human users, and it is shown that in addition to the user identity, the goals and actions of the user are accurately predicted.
doi:10.1007/978-3-642-13408-1_34 fatcat:2kydicw63vezffwd53q474s2iy

How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments [article]

Cédric Colas and Olivier Sigaud and Pierre-Yves Oudeyer
2018 arXiv   pre-print
Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps to address the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how the number of random seeds relates to the probabilities of statistical errors. For both the t-test and the bootstrap confidence interval test, we recall theoretical guidelines to determine the number of random seeds one should use to provide a statistically
more » ... ignificant comparison of the performance of two algorithms. Finally, we discuss the influence of deviations from the assumptions usually made by statistical tests. We show that they can lead to inaccurate evaluations of statistical errors and provide guidelines to counter these negative effects. We make our code available to perform the tests.
arXiv:1806.08295v2 fatcat:k7gcp3uow5emlmprq2giqmb26y