13,637 Hits in 7.4 sec

Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior [article]

Siddharth Reddy, Anca D. Dragan, Sergey Levine
2019 arXiv   pre-print
These methods infer a goal or reward function that best explains the actions of the observed agent, typically a human demonstrator.  ...  Inferring intent from observed behavior has been studied extensively within the frameworks of Bayesian inverse planning and inverse reinforcement learning.  ...  We would like to thank Oleg Klimov for open-sourcing his implementation of the Lunar Lander game, which was originally developed by Atari in 1979, and inspired by the lunar modules built in the 1960s and  ... 
arXiv:1805.08010v4 fatcat:fv57qnxbrbbxhps45tsmr5eykq

Zero-Shot Assistance in Novel Decision Problems [article]

Sebastiaan De Peuter, Samuel Kaski
2022 arXiv   pre-print
We consider the problem of creating assistants that can help agents - often humans - solve novel sequential decision problems, assuming the agent is not able to specify the reward function explicitly to  ...  Finally, we show experimentally that our approach adapts to these agent biases, and results in higher cumulative reward for the agent than automation-based alternatives.  ...  This work was supported by the Technology Industries of Finland Centennial Foundation and the Jane and Aatos Erkko Foundation under project Interactive Artificial Intelligence for Driving R&D, the Academy  ... 
arXiv:2202.07364v1 fatcat:rf6zyvqgcraurbeijgqjgfsbvu

Mood as Representation of Momentum

Eran Eldar, Robb B. Rutledge, Raymond J. Dolan, Yael Niv
2016 Trends in Cognitive Sciences  
Specifically, we propose that mood represents the overall momentum of recent outcomes, and its biasing influence on the perception of outcomes 'corrects' learning to account for environmental dependencies  ...  First, mood depends on how recent reward outcomes differ from expectations. Second, mood biases the way we perceive outcomes (e.g., rewards), and this bias affects learning about those outcomes.  ...  Acknowledgments We thank Peter Dayan for helpful discussions and comments on a previous version of this manuscript.  ... 
doi:10.1016/j.tics.2015.07.010 pmid:26545853 pmcid:PMC4703769 fatcat:cewjmxguyvelpny3z3afqgh65a

Short-term reward experience biases inference despite dissociable neural correlates

Adrian G. Fischer, Sacha Bourgeois-Gironde, Markus Ullsperger
2017 Nature Communications  
Here we demonstrate that long-term, inference-based beliefs are biased by short-term reward experiences and that dissociable brain regions facilitate both types of learning.  ...  This suggests that counteracting the processing of optimally to-be-ignored short-term rewards and cortical suppression of associated reward-signals determines long-term learning success and failure.  ...  For data acquisition of the replication study, we thank Cindy Lübeck, Christina Becker, Laura Waite and Yan Arnold for their support.  ... 
doi:10.1038/s41467-017-01703-0 pmid:29167430 pmcid:PMC5700163 fatcat:gl4wqy6imbcfffvz2rcftnargq

A nonparametric Bayesian approach to learning multimodal interaction management

Zhuoran Wang, Oliver Lemon
2012 2012 IEEE Spoken Language Technology Workshop (SLT)  
The performance of the proposed unsupervised approach is evaluated based on both artificially synthesised data and a manually transcribed and annotated human-human interaction corpus.  ...  We therefore propose a nonparametric Bayesian method to automatically infer the (distributional) representations of POMDP states for multimodal interactive systems, without using any domain knowledge.  ...  This suggests that the states inferred by the iPOMDPs can capture more information than the rather general state annotations.  ... 
doi:10.1109/slt.2012.6424162 dblp:conf/slt/WangL12 fatcat:5vbnbvbnpjfwpgftzkik3ubooi

Holistic Reinforcement Learning: The Role of Structure and Attention

Angela Radulescu, Yael Niv, Ian Ballard
2019 Trends in Cognitive Sciences  
In turn, selective attention biases reinforcement learning towards relevant dimensions of the environment.  ...  Compact representations of the environment allow humans to behave efficiently in a complex world.  ...  Often, reward probability does not uniformly depend on all features. For instance, one feature may be more predictive of reward than others [6, 7] .  ... 
doi:10.1016/j.tics.2019.01.010 pmid:30824227 pmcid:PMC6472955 fatcat:n3dvwxnh5vggzoaqkemfkgzxpe

Learning to Arbitrate Human and Robot Control using Disagreement between Sub-Policies [article]

Yoojin Oh, Marc Toussaint, Jim Mainprice
2021 arXiv   pre-print
Our reward function reasons on this modality and prioritizes to match its learned policy to either the user or the robot accordingly.  ...  In the context of teleoperation, arbitration refers to deciding how to blend between human and autonomous robot commands.  ...  The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Yoojin Oh.  ... 
arXiv:2108.10634v1 fatcat:dmjh4clgnrbvllzdsdbdkuk44a

Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks [article]

Julia Kreutzer, Stefan Riezler, Carolin Lawrence
2021 arXiv   pre-print
However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.  ...  Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged?  ...  Logs of user interactions are gold mines for off-policy learning, and they should be put to use, rather than being forgotten after a one-off evaluation.  ... 
arXiv:2011.02511v3 fatcat:n5quenu4qfbbtjuyoa7eutn6wa

Experimental assessment of capacities for cumulative culture: Review and evaluation of methods

Christine A. Caldwell, Mark Atkinson, Kirsten H. Blakey, Juliet Dunstone, Donna Kean, Gemma Mackintosh, Elizabeth Renner, Charlotte E. H. Wilks
2019 Wiley Interdisciplinary Reviews: Cognitive Science  
By inferring the outcome of repeated transmission from the input-output response patterns of individual subjects, sample size requirements can be massively reduced.  ...  This limited evidence is noteworthy given widespread interest in the apparent distinctiveness of human cumulative culture, and the potentially significant theoretical implications of identifying related  ...  For certain types of task, it is possible to infer the outcome of repeated transmission events using data from individuals rather than chains.  ... 
doi:10.1002/wcs.1516 pmid:31441239 pmcid:PMC6916575 fatcat:t7h5ecgsfrecdlpislmak7vjmu

Apprenticeship learning for helicopter control

Adam Coates, Pieter Abbeel, Andrew Y. Ng
2009 Communications of the ACM  
Autorotation is an emergency maneuver that allows a trained pilot to descend and land the helicopter without engine power.  ...  The fixed z we use is the one that maximizes the likelihood of the observations for the current setting of parameters t, d, Σ(·). In practice, rather than alternating between complete optimizations  ...  The standard formulation (which we use) expresses the dynamics and reward function as a function of the error state e_t = s_t − s*_t rather than the actual state s_t.  ... 
doi:10.1145/1538788.1538812 fatcat:l7eea37tb5hbdpb6brtvmsjfca

Semantic Compression of Episodic Memories [article]

David G. Nagy, Balázs Török, Gergő Orbán
2018 arXiv   pre-print
in the experimental literature on human memory.  ...  We formalise the compression of episodes in the normative framework of information theory and argue that semantic memory provides the distortion function for compression of experiences.  ...  Acknowledgements The authors thank the anonymous reviewers for useful comments and Ferenc Huszár for discussions.  ... 
arXiv:1806.07990v1 fatcat:3xuv6pk3dbh57gfsoqeqpvv56q

Deep Value of Information Estimators for Collaborative Human-Machine Information Gathering [article]

Kin Gwn Lore, Nicholas Sweet, Kundan Kumar, Nisar Ahmed, Soumik Sarkar
2015 arXiv   pre-print
The practical feasibility of our method is also demonstrated on a mobile robotic search problem with language-based semantic human sensor inputs.  ...  Effective human-machine collaboration can significantly improve many learning and planning strategies for information gathering via fusion of 'hard' and 'soft' data originating from machine and human sensors  ...  that approximate low-dimensional reachable belief spaces via online sampling rather than through offline-learned feature compression [15].  ... 
arXiv:1512.07592v1 fatcat:7xe4qhwxanbvnf2aplyeftrbxe

A description-experience gap in statistical intuitions: Of smart babies, risk-savvy chimps, intuitive statisticians, and stupid grown-ups

Christin Schulze, Ralph Hertwig
2021 Cognition  
Whereas babies seem to be intuitive statisticians, surprisingly capable of statistical learning and inference, adults' statistical inferences have been found to be inconsistent with the rules of probability  ...  To capture the full scope of human statistical intuition, we conclude, research on probabilistic reasoning across the lifespan, across species, and across research traditions must bear in mind that experience  ... 
doi:10.1016/j.cognition.2020.104580 pmid:33667974 fatcat:inng6rasmfacpikhkxin4trfw4

Bayes Optimality of Human Perception, Action and Learning: Behavioural and Neural Evidence [chapter]

Ulrik R. Beierholm
2014 Lecture Notes in Computer Science  
Reward learning - Model based: A special case of learning that has gathered a lot of attention, partly due to its link with the literature on Pavlovian and Operant conditioning, is that of reward learning  ...  The obvious way to learn all of these variables is to utilize Bayesian inference for the parameters themselves.  ... 
doi:10.1007/978-3-319-12084-3_10 fatcat:6nj3puc6wffarfjgxyt46syo6q

Advances in the computational understanding of mental illness

Quentin J. M. Huys, Michael Browning, Martin Paulus, Michael J. Frank
2020 Neuropsychopharmacology  
The review divides work up into three theoretical approaches that have deep mathematical connections: dynamical systems, Bayesian inference and reinforcement learning.  ...  We argue that the brain is a computational organ. As such, an understanding of the illnesses arising from it will require a computational framework.  ...  The avoidance in this case is driven by the interpretation that a different course of action than the one taken can enhance the chances of another reward.  ... 
doi:10.1038/s41386-020-0746-4 pmid:32620005 pmcid:PMC7688938 fatcat:mrlghkzmnbc5tjw234fpc5wy5y