Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior
[article]
2019
arXiv
pre-print
These methods infer a goal or reward function that best explains the actions of the observed agent, typically a human demonstrator. ...
Inferring intent from observed behavior has been studied extensively within the frameworks of Bayesian inverse planning and inverse reinforcement learning. ...
We would like to thank Oleg Klimov for open-sourcing his implementation of the Lunar Lander game, which was originally developed by Atari in 1979, and inspired by the lunar modules built in the 1960s and ...
arXiv:1805.08010v4
fatcat:fv57qnxbrbbxhps45tsmr5eykq
Zero-Shot Assistance in Novel Decision Problems
[article]
2022
arXiv
pre-print
We consider the problem of creating assistants that can help agents - often humans - solve novel sequential decision problems, assuming the agent is not able to specify the reward function explicitly to ...
Finally, we show experimentally that our approach adapts to these agent biases, and results in higher cumulative reward for the agent than automation-based alternatives. ...
This work was supported by the Technology Industries of Finland Centennial Foundation and the Jane and Aatos Erkko Foundation under project Interactive Artificial Intelligence for Driving R&D, the Academy ...
arXiv:2202.07364v1
fatcat:rf6zyvqgcraurbeijgqjgfsbvu
Mood as Representation of Momentum
2016
Trends in Cognitive Sciences
Specifically, we propose that mood represents the overall momentum of recent outcomes, and its biasing influence on the perception of outcomes 'corrects' learning to account for environmental dependencies ...
First, mood depends on how recent reward outcomes differ from expectations. Second, mood biases the way we perceive outcomes (e.g., rewards), and this bias affects learning about those outcomes. ...
Acknowledgments We thank Peter Dayan for helpful discussions and comments on a previous version of this manuscript. ...
doi:10.1016/j.tics.2015.07.010
pmid:26545853
pmcid:PMC4703769
fatcat:cewjmxguyvelpny3z3afqgh65a
Short-term reward experience biases inference despite dissociable neural correlates
2017
Nature Communications
Here we demonstrate that long-term, inference-based beliefs are biased by short-term reward experiences and that dissociable brain regions facilitate both types of learning. ...
This suggests that counteracting the processing of optimally to-be-ignored short-term rewards and cortical suppression of associated reward-signals, determines long-term learning success and failure. ...
For data acquisition of the replication study, we thank Cindy Lübeck, Christina Becker, Laura Waite and Yan Arnold for their support. ...
doi:10.1038/s41467-017-01703-0
pmid:29167430
pmcid:PMC5700163
fatcat:gl4wqy6imbcfffvz2rcftnargq
A nonparametric Bayesian approach to learning multimodal interaction management
2012
2012 IEEE Spoken Language Technology Workshop (SLT)
The performance of the proposed unsupervised approach is evaluated based on both artificially synthesised data and a manually transcribed and annotated human-human interaction corpus. ...
We therefore propose a nonparametric Bayesian method to automatically infer the (distributional) representations of POMDP states for multimodal interactive systems, without using any domain knowledge. ...
This suggests that the states inferred by the iPOMDPs can capture more information than the rather general state annotations. ...
doi:10.1109/slt.2012.6424162
dblp:conf/slt/WangL12
fatcat:5vbnbvbnpjfwpgftzkik3ubooi
Holistic Reinforcement Learning: The Role of Structure and Attention
2019
Trends in Cognitive Sciences
In turn, selective attention biases reinforcement learning towards relevant dimensions of the environment. ...
Compact representations of the environment allow humans to behave efficiently in a complex world. ...
Often, reward probability does not uniformly depend on all features. For instance, one feature may be more predictive of reward than others [6, 7]. ...
doi:10.1016/j.tics.2019.01.010
pmid:30824227
pmcid:PMC6472955
fatcat:n3dvwxnh5vggzoaqkemfkgzxpe
Learning to Arbitrate Human and Robot Control using Disagreement between Sub-Policies
[article]
2021
arXiv
pre-print
Our reward function reasons on this modality and prioritizes to match its learned policy to either the user or the robot accordingly. ...
In the context of teleoperation, arbitration refers to deciding how to blend between human and autonomous robot commands. ...
The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Yoojin Oh. ...
arXiv:2108.10634v1
fatcat:dmjh4clgnrbvllzdsdbdkuk44a
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks
[article]
2021
arXiv
pre-print
However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions. ...
Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? ...
Logs of user interactions are gold mines for off-policy learning, and they should be put to use, rather than being forgotten after a one-off evaluation purpose. ...
arXiv:2011.02511v3
fatcat:n5quenu4qfbbtjuyoa7eutn6wa
Experimental assessment of capacities for cumulative culture: Review and evaluation of methods
2019
Wiley Interdisciplinary Reviews: Cognitive Science
By inferring the outcome of repeated transmission from the input-output response patterns of individual subjects, sample size requirements can be massively reduced. ...
This limited evidence is noteworthy given widespread interest in the apparent distinctiveness of human cumulative culture, and the potentially significant theoretical implications of identifying related ...
For certain types of task, it is possible to infer the outcome of repeated transmission events using data from individuals rather than chains. ...
doi:10.1002/wcs.1516
pmid:31441239
pmcid:PMC6916575
fatcat:t7h5ecgsfrecdlpislmak7vjmu
Apprenticeship learning for helicopter control
2009
Communications of the ACM
Autorotation is an emergency maneuver that allows a trained pilot to descend and land the helicopter without engine power. ...
The fixed z we use is the one that maximizes the likelihood of the observations for the current setting of parameters t, d, Σ(·). In practice, rather than alternating between complete optimizations ...
The standard formulation (which we use) expresses the dynamics and reward function as a function of the error state $e_t = s_t - s_t^*$ rather than the actual state $s_t$. ...
doi:10.1145/1538788.1538812
fatcat:l7eea37tb5hbdpb6brtvmsjfca
Semantic Compression of Episodic Memories
[article]
2018
arXiv
pre-print
in the experimental literature on human memory. ...
We formalise the compression of episodes in the normative framework of information theory and argue that semantic memory provides the distortion function for compression of experiences. ...
Acknowledgements The authors thank the anonymous reviewers for useful comments and Ferenc Huszár for discussions. ...
arXiv:1806.07990v1
fatcat:3xuv6pk3dbh57gfsoqeqpvv56q
Deep Value of Information Estimators for Collaborative Human-Machine Information Gathering
[article]
2015
arXiv
pre-print
The practical feasibility of our method is also demonstrated on a mobile robotic search problem with language-based semantic human sensor inputs. ...
Effective human-machine collaboration can significantly improve many learning and planning strategies for information gathering via fusion of 'hard' and 'soft' data originating from machine and human sensors ...
that approximate low-dimensional reachable belief spaces via online sampling rather than through offline-learned feature compression [15] . ...
arXiv:1512.07592v1
fatcat:7xe4qhwxanbvnf2aplyeftrbxe
A description-experience gap in statistical intuitions: Of smart babies, risk-savvy chimps, intuitive statisticians, and stupid grown-ups
2021
Cognition
Whereas babies seem to be intuitive statisticians, surprisingly capable of statistical learning and inference, adults' statistical inferences have been found to be inconsistent with the rules of probability ...
To capture the full scope of human statistical intuition, we conclude, research on probabilistic reasoning across the lifespan, across species, and across research traditions must bear in mind that experience ...
The sunk cost and Concorde effects: Are humans less rational than lower animals? Psychological Bulletin, 125(5), 591-600. https://doi.org/10.1037/0033-2909.125.5.591.
References ...
doi:10.1016/j.cognition.2020.104580
pmid:33667974
fatcat:inng6rasmfacpikhkxin4trfw4
Bayes Optimality of Human Perception, Action and Learning: Behavioural and Neural Evidence
[chapter]
2014
Lecture Notes in Computer Science
Reward learning - Model based: A special case of learning that has gathered a lot of attention, partly due to its link with the literature on Pavlovian and Operant conditioning, is that of reward learning ...
The obvious way to learn all of these variables is to utilize Bayesian inference for the parameters themselves. ...
doi:10.1007/978-3-319-12084-3_10
fatcat:6nj3puc6wffarfjgxyt46syo6q
Advances in the computational understanding of mental illness
2020
Neuropsychopharmacology
The review divides work up into three theoretical approaches that have deep mathematical connections: dynamical systems, Bayesian inference and reinforcement learning. ...
We argue that the brain is a computational organ. As such, an understanding of the illnesses arising from it will require a computational framework. ...
The avoidance in this case is driven by the interpretation that a different course of action than the one taken can enhance the chances of another reward. ...
doi:10.1038/s41386-020-0746-4
pmid:32620005
pmcid:PMC7688938
fatcat:mrlghkzmnbc5tjw234fpc5wy5y