Quantifying Differences in Reward Functions [article]

Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike
2021 arXiv   pre-print
Moreover, this method can only tell us about behavior in the evaluation environment, but the reward may incentivize very different behavior in even a slightly different deployment environment.  ...  To address these problems, we introduce the Equivalent-Policy Invariant Comparison (EPIC) distance to quantify the difference between two reward functions directly, without a policy optimization step.  ...  This guarantees that if the expected difference between rewards is small on a given region, then all points in this region have bounded reward difference.  ... 
arXiv:2006.13900v3 fatcat:xulomulbknfgrp2lxybiyzjpj4

Page 799 of Journal of Comparative Psychology Vol. 51, Issue 6 [page]

1958 Journal of Comparative Psychology  
difference in preference (or reward value).  ...  In the case of punishment the aversive stimulus quantified is itself generally regarded as the ‘drive’; in the case of reward, however, the stimulus quantified is not the drive.  ... 

Risk Averse Bayesian Reward Learning for Autonomous Navigation from Human Demonstration [article]

Christian Ellis, Maggie Wigness, John G. Rogers III, Craig Lennon, Lance Fiondella
2021 arXiv   pre-print
This paper proposes a Bayesian technique which quantifies uncertainty over the weights of a linear reward function given a dataset of minimal human demonstrations to operate safely in dynamic environments  ...  Traditional imitation learning provides a set of methods and algorithms to learn a reward function or policy from expert demonstrations.  ...  [7] and our methodology is in the reward function selection technique. The reward function obtained in Ref.  ... 
arXiv:2108.00276v1 fatcat:bs6r427njzgrfoidc7djtfh2aa

Asymmetric and adaptive reward coding via normalized reinforcement learning [article]

Kenway Louie
2021 bioRxiv   pre-print
While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are  ...  rewards.  ...  Distributional RL: To examine whether information about experienced reward distributions was encoded in learned NRL responses, we quantified NRL agent behavior in different reward environments.  ... 
doi:10.1101/2021.11.24.469880 fatcat:wf4nbalpmzeuriwvl2qjax4jwy

Separating Value from Choice: Delay Discounting Activity in the Lateral Intraparietal Area

K. Louie, P. W. Glimcher
2010 Journal of Neuroscience  
We recorded the activity of neurons in the lateral intraparietal area while monkeys performed an intertemporal choice task for rewards differing in delay to reinforcement.  ...  These findings show that in addition to information about gains, parietal cortex also incorporates information about delay into a precise physiological correlate of economic value functions, independent  ...  The intertemporal choice task was conducted under four different conditions of delayed reward magnitude (0.143, 0.163, 0.196, and 0.260 ml), randomized across sessions, to quantify the discount function  ... 
doi:10.1523/jneurosci.5742-09.2010 pmid:20410103 pmcid:PMC2898568 fatcat:6iwbsegpgvcytowqpba22ode2i

Divergent investment strategies of Acacia myrmecophytes and the coexistence of mutualists and exploiters

M. Heil, M. Gonzalez-Teuber, L. W. Clement, S. Kautz, M. Verhaagh, J. C. S. Bueno
2009 Proceedings of the National Academy of Sciences of the United States of America  
Host plant species represented 2 different strategies.  ...  Here, we link physiological, ecological, and phylogenetic approaches to study the evolution and coexistence of strategies in the Acacia-Pseudomyrmex system.  ...  Fig. 5. Colony size as a function of plant reward level.  ... 
doi:10.1073/pnas.0904304106 pmid:19717429 pmcid:PMC2775331 fatcat:swts2txobfhtxcemm4to76norq

Translational Rodent Paradigms to Investigate Neuromechanisms Underlying Behaviors Relevant to Amotivation and Altered Reward Processing in Schizophrenia

Jared W. Young, Athina Markou
2015 Schizophrenia Bulletin  
Amotivation and reward-processing deficits have long been described in patients with schizophrenia and considered large contributors to patients' inability to integrate well in society.  ...  We describe tasks that measure the motivation of rodents to expend physical and cognitive effort to gain rewards, as well as probabilistic learning tasks that assess both reward learning and feedback-based  ...  In the past 3 years Dr Young's work has been funded by NIDA and NIMH,  ... 
doi:10.1093/schbul/sbv093 pmid:26194891 pmcid:PMC4535652 fatcat:2h7vjxudfjcbdepteko52gcotm

A unified strategy for implementing curiosity and empowerment driven reinforcement learning [article]

Ildefons Magrans de Abril, Ryota Kanai
2018 arXiv   pre-print
Curiosity reward informs the agent about the relevance of a recent agent action, whereas empowerment is implemented as the opposite information flow from the agent to the environment that quantifies the  ...  We show how a shared internal model by curiosity and empowerment facilitates a more efficient training of the empowerment function.  ...  In this case, the reward function at state $s_t$ is defined by $R_{\mathrm{emp}}(s_t) = \max_{\omega} I(A_t; S_{t+1} \mid s_t)$.  ... 
arXiv:1806.06505v1 fatcat:ky5xgsyyhjgjfnck7bvmexdd5y

Is getting older all that rewarding?

D. F. Wong
2008 Proceedings of the National Academy of Sciences of the United States of America  
Although the FDopa data are not age-specific, the fMRI data may reflect the functional consequences of DA deficit. These studies provide evidence for why reward may be different in older age groups.  ...  Another issue is possible sex differences in reward mechanisms in older individuals. In Dreher et al.  ... 
doi:10.1073/pnas.0807850105 pmid:18812511 pmcid:PMC2567438 fatcat:o6xufewc4zfgdnzoosk56mul3m

Page 303 of The Journal of Neuroscience Vol. 23, Issue 1 [page]

2003 The Journal of Neuroscience  
The size of reward was varied across blocks of the task to detect different patterns of response in relation to reward value.  ...  Money has the practical advantage of being an objectively quantifiable reinforcer.  ... 

Socially-Compatible Behavior Design of Autonomous Vehicles with Verification on Real Human Data [article]

Letian Wang, Liting Sun, Masayoshi Tomizuka, Wei Zhan
2022 arXiv   pre-print
We also find that such driving preferences vary significantly in different cultures.  ...  It allows the AVs to infer the characteristics of other road users online and generate behaviors optimizing not only their own rewards, but also their courtesy to others, and their confidence regarding  ...  Safety is quantified as the relative distance of two cars, and two cars' distance to the intersection point of reference lines. where $R_O(u_O \mid x_0, u_E)$ quantifies the other car's rewards as in (6)  ... 
arXiv:2010.14712v7 fatcat:njrugyvdtfavldklonc7bfdwye

The ventral striatum dissociates information expectation, reward anticipation, and reward receipt

Flavia Filimon, Jonathan D. Nelson, Terrence J. Sejnowski, Martin I. Sereno, Garrison W. Cottrell
2020 Proceedings of the National Academy of Sciences of the United States of America  
Moreover, we show a temporal dissociation in the activation of different reward-related regions, including the nucleus accumbens, medial prefrontal cortex, and orbitofrontal cortex, during information  ...  In particular, this formulation quantifies the value of information before the answer to that query is known, in situations where payoffs are unknown and the goal is purely epistemic: That is, to increase  ...  and opioid receptors different from those found in other parts of the reward system.  ... 
doi:10.1073/pnas.1911778117 pmid:32527855 fatcat:llb6wl5bc5b3hntbw3k2osm3a4

Modular inverse reinforcement learning for visuomotor behavior

Constantin A. Rothkopf, Dana H. Ballard
2013 Biological cybernetics  
We propose to use inverse reinforcement learning, which quantifies the agent's goals as rewards implicit in the observed behavior  ...  It is shown how to recover the component reward weights for individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals.  ...  However, the number of data points required can vary due to the different sensitivities to distal rewards in different parts of the data space.  ... 
doi:10.1007/s00422-013-0562-6 pmid:23832417 pmcid:PMC3773182 fatcat:jrmnzdr5vbagdkemmrnbubx5ea

Risk-Sensitive Reinforcement Learning

Yun Shen, Michael J. Tobia, Tobias Sommer, Klaus Obermayer
2014 Neural Computation  
By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of  ...  As a proof of principle for the applicability of the new framework we apply it to quantify human behavior in a sequential investment task.  ...  In our setting, valuation functions are not necessarily centralized, that is, ρ(0, μ) is not necessarily 0, since ρ(0, μ) in fact sets a reference point that can differ for different agents.  ... 
doi:10.1162/neco_a_00600 pmid:24708369 fatcat:htcg2vrytbh6fgyysdygm5jyde

Differential Effects of Psychotic Illness on Directed and Random Exploration

James A. Waltz, Robert C. Wilson, Matthew A. Albrecht, Michael J. Frank, James M. Gold
2020 Computational Psychiatry  
Moreover, in PSZ, deficits in directed exploration were related to measures of intellectual function, whereas random exploration was related to positive symptoms.  ...  We found that PSZ patients show reduced directed exploration relative to HVs, but no difference in random exploration.  ...  quantify individual differences in directed and random exploration.  ... 
doi:10.1162/cpsy_a_00027 pmid:33768158 pmcid:PMC7990386 fatcat:kuadolbysrhsrdvzyaiwwjzxtq