
Shaping Proto-Value Functions via Rewards [article]

Chandrashekar Lakshmi Narayanan, Raj Kumar Maity, Shalabh Bhatnagar
2015 arXiv   pre-print
In this paper, we combine task-dependent reward shaping and task-independent proto-value functions to obtain reward dependent proto-value functions (RPVFs).  ...  In constructing the RPVFs we are making use of the immediate rewards which are available during the sampling phase but are not used in the PVF construction.  ...  Conclusion We combined the task-independent proto-value function (PVF) construction and the task-specific reward shaping to obtain Reward based Proto-Value Functions (RPVFs).  ... 
arXiv:1511.08589v1 fatcat:h7bhv3romraqpeyxlnkwb5ld2e

Reward Propagation Using Graph Convolutional Networks [article]

Martin Klissarov, Doina Precup
2020 arXiv   pre-print
The propagated messages can then be used as potential functions for reward shaping to accelerate learning.  ...  Potential-based reward shaping provides an approach for designing good reward functions, with the purpose of speeding up learning.  ...  of Canada (NSERC) and the Fonds de recherche du Quebec - Nature et Technologies (FRQNT) for funding this research; Khimya Khetarpal for her invaluable and timely help; Zafareli Ahmed and Sitao Luan for useful  ... 
arXiv:2010.02474v2 fatcat:dnnkwnv7sfcjvd4tpuqamtdadm

Tracking Proto-Porcelain Production and Consumption in the Dongjiang Valley of Bronze Age Lingnan

Michèle H.S. Demandt
2019 Cambridge Archaeological Journal  
It will be argued that proto-porcelain was a suitable medium for the simultaneous expression of different social roles that might have included its use as serving ware in community rituals as well as its  ...  creation and consequent social usage of proto-porcelain.  ...  I would also like to thank Nina Demandt for drawing the maps used in this study. Michèle H.S. Demandt History Department Jinan University Guangzhou PRC Email:  ... 
doi:10.1017/s0959774319000246 fatcat:vpfkeirgifhu3kl6zstctfwzi4

Reciprocal-adaptation in a creature-based futuristic sociable dining table

Yuki Kado, Takanori Kamoda, Yuta Yoshiike, P. Ravindra S. De Silva, Michio Okada
2010 19th International Symposium in Robot and Human Interactive Communication  
The creature uses a knock (within a specific period) with a Temporal Difference (TD) learning model to learn and to adapt to a user's intentions, and the creature moves a different direction (right, left  ...  A positive TD error indicates that the tendency to select ($a_t$) should be strengthened for the future and the value function (critic) should be updated using: $V(S_t) \leftarrow V(S_t) + \alpha \delta_t$ (6), where  ...  Rewards are discrete (either negative or positive values), and the creature acquires a positive reward when the user does not make a knocking sound during movement.  ... 
doi:10.1109/roman.2010.5598727 dblp:conf/ro-man/KadoKYSO10 fatcat:mvq7tkcnlrb77dwzdg65qeatry

Attentive Monitoring of Multiple Video Streams Driven by a Bayesian Foraging Strategy

Paolo Napoletano, Giuseppe Boccignone, Francesco Tisato
2015 IEEE Transactions on Image Processing  
The likelihood $P(\neg R^{(k)}(t) \mid C^{(k)}(t))$ of not gaining reward, by using the definition of the detection function, Eq. 15, $P(\neg R^{(k)}(t) \mid C^{(k)}(t)) = \exp(-\lambda t)$ (19). Since, by definition, reward can be  ...  Then, next location is chosen so as to maximize the expected reward: $r_F(t+1) = \arg\max_{r^{(k)}_{\mathrm{new}}} E\left[ R^{(k)}_{r^{(k)}_{\mathrm{new}}} \right]$ (11). The expected reward is computed with reference to the value of proto-objects  ... 
doi:10.1109/tip.2015.2431438 pmid:25966475 fatcat:lwxlfeytorcqliys63mksob2oe

Modelling Task-Dependent Eye Guidance to Objects in Pictures

Antonio Clavelli, Dimosthenis Karatzas, Josep Lladós, Mario Ferraro, Giuseppe Boccignone
2014 Cognitive Computation  
The dependence on task is taken into account by exploiting the value and the payoff of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects.  ...  Value and payoff Following the discussion in Sect. 2, we use the payoff (or reward) as an operational concept for describing the value that the foraging eye gains, under a given task, for landing, after  ...  [73], though their study was limited to the use of primary rewards.  ... 
doi:10.1007/s12559-014-9262-3 fatcat:birifcsa3fghvfbp6dbujierci

Page 181 of Journal of Cognitive Neuroscience Vol. 22, Issue 1 [page]

2010 Journal of Cognitive Neuroscience  
This avoids one of the criticisms of earlier tests, namely, that monkeys could use hedonic value in making numerical judgments.  ...  This might have been due to the extensive training with the first two protocols, whereas the shape versus shape protocol was introduced at once without further training.  ... 

Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem [article]

Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto
2018 arXiv   pre-print
To increase our understanding of the problem, we investigate the issue of approximation errors in areas of sharp discontinuities of the value function being further propagated by bootstrap updates.  ...  Finally, we test whether the problem could be mitigated with a better state representation, and whether it can be learned in an unsupervised manner, without rewards or privileged information.  ...  Figure 1: Random policy trajectories in an S-shaped layout with two reward zones (map 1), and its ground truth value function.  ... 
arXiv:1807.03064v1 fatcat:to63dgnhnbf47ml5zdmyxqnpxi

Representations for Stable Off-Policy Reinforcement Learning [article]

Dibya Ghosh, Marc G. Bellemare
2020 arXiv   pre-print
We analyze representation learning schemes that are based on the transition matrix of a policy, such as proto-value functions, along three axes: approximation error, stability, and ease of estimation.  ...  For a fixed reward function, we find that an orthogonal basis of the corresponding Krylov subspace is an even better choice.  ...  Representation Learning In reinforcement learning, a large class of methods have focused on constructing a representation Φ from the transition and reward functions, beginning perhaps with proto-value  ... 
arXiv:2007.05520v2 fatcat:ckqjzdg46zfq7mkpkajtdukyru

Martin-CSE21.pdf [article]

Daniel Martin
higher reward by taking advantage of structure of system.  ...  patch as compositions of Stencil operations, and pointwise application of functions with multiple rectangular grid data arguments (forall)  ... 
doi:10.6084/m9.figshare.14153717.v1 fatcat:kioz7cyzmrdntbriwp7akp6une

Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning [article]

Wenbin Li, Jeannette Bohg, Mario Fritz
2017 arXiv   pre-print
Figure 3: Our proposed model GDQN which extends the Q-function approximator to integrate goal information. Figure 4: Reward shaping used in target stacking.  ...  The author introduces an analogous formulation to the Q-learning by using shortest path in replacement of the value functions.  ... 
arXiv:1711.00267v2 fatcat:baotsgizlncpvn2gppejumf2bi

Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing [article]

Kaixin Wang, Kuangqi Zhou, Qixin Zhang, Jie Shao, Bryan Hooi, Jiashi Feng
2021 arXiv   pre-print
Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.  ...  Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward shaping.  ...  Mahadevan (2005) proposes proto-value functions, viewing the Laplacian representations as basis state representations, and use them to approximate value functions.  ... 
arXiv:2107.05545v1 fatcat:7dk7gwow7nhs3o22jdhmcl4gcy

Novel domain formation reveals proto-architecture in inferotemporal cortex

Krishna Srihasam, Justin L Vincent, Margaret S Livingstone
2014 Nature Neuroscience  
We explore the possibility that this proto-organization is retinotopic or shape-based.  ...  This indicates that the location of training effects does not depend on function or expertise, but rather on some kind of proto-organization.  ...  The monkeys were trained using a touch screen mounted in their home cage to associate each of the 26 shapes in each set with a particular reward value of 0 to 25 drops of liquid.  ... 
doi:10.1038/nn.3855 pmid:25362472 pmcid:PMC4241119 fatcat:bhciuy6djnfrnllmgghbpx33xm

The Segmentation of Proto-Objects in the Monkey Primary Visual Cortex

Matthew W. Self, Danique Jeurissen, Anne F. van Ham, Bram van Vugt, Jasper Poort, Pieter R. Roelfsema
2019 Current Biology  
These proto-grounds must be correctly assigned to the background to allow correct shape identification and guide behavior.  ...  Suppression of the proto-ground was only present in animals that had been trained to perform the shape-discrimination task, and it predicted the choice of the animal on a trial-by-trial basis.  ...  Otherwise, responses from the same electrode were reliable across days as judged by similar SNRDAY values and the general shape of the response on each day.  ... 
doi:10.1016/j.cub.2019.02.016 pmid:30853432 fatcat:s5fz433lq5acbk2xskdpouqi7i

Spike Synchrony Reveals Emergence of Proto-Objects in Visual Cortex

A. B. Martin, R. von der Heydt
2015 Journal of Neuroscience  
Thus, our results suggest a novel coding mechanism that might underlie the proto-objects of perception.  ...  We recorded from neurons in macaque visual cortex and used border-ownership selectivity, an intrinsic property of the neurons, to infer whether or not two neurons are part of the same grouping circuit.  ...  In support of this view we have shown that the proto-object map covaries with the performance in the shape discrimination task used in this study.  ... 
doi:10.1523/jneurosci.3590-14.2015 pmid:25926461 pmcid:PMC4412900 fatcat:h4qyrxsywzf77e3gp3ltgab3vq