
The world of independent learners is not markovian

Guillaume J. Laurent, Laëtitia Matignon, N. Le Fort-Piat
2011-03-23, Journal of Knowledge-based & Intelligent Engineering Systems (IOS Press)
New concepts are introduced like the divergent learning paths and the observability of the effects of others' actions. To illustrate the formal concepts, a case study is also presented.  ...  In multi-agent systems, the presence of learning agents can cause the environment to be non-Markovian from an agent's perspective thus violating the property that traditional single-agent learning methods  ...  On the one hand, if there is no coupling between the actions then both agents will evolve independently in Markovian worlds.  ... 
doi:10.3233/kes-2010-0206, fatcat:frzqzbp43zcwrjswpz7tvlibm4
Full text (Web Archive PDF): https://web.archive.org/web/20190315144533/https://core.ac.uk/download/pdf/54039321.pdf

Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement Learning [article]

Borja G. León, Francesco Belardinelli
2020-02-14, arXiv (pre-print)
The combination of Formal Methods with Reinforcement Learning (RL) has recently attracted interest as a way for single-agent RL to learn multiple-task specifications.  ...  Specifically, we use our model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe LTL specifications.  ...  Then, an independent DQN is initialized by each agent for each of these tasks. Once done, the curriculum learner selects the next specification to be solved by the agents in the environment.  ... 
arXiv:2002.06000v1, fatcat:3om52suwurgabklqm555xbqxcu
Full text (Web Archive PDF): https://web.archive.org/web/20200321180816/https://arxiv.org/pdf/2002.06000v1.pdf

Low Complexity Proto-Value Function Learning from Sensory Observations with Incremental Slow Feature Analysis [chapter]

Matthew Luciw, Juergen Schmidhuber
2012, Lecture Notes in Computer Science (Springer Berlin Heidelberg)
The algorithm is local in space and time, furthering the biological plausibility and applicability of PVFs.  ...  A temporal-difference based reinforcement learner improves a value function approximation upon the features, and the agent uses the value function to achieve rewards successfully.  ...  This work was funded by Swiss National Science Foundation grant CRSIKO-122697 (Sinergia project), and through the 7th framework program of the EU under grant #270247 (NeuralDynamics project).  ... 
doi:10.1007/978-3-642-33266-1_35, fatcat:ozo3u3kzcfe4tpvjrtzge2fdsq
Full text (Web Archive PDF): https://web.archive.org/web/20170809105402/http://people.idsia.ch/~luciw/papers/icann12-luciw.pdf

Understanding, evaluating, and supporting self-regulated learning using learning analytics

Ido Roll, Philip H. Winne
2015, Journal of Learning Analytics (Society for Learning Analytics Research)
This special section highlights the current state of research at the intersection of self-regulated learning and learning analytics, bridging communities, disciplines, and schools of thought.  ...  Self-regulated learning is an ongoing process rather than a single snapshot in time.  ...  A genuinely social exchange consists of successive turns that are not independent (not Markovian).  ... 
doi:10.18608/jla.2015.21.2, fatcat:cshjthgajrdgpiz2wyqdwnf25q
Full text (Web Archive PDF): https://web.archive.org/web/20170921231949/http://epress.lib.uts.edu.au/journals/index.php/JLA/article/download/4491/4825

Long-term Planning by Short-term Prediction [article]

Shai Shalev-Shwartz and Nir Ben-Zrihem and Aviad Cohen and Amnon Shashua
2016-02-04, arXiv (pre-print)
For example, when a car tries to merge in a roundabout it should decide on an immediate acceleration/braking command, while the long term effect of the command is the success/failure of the merge.  ...  We argue that dual versions of the MDP framework (that depend on the value function and the Q function) are problematic for autonomous driving applications due to the non Markovian of the natural state  ...  We demonstrated the effectiveness of the learning procedure for two simple tasks: adaptive cruise control and roundabout merging.  ... 
arXiv:1602.01580v1, fatcat:5zmnm4eixrhvrezxy3ekepv5ay
Full text (Web Archive PDF): https://web.archive.org/web/20200906223657/https://arxiv.org/pdf/1602.01580v1.pdf

Reinforcement-based Robotic Memory Controller [chapter]

Osman Hassab
2010-08-12, Robot Learning (Sciyo)
In (a) a partially observable world, in which the agent does not know which state it is in due to sensor limitations; for the value function v π , the agent updates its policy parameters directly.  ...  Our aim is not to mimic the neuroanatomical structure of the brain system but to catch its properties, avoids manual 'hard coding' of behaviors.  ...  Pole-balancing learning parameters Below are the equations and parameters used for cart-pole balancing experiments (31) A.1 Pole-balancing equations The equations of motion for N unjoined poles balanced  ... 
doi:10.5772/10252, fatcat:wq5qgwztdjcn3jobnrfb526nhm
Full text (Web Archive PDF): https://web.archive.org/web/20190504164749/https://cdn.intechopen.com/pdfs/12135.pdf

Quantum Markovianity as a supervised learning task

Sally Shrapnel, Fabio Costa, Gerard Milburn
2018, International Journal of Quantum Information (World Scientific Pub Co Pte Lt)
In this paper we investigate the possibility of using supervised learning to estimate the dimension of a non-Markovian quantum environment.  ...  Our approach uses an ensemble learning method, the Random Forest Regressor, applied to classically simulated data sets. Our results indicate this is a promising line of research.  ...  Extremely Randomised Trees are an example of "weak" learners: although each individual tree may not fit the data very well, the averaged ensemble provides a good fit to the data that is likely to generalise  ... 
doi:10.1142/s0219749918400105, fatcat:y7bzdfrm3fh77n5dsef5hwixua
Full text (Web Archive PDF, not primary version): https://web.archive.org/web/20200909021902/https://arxiv.org/pdf/1901.05158v1.pdf

Deep Abstract Q-Networks [article]

Melrose Roderick, Christopher Grimm, Stefanie Tellex
2018-08-25, arXiv (pre-print)
We examine the problem of learning and planning on high-dimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains.  ...  We construct toy domains that elucidate the problem of long horizons, sparse rewards and high-dimensional inputs, and show that our algorithm significantly outperforms previous methods on these domains  ...  It is crucial that the abstraction is as close to Markovian as possible: the transition dynamics for a state should not depend on the history of previous states.  ... 
arXiv:1710.00459v2, fatcat:oprqohkzxrdxnmyiq7ny6y57cu
Full text (Web Archive PDF): https://web.archive.org/web/20200929073338/https://arxiv.org/pdf/1710.00459v2.pdf

Multi-agent Relational Reinforcement Learning [chapter]

Tom Croonenborghs, Karl Tuyls, Jan Ramon, Maurice Bruynooghe
2006, Lecture Notes in Computer Science (Springer Berlin Heidelberg)
There is growing evidence in the Reinforcement Learning research community that a relational representation of the state space has many benefits over a propositional one.  ...  In this paper we explore the powerful possibilities of using Relational Reinforcement Learning (RRL) in complex multi-agent coordination tasks.  ...  In the local or selfish Q-learners setting, the presence of the other agents is totally neglected, and agents are considered to be selfish reinforcement learners.  ... 
doi:10.1007/11691839_12, fatcat:q6hfxggsinhjzk254b5crs3oxe
Full text (Web Archive PDF): https://web.archive.org/web/20171127065354/https://core.ac.uk/download/pdf/34329373.pdf

Spatio-Temporal Credit Assignment in Neuronal Population Learning

Johannes Friedrich, Robert Urbanczik, Walter Senn, Boris S. Gutkin
2011-06-30, PLoS Computational Biology (Public Library of Science (PLoS))
When considering the biophysical basis of learning, the credit-assignment problem is compounded because the behavioral decisions themselves result from the spatio-temporal aggregation of many synaptic  ...  We present a model of plasticity induction for reinforcement learning in a population of leaky integrate and fire neurons which is based on a cascade of synaptic memory traces.  ...  Acknowledgments We thank Michael Herzog and Thomas Nevian for helpful discussions on the learning task paradigms and on possible molecular implementations of the synaptic plasticity rule.  ... 
doi:10.1371/journal.pcbi.1002092, pmid:21738460, pmcid:PMC3127803, fatcat:di4pdtkekzehfewxakclaxivwi
Full text (Web Archive PDF): https://web.archive.org/web/20190227092506/http://pdfs.semanticscholar.org/7ddd/dd143c176d026cbc88fcec2916303879700b.pdf

Time-varying Learning and Content Analytics via Sparse Factor Analysis [article]

Andrew S. Lan, Christoph Studer, Richard G. Baraniuk
2013-12-19, arXiv (pre-print)
Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, as well as analyzing the quality and content  ...  intrinsic difficulty of the assessment questions.  ...  This work was supported by the National Science Foundation under Cyberlearning grant IIS-1124535, the Air Force Office of Scientific Research under grant FA9550-09-1-0432, and the Google Faculty Research  ... 
arXiv:1312.5734v1, fatcat:ndfyqgbxozdqtffkkk76ww2fbu
Full text (Web Archive PDF): https://web.archive.org/web/20191016175154/https://arxiv.org/pdf/1312.5734v1.pdf

Comparing evolutionary and temporal difference methods in a reinforcement learning domain

Matthew E. Taylor, Shimon Whiteson, Peter Stone
2006, Proceedings of the 8th annual conference on Genetic and evolutionary computation - GECCO '06 (ACM Press)
Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic.  ...  Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.  ...  Acknowledgments We would like to thank Ken Stanley for help setting up NEAT in Keepaway, as well as Nate Kohl, David Pardoe, Joeseph Reisinger, Jefferson Provost, and the anonymous reviewers for helpful  ... 
doi:10.1145/1143997.1144202, dblp:conf/gecco/TaylorWS06, fatcat:wnpic3djnjclnnr3smgfqlvwaa
Full text (Web Archive PDF): https://web.archive.org/web/20170809113611/http://www.cs.bham.ac.uk/~wbl/biblio/gecco2006/docs/p1321.pdf

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
1994, Neural Computation (MIT Press - Journals)
Moreover, because most real world problems involving prediction of the future consequences of actions involve substantial uncertainty, the learner must be prepared to make use of a probability calculus  ...  Here we have also used the fact that Vý\(i) is a contraction mapping independent of possible discounting.  ... 
doi:10.1162/neco.1994.6.6.1185, fatcat:db7tg7xngzhhtlgvawtk6oo2qu
Full text (Web Archive PDF): https://web.archive.org/web/20170922064157/http://www.dtic.mil/get-tr-doc/pdf?AD=ADA276517

Multi-agent deep reinforcement learning: a survey

Sven Gronauer, Klaus Diepold
2021-04-15, Artificial Intelligence Review (Springer Science and Business Media LLC)
problems with real-world complexity.  ...  This article provides an overview of the current developments in the field of multi-agent deep reinforcement learning.  ...  Acknowledgements We would like to thank the editor and the three anonymous reviewers for providing  ... 
doi:10.1007/s10462-021-09996-w, fatcat:blu4ekwaxjfo5it3y7taqnzq4a
Full text (Web Archive PDF): https://web.archive.org/web/20210718010404/https://link.springer.com/content/pdf/10.1007/s10462-021-09996-w.pdf?error=cookies_not_supported&code=ef15ac3e-ba2a-4210-ad74-79acd8c1e2d0

Confidence-based progress-driven self-generated goals for skill acquisition in developmental robots

Hung Ngo, Matthew Luciw, Alexander Förster, Jürgen Schmidhuber
2013, Frontiers in Psychology (Frontiers Media SA)
The desired setting is a self-generated goal, and the plan of action, essentially a program to solve a problem, is a skill.  ...  For validation, this method is applied to both a simulated and real Katana robot arm in its "blocks-world" environment.  ...  ACKNOWLEDGMENTS We would like to thank the reviewers for their very useful comments that helped improve this paper.  ... 
doi:10.3389/fpsyg.2013.00833, pmid:24324448, pmcid:PMC3840616, fatcat:f7amrbrrubd5jctkdjlmvayyw4
Full text (Web Archive PDF): https://web.archive.org/web/20170829090606/https://fjfsdata01prod.blob.core.windows.net/articles/files/62546/pubmed-zip/.versions/1/.package-entries/fpsyg-04-00833/fpsyg-04-00833.pdf?sv=2015-12-11&sr=b&sig=NfFK4ZqjKW5MKH39EIm7%2Be6W%2BAYRUmjfGpQnHYudvaU%3D&se=2017-08-29T09%3A06%3A16Z&sp=r&rscd=attachment%3B%20filename%2A%3DUTF-8%27%27fpsyg-04-00833.pdf