
Lambda-Policy Iteration: A Review and a New Implementation [article]

Dimitri P. Bertsekas
2015 arXiv   pre-print
In this paper we discuss λ-policy iteration, a method for exact and approximate dynamic programming.  ...  One of these implementations is based on a new simulation scheme, called geometric sampling, which uses multiple short trajectories rather than a single infinitely long trajectory.  ...  a new policy is generated).  ... 
arXiv:1507.01029v1 fatcat:n4zkf7f7s5dzha56dkn2orcw44

Lambda-Policy Iteration: A Review and a New Implementation [chapter]

Dimitri P. Bertsekas
2013 Reinforcement Learning and Approximate Dynamic Programming for Feedback Control  
The improved policy μ is evaluated by solving the linear system of equations J_μ = T_μ J_μ, and (J_μ, μ) becomes the new cost vector-policy pair, which is used to start a new iteration.  ...  a new policy is generated).  ...  LAMBDA-POLICY ITERATION WITHOUT COST FUNCTION APPROXIMATION We first recall a central result from [BeI96].  ... 
doi:10.1002/9781118453988.ch17 fatcat:t7jg55t5gfe33je2bmmucveisu

Lambda-Policy Iteration with Randomization for Contractive Models with Infinite Policies: Well-Posedness and Convergence (Extended Version) [article]

Yuchao Li, Karl H. Johansson, Jonas Mårtensson
2020 arXiv   pre-print
Guided by the analysis, we exemplify a data-driven approximated implementation of the algorithm for estimation of optimal costs of constrained linear and nonlinear control problems.  ...  Particularly, contractive models with infinite policies are considered and it is shown that well-posedness of the λ-operator plays a central role in the algorithm.  ...  The helpful comments from the reviewers are also acknowledged.  ... 
arXiv:1912.08504v3 fatcat:nx3grrxajbh7lh6h7xwg2y2k7m

Learning to Play No-Press Diplomacy with Best Response Policy Iteration [article]

Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh (+2 others)
2022 arXiv   pre-print
We also introduce a family of policy iteration methods that approximate fictitious play.  ...  With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process  ...  We thank Edward Hughes and David Balduzzi for their advice on the project. We thank Kestas Kuliukas for providing the dataset of human diplomacy games.  ... 
arXiv:2006.04635v4 fatcat:sdpmfob6hjg7tono55f23zykdy

Polymorphic Iterable Sequential Effect Systems [article]

Colin S. Gordon
2020 arXiv   pre-print
Understanding such systems in terms of a semilattice of effects grounds understanding of the essential issues, and provides guidance when designing new effect systems.  ...  We show that for most effect quantales, there is an induced general notion of iterating a sequential effect; that for systems we consider the derived iteration agrees with the manually designed iteration  ...  ACKNOWLEDGMENTS We thank the anonymous TOPLAS reviewers for remarkably careful, thorough, and constructive feedback on both presentation and technical developments in earlier drafts of this paper.  ... 
arXiv:1808.02010v5 fatcat:bkel27dtkffujjumvlnfkcwway

Efficient Exhaustive Generation of Functional Programs Using Monte-Carlo Search with Iterative Deepening [chapter]

Susumu Katayama
2008 Lecture Notes in Computer Science  
[a], (:) :: ∀a. a → [a] → [a], foldr :: ∀a b. b → (a → b → b) → [a] → b ; ∀a. [[a]]  ...  Recently, in addition to them, some researchers pursue efficient exhaustive program generation algorithms, partly for the purpose of providing a comparator and knowing how essential the ideas such as heuristics  ...  We implemented a function that takes such a prioritized bag as an argument and returns its complete set of representatives as a prioritized infinite set of functions.  ... 
doi:10.1007/978-3-540-89197-0_21 fatcat:z3g7wirqfbdh3fh3br4w7nb7m4

GPU Implementation of Iterative-Constrained Endmember Extraction from Remotely Sensed Hyperspectral Images

Eysteinn Mar Sigurdsson, Antonio Plaza, Jon Atli Benediktsson
2015 IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing  
In this paper, a complete and scalable adaptation of the ICE algorithm is implemented using the parallel nature of commodity graphics processing units (GPUs).  ...  The iterated constrained endmembers (ICE) algorithm is an iterative method that uses the linear model to extract endmembers and abundances simultaneously from the data set.  ...  All the required kernel programs are elaborated on and a complete parallel implementation is given in pseudocode. A.  ... 
doi:10.1109/jstars.2015.2441699 fatcat:x4qfztr25ncold64g3zcsc6v5q

Multi-echo MR thermometry using iterative separation of baseline water and fat images

Megan E. Poorman, Ieva Braškutė, Lambertus W. Bartels, William A. Grissom
2018 Magnetic Resonance in Medicine  
This work was supported by DoD W81XWH-13-1-0230, NIH T32EB021937, a Vanderbilt University Central Discovery Grant, and a Whitaker International Program Summer Grant.  ...  ACKNOWLEDGMENT The authors would like to thank Charles Mougenot, Clemens Bos, and Roel Deckers for their experimental support.  ...  | In vivo liver Informed consent was obtained from a healthy female volunteer in accordance with the Vanderbilt University Institutional Review Board policies.  ... 
doi:10.1002/mrm.27567 pmid:30394582 pmcid:PMC6550275 fatcat:u7bb3md4avbgfmqisvkpriuawq

Synthesizing Configuration Tactics for Exercising Hidden Options in Serverless Systems [chapter]

Jörn Kuhlenkamp, Sebastian Werner, Chin Hong Tran, Stefan Tai
2022 Lecture Notes in Business Information Processing  
Conversely, a poor configuration can have a significant negative impact on the system's performance, reliability, and cost.  ...  A proper configuration of an information system can ensure accuracy and efficiency, among other system objectives.  ...  [22] use reinforcement learning that requires 150-600 iterations to stabilize in a policy. For sampling, each iteration executes 500 requests in flight for 30 seconds.  ... 
doi:10.1007/978-3-031-07481-3_5 fatcat:ea7vx2ijerc4jc6zo2etg5owie

Focused Crawling for Structured Data

Robert Meusel, Peter Mika, Roi Blanco
2014 Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM '14  
We propose new methods of focused crawling specifically designed for collecting data-rich pages with greater efficiency.  ...  We show that these techniques significantly outperform state-of-the-art approaches for focused crawling, measured as the ratio of relevant pages and non-relevant pages collected within a given budget.  ...  We will mainly focus on implementing a novel selection policy, i.e. determining the order in which new URLs are discovered and processed.  ... 
doi:10.1145/2661829.2661902 dblp:conf/cikm/MeuselMB14 fatcat:k7zyz6c5w5bjnezcvompaswtpe

OpenDC Serverless: Design, Evaluation, and Implementation of a FaaS Platform Simulator

Soufiane Jounaid, Alexandru Iosup, Erwyn Van Eyk, Georgios Andreadis
2020 Zenodo  
The simulator exposes custom interfaces for the implementation of resource allocation, management, and scheduling policies. It further supports the modification of its core architectural components.  ...  Despite the growing popularity of FaaS within the research community, evaluating the performance and cost of different resource management, scheduling, and provisioning policies remains a difficult endeavor  ...  Since we are making the code and sample public, the improvements suggested could be implemented in a future iteration of the experiment.  ... 
doi:10.5281/zenodo.4046674 fatcat:5p3maohqhzc6bapixwzmvesyse

Parallel execution of first-order operations

Sina Madani, Dimitrios S. Kolovos, Richard F. Paige
2018 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems  
We present parallel execution algorithms for a range of iteration-based operations in the context of the OCL-inspired Epsilon Object Language.  ...  Although the processing pipeline for models may involve numerous stages such as validation, transformation and code generation, many of these complex processes are (or can be) expressed by a combination  ...  Section 4 reviews twelve iteration-based operations and describes how they can be re-implemented with a parallel execution algorithm.  ... 
dblp:conf/models/MadaniKP18 fatcat:ys5rzhj6trfvzefoveysc2hvcm

Representing Partial Programs with Blended Abstract Semantics [article]

Maxwell Nye, Yewen Pu, Matthew Bowers, Jacob Andreas, Joshua B. Tenenbaum, Armando Solar-Lezama
2021 arXiv   pre-print
Here we learn an approximate execution model implemented as a modular neural network.  ...  In this search process, a key challenge is representing the behavior of a partially written program before it can be executed, to judge if it is on the right track and predict where to search next.  ...  ACKNOWLEDGMENTS The authors gratefully acknowledge Kevin Ellis and Eric Lu for productive conversations. We additionally thank anonymous reviewers for helpful comments. M.  ... 
arXiv:2012.12964v2 fatcat:sw2vlyno3bci3jmz7w362bu6wq

On Determinism [chapter]

Stephen A. Edwards
2018 Lecture Notes in Computer Science  
Determinism can be thought of as an abstraction boundary that delineates where control is passed from a system designer to the implementation.  ...  In particular, the sets E and B are difficult to define because they are meant to represent "everything else," but this requires a careful definition of the universal set, which is not obvious.  ...  Acknowledgements The National Science Foundation funded this work (CCF-1162124); the suggestions of two anonymous reviewers definitely improved this paper.  ... 
doi:10.1007/978-3-319-95246-8_14 fatcat:mzphndpzsfexnads7gxhkejd3a

Value Function Discovery in Markov Decision Processes With Evolutionary Algorithms

Martijn Onderwater, Sandjai Bhulai, Rob van der Mei
2016 IEEE Transactions on Systems, Man, and Cybernetics: Systems  
We give a detailed description of VFD and illustrate its application on an example MDP.  ...  The resulting policy shows near-optimal performance on a wide range of model parameters. Finally, we identify and discuss future application scenarios of discovered value functions.  ...  ACKNOWLEDGMENT We thank SURFsara [21] for the support in using the LISA Compute Cluster, and the reviewers for their in-depth comments during the peer-review process.  ... 
doi:10.1109/tsmc.2015.2475716 fatcat:v62fugyx25ay7iz66cnqxtcy24