Analysis of a Classification-based Policy Iteration Algorithm

Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
2010 International Conference on Machine Learning  
We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis.  ...  The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting.  ...  In this paper, we derive a full finite-sample analysis of a classification-based API algorithm (called direct policy iteration (DPI)) based on a cost-sensitive loss function weighing each classification  ... 
dblp:conf/icml/LazaricGM10 fatcat:lkahsrgbsfefjmuvxevxajocd4

Conservative and Greedy Approaches to Classification-Based Policy Iteration

Mohammad Ghavamzadeh, Alessandro Lazaric
The existing classification-based policy iteration (CBPI) algorithms can be divided into two categories: direct policy iteration (DPI) methods that directly assign the output of the classifier (the approximate  ...  the current policy) to the next policy, and conservative policy iteration (CPI) methods in which the new policy is a mixture distribution of the current policy and the output of the classifier.  ...  Acknowledgments This work was supported by French National Research Agency (ANR) through the project LAMPADA n° ANR-09-EMER-007, by Ministry of Higher Education and Research, Nord-Pas de Calais Regional  ... 
doi:10.1609/aaai.v26i1.8304 fatcat:c6wcd2feqnaixdju5ttxttjycy

An Optimized K-means Algorithm for Text Clustering

Jiani Zhao
2021 Converter  
and category topics, but also provides an approach for the objective classification of Chinese entrepreneurial policy text collections in the meanwhile.  ...  The improved K-means algorithm is applied to the clustering analysis of the Chinese entrepreneurial policy text collection, and the clustered topic effects are visually displayed through the word cloud  ...  Based on the above analysis, the research makes improvement to the above-mentioned problems on the basis of the traditional K-means algorithm.  ... 
doi:10.17762/converter.85 fatcat:lrk5jzfhk5clvioiygaiv5byfa

Data Analysis Using Rough Set Theory and Q-Learning Algorithm

Marwa Ramadan Salih, Yasser Fouad Hassan, Ashraf Elsayed
2019 ICIC Express Letters  
The reinforcement learning based on Q-learning algorithm is used for classification of a dataset. The reducts will be used as actions in our reinforcement framework.  ...  This paper introduces methodology for medical data analysis by using both rough set theory and reinforcement learning.  ...  This paper presents the medical data analysis approach based on the rough set theory as an attribute reduction technique and the Q-learning approach for classification of a dataset.  ... 
doi:10.24507/icicel.13.04.269 fatcat:5lpk5zwhwjfylg5bsqjk4srmuy

Classification-based Approximate Policy Iteration: Experiments and Extended Discussions [article]

Amir-massoud Farahmand, Doina Precup, André M.S. Barreto, Mohammad Ghavamzadeh
2014 arXiv   pre-print
We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a large class of algorithms that can exploit regularities of both the value function and the  ...  We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms (including temporal-difference-style  ...  ACKNOWLEDGMENT This work is financially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).  ... 
arXiv:1407.0449v1 fatcat:lsnn4di2ongvbpk34v6nxkgsly

Reinforcement and Imitation Learning via Interactive No-Regret Learning [article]

Stephane Ross, J. Andrew Bagnell
2014 arXiv   pre-print
The results provide theoretical support to the commonly observed successes of online approximate policy iteration.  ...  Our approach suggests a broad new family of algorithms and provides a unifying view of existing techniques for imitation and reinforcement learning.  ...  Analysis Consider the loss function L n given to the online learning algorithm within NRPI at iteration n.  ... 
arXiv:1406.5979v1 fatcat:34d475u3v5fltkwmayv2fgubbq

Agnostic System Identification for Model-Based Reinforcement Learning [article]

Stephane Ross, J. Andrew Bagnell
2012 arXiv   pre-print
In particular, we show that any no-regret online learning algorithm can be used to obtain a near-optimal policy, provided some model achieves low training error and access to a good exploration distribution  ...  We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class.  ...  The policies π 1:N are s.t. for any policy π : rgt , for KL rgt the average regret of the algorithm after N iterations, s.t. KL rgt → 0 as N → ∞.  ... 
arXiv:1203.1007v2 fatcat:5yafddw7yzf4xajrgliawb4lka

Agnostic System Identification for Model-Based Reinforcement Learning

Stéphane Ross, Drew Bagnell
2012 International Conference on Machine Learning  
In particular, we show that any no-regret online learning algorithm can be used to obtain a near-optimal policy, provided some model achieves low training error and access to a good exploration distribution  ...  We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class.  ...  , or running a base policy we want to improve upon).  ... 
dblp:conf/icml/RossB12 fatcat:b4uoau57dfdk5a2zab3uzsakvm

A Comparative Study of Meta-Heuristic and Conventional Search in Optimization of Multi-Dimensional Feature Selection

2022 International Journal of Applied Metaheuristic Computing  
Algorithmic-based search approach is ineffective at addressing the problem of multi-dimensional feature selection for document categorization.  ...  In addition, the selected number of feature subsets were minimized dramatically for document classification.  ...  ACKNOWLEDGMENT This work was supported by the Department of International Business Management, Didyasarin International College, Hatyai University. The authors wish to thank Dr Ozioma F.  ... 
doi:10.4018/ijamc.292517 fatcat:zkt2gzbvprfcza7a7hppiywfnu

Efficient Reductions for Imitation Learning

Stéphane Ross, Drew Bagnell
2010 Journal of machine learning research  
We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction.  ...  These two approaches share in common that the learner's policy is slowly modified from executing the expert's policy to the learned policy.  ...  and by the National Sciences and Engineering Research Council of Canada (NSERC).  ... 
dblp:journals/jmlr/RossB10 fatcat:l4log5mvcrdtflvzmlsixk3qe4

Classification-based Policy Iteration with a Critic

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer
2011 International Conference on Machine Learning  
In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms.  ...  We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method.  ...  This work was supported by Ministry of Higher Education and Research, Nord-Pas de Calais Regional Council and FEDER through the "contrat de projets état region 2007-2013", and by PASCAL2 Network of Excellence  ... 
dblp:conf/icml/GabillonLGS11 fatcat:tosuhv4rsbax3dvr4rhz22vi5a

Feature selection and feature learning for high-dimensional batch reinforcement learning: A survey

De-Rong Liu, Hong-Liang Li, Ding Wang
2015 International Journal of Automation and Computing  
policy given a fixed amount of training data.  ...  Finally, we derive some future directions in the research of RL algorithms, theories and applications.  ...  [34] provided an error propagation analysis for approximate modified policy iteration and established finite-sample error bounds in weighted Lp norms for classification-based approximate modified policy  ... 
doi:10.1007/s11633-015-0893-y fatcat:53wepnwplfgtjkqlt2j4q2dtuy

A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis

Abhijit Gosavi
2004 Machine Learning  
In the literature on discounted reward RL, algorithms based on policy iteration and actor-critic algorithms have appeared.  ...  We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving average reward Markov and semi-Markov decision problems.  ...  Figure 1. A policy iteration based RL algorithm for computing average reward optimal policies for MDPs and SMDPs.  ... 
doi:10.1023/b:mach.0000019802.64038.6c fatcat:3jjyyzh5urbnjlxmyfp4vv4pma

Data-based control, optimization, modeling and applications

Dongbin Zhao, Yi Shen, Zhanshan Wang, Xiaolin Hu
2013 Neural computing & applications (Print)  
We are very grateful for the hard work of the reviewers, which help to facilitate the outcome of this special issue with greatly improved quality of the contributions.  ...  We are honored to organize this special issue of Neural Evaluated by the contributions and the recommendation of ISNN 2012 organizers, 22 papers were selected for the further review process with their  ...  Derong Liu et al. develop an online algorithm based on policy iteration for continuous-time optimal control with infinite horizon cost for nonlinear systems.  ... 
doi:10.1007/s00521-012-1319-1 fatcat:7feupzivurbjhjbhgcck3ql6by

Application Research on Data Mining Algorithm in Intrusion Detection System

W.Z. Wu, L.Q. Liu, B. Xu
2016 Chemical Engineering Transactions  
Then, in order to solve the problem that the detection result is affected by the initial clustering centre and number setting, we propose a k-means clustering algorithm based on genetic algorithm.  ...  Data mining is a widely used data analysis and processing technology.  ...  Next, we calculate the final classification based on the k-means algorithm.  ... 
doi:10.3303/cet1651102 doaj:53951e7a279541a6862e18a1e2efea37 fatcat:6asb2loib5cuxnqi7onm6uibui