43,447 Hits

Reducing Sampling Error in Batch Temporal Difference Learning [article]

Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone
2020 arXiv   pre-print
Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning.  ...  To address this limitation, we introduce policy sampling error corrected-TD(0) (PSEC-TD(0)).  ...  The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity  ...
arXiv:2008.06738v1 fatcat:qpyg7ke7djgwjojoct2rierqou
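
A minimal tabular sketch of the idea named in this abstract, assuming the correction reweights each batch TD(0) update by the ratio of the evaluation policy to the maximum-likelihood policy estimated from the batch; `psec_td0` and all variable names are illustrative, not the authors' code:

```python
# Tabular sketch of policy-sampling-error-corrected batch TD(0).
from collections import defaultdict

def psec_td0(batch, pi, num_states, alpha=0.1, gamma=0.99, sweeps=100):
    """batch: list of (s, a, r, s_next) tuples; pi[s][a]: evaluation policy."""
    # Maximum-likelihood estimate of the policy that generated the batch.
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, _, _ in batch:
        counts[s][a] += 1
    pi_mle = {s: {a: c / sum(acts.values()) for a, c in acts.items()}
              for s, acts in counts.items()}

    v = [0.0] * num_states
    for _ in range(sweeps):
        for s, a, r, s_next in batch:
            w = pi[s][a] / pi_mle[s][a]          # PSEC correction weight
            td_error = r + gamma * v[s_next] - v[s]
            v[s] += alpha * w * td_error
    return v
```

On a finite batch the count-based estimate differs from the true policy, and the weight w corrects exactly the policy sampling error the abstract describes.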

Reducing sampling error in batch temporal difference learning [article]

Brahma Suneil Pavse, Peter Stone, The University of Texas at Austin
2021
Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning.  ...  To address this limitation, we introduce policy sampling error corrected-TD(0) (PSEC-TD(0)).  ...  (1988) and shows that batch linear TD(0) converges to the certainty-equivalence estimate in the per-step reward and discounted settings in a Markov reward process (MRP) and Markov decision  ...
doi:10.26153/tsw/14853 fatcat:r37puwzrvfc7pexbddotlzhh4q

Reducing Sampling Error in Batch Temporal Difference Learning [article]

Brahma S. Pavse, Ishan Durugkar, Josiah P. Hanna, Peter Stone, The University of Texas at Austin
2020
Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning.  ...  To address this limitation, we introduce policy sampling error corrected-TD(0) (PSEC-TD(0)).  ...  This work has taken place in the Learning Agents Research Group (LARG) at the Artificial Intelligence Laboratory, The University of Texas at Austin.  ... 
doi:10.26153/tsw/9861 fatcat:avwhwwxr7fc4pmvu534ozfixxi

Reanalysis of Variance Reduced Temporal Difference Learning [article]

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang
2020 arXiv   pre-print
Furthermore, the variance error (for both i.i.d. and Markovian sampling) and the bias error (for Markovian sampling) of VRTD are significantly reduced by the batch size of variance reduction in comparison  ...  Temporal difference (TD) learning is a popular algorithm for policy evaluation in reinforcement learning, but the vanilla TD can substantially suffer from the inherent optimization variance.  ...  In Reinforcement learning, pages 45-73. Springer. Lee, D. and He, N. (2019). Target-based temporal difference learning. In International Conference on Machine Learning (ICML).  ...
arXiv:2001.01898v2 fatcat:iqmd55jvgbcwzbn5uq2oezbl5a
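
The abstract attributes the reduced variance and bias errors to the batch size of the variance-reduction step. A sketch under the assumption that VRTD follows the usual SVRG pattern applied to the TD semi-gradient; hyperparameters and names are illustrative:

```python
# Variance-reduced TD for linear value estimation, SVRG-style.
import numpy as np

def td_grad(theta, phi_s, r, phi_next, gamma):
    # Semi-gradient of TD(0): delta * phi(s).
    return (r + gamma * phi_next @ theta - phi_s @ theta) * phi_s

def vrtd(samples, dim, alpha=0.05, gamma=0.99, epochs=20, seed=0):
    """samples: list of (phi_s, r, phi_next) with phi_* as np arrays."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for _ in range(epochs):
        theta_ref = theta.copy()
        # Batch-averaged pseudo-gradient at the reference point.
        g_bar = np.mean([td_grad(theta_ref, p, r, pn, gamma)
                         for p, r, pn in samples], axis=0)
        for _ in range(len(samples)):
            p, r, pn = samples[rng.integers(len(samples))]
            # Stochastic update corrected against the reference batch.
            corrected = (td_grad(theta, p, r, pn, gamma)
                         - td_grad(theta_ref, p, r, pn, gamma) + g_bar)
            theta += alpha * corrected
    return theta
```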

Dynamical systems as a level of cognitive analysis of multi-agent learning

Wolfram Barfuss
2021 Neural computing & applications (Print)  
I then propose an on-line sample-batch temporal-difference algorithm which is characterized by the combination of applying a memory-batch and separated state-action value estimation.  ...  I demonstrate the usefulness of this framework with the general and widespread class of temporal-difference reinforcement learning.  ...  Note that this sample-batch algorithm reduces to standard Q-learning for a batch size K = 1.  ...
doi:10.1007/s00521-021-06117-0 pmid:35221541 pmcid:PMC8827307 fatcat:y3oxx2kglvfpjd5onhriky5g44
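
A sketch of the memory-batch scheme as the snippet describes it: accumulate K transitions, apply their TD updates together, and note that K = 1 recovers standard Q-learning. The environment interface and the exact averaging are my assumptions:

```python
# Sample-batch Q-learning with a memory batch of size K.
import numpy as np

def sample_batch_q(env_step, n_states, n_actions, K=10, alpha=0.1,
                   gamma=0.95, steps=10_000, eps=0.1, seed=0):
    """env_step(s, a) -> (reward, next_state); assumed interface."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    memory, s = [], 0
    for _ in range(steps):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        r, s_next = env_step(s, a)
        memory.append((s, a, r, s_next))
        if len(memory) == K:
            # Compute all TD errors against the current Q, then apply the
            # batch together; for K = 1 this is exactly one Q-learning step.
            deltas = [(si, ai, ri + gamma * Q[sn].max() - Q[si, ai])
                      for si, ai, ri, sn in memory]
            for si, ai, d in deltas:
                Q[si, ai] += (alpha / K) * d
            memory.clear()
        s = s_next
    return Q
```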

Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay [article]

Dogan C. Cicek, Enes Duran, Baturay Saglam, Furkan B. Mutlu, Suleyman S. Kozat
2021 arXiv   pre-print
Moreover, to reduce the off-policyness of the updates, our algorithm selects one batch among a certain number of batches and forces the agent to learn through the batch that is most likely generated by  ...  In prior works, the sampling probability of the transitions was adjusted according to their importance.  ...  For instance, PER uses Temporal Difference error for that purpose [13].  ...
arXiv:2111.01865v2 fatcat:cndqpix3bbhuniy5aidurpjd6q
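
A sketch of the batch-selection step described in the abstract, assuming off-policyness is scored by a Gaussian log-likelihood of the stored actions around the current deterministic policy; the paper's exact criterion may differ:

```python
# Select, among several candidate mini-batches, the one whose actions look
# most like current-policy behavior.
import numpy as np

def select_batch(buffer, policy, batch_size=256, n_candidates=10,
                 noise_std=0.2, seed=0):
    """buffer: list of (state, action) pairs; policy(state) -> action array."""
    rng = np.random.default_rng(seed)
    best_batch, best_score = None, -np.inf
    for _ in range(n_candidates):
        idx = rng.choice(len(buffer), size=batch_size, replace=False)
        batch = [buffer[i] for i in idx]
        # Gaussian log-likelihood around pi(s), constant terms dropped:
        # higher means the batch is more likely under the current policy.
        score = sum(-np.sum((a - policy(s)) ** 2)
                    for s, a in batch) / (2 * noise_std ** 2)
        if score > best_score:
            best_batch, best_score = batch, score
    return best_batch
```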

Anomaly Detection in Multivariate Non-stationary Time Series for Automatic DBMS Diagnosis

Doyup Lee
2017 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)  
Experimental results show the effectiveness of the proposed model, especially the batch temporal normalization layer.  ...  Reconstruction error from a deep autoencoder and a statistical process control approach are applied to detect time periods with anomalies.  ...  Difference between Batch Normalization and Batch Temporal Normalization: $(\text{Sample Anomaly Score})_{ij} = \frac{1}{T}\sum_{t=1}^{T}(\hat{y}_{i,j,t} - y_{i,j,t})^2$ (2), where $i$ and $j$ index samples and features.  ...
doi:10.1109/icmla.2017.0-126 dblp:conf/icmla/Lee17 fatcat:lklnkwojtnhn3jco2ym2kwjake
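
Eq. (2) from the snippet transcribes directly to code; `sample_anomaly_score` is an illustrative name:

```python
# Per-(sample, feature) anomaly score: reconstruction MSE over T time steps.
import numpy as np

def sample_anomaly_score(y_hat, y):
    """y_hat, y: arrays of shape (n_samples, n_features, T)."""
    return np.mean((y_hat - y) ** 2, axis=-1)   # shape (n_samples, n_features)
```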

On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis [article]

Dominik Rivoir, Isabel Funke, Stefanie Speidel
2022 arXiv   pre-print
Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequential modeling, and has led to the use of alternatives in  ...  We argue that BN's properties create major obstacles for training CNNs and temporal models end to end in video tasks.  ...  German Research Foundation (DFG, Deutsche Forschungsgemeinschaft) as part of Germany's Excellence Strategy -EXC 2050/1 -Project ID 390696704 -Cluster of Excellence "Centre for Tactile Internet with Human-in-the-Loop  ... 
arXiv:2203.07976v1 fatcat:stvkuxdhwbedne5j6wm62ohhr4
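
For contrast with the pitfalls the abstract describes, a batch-independent normalization such as GroupNorm removes the cross-sample dependence entirely; this is a generic illustration, not necessarily the alternative the paper endorses:

```python
# A conv block whose normalization statistics are computed per sample,
# so sequential/video training is unaffected by batch composition.
import torch.nn as nn

def conv_block(in_ch, out_ch, groups=8):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(groups, out_ch),   # per-sample, batch-independent
        nn.ReLU(inplace=True),
    )
```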

Dynamic Mode Decomposition Analysis of Spatially Agglomerated Flow Databases

Binghua Li, Jesús Garicano-Mena, Yao Zheng, Eusebio Valero
2020 Energies  
On the contrary, Mini-batch K-means arises as the method of choice whenever high agglomeration, ñ_p/n_p ≪ 1, is possible.  ...  We compare twelve different clustering algorithms on three test cases, encompassing different flow regimes: a synthetic flow field, a Re_D = 60 flow around a cylinder cross section, and a Re_τ ≈ 200 turbulent  ...  Quadrio (Politecnico di Milano) for providing the DNS solver used in Section 3.3.  ...
doi:10.3390/en13092134 fatcat:klfsrelolvbytcemzd4jvv5nom
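
A sketch of agglomeration-then-DMD with scikit-learn's MiniBatchKMeans, the method the abstract favors at strong agglomeration (ñ_p/n_p ≪ 1); clustering on grid coordinates and the exact DMD variant are my simplifications:

```python
# Cluster grid points, average snapshots per cluster, run exact DMD.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def agglomerated_dmd(snapshots, coords, n_clusters, rank):
    """snapshots: (n_points, n_times); coords: (n_points, n_dims)."""
    labels = MiniBatchKMeans(n_clusters=n_clusters, n_init=3,
                             random_state=0).fit_predict(coords)
    reduced = np.stack([snapshots[labels == c].mean(axis=0)
                        for c in range(n_clusters)])
    X, Y = reduced[:, :-1], reduced[:, 1:]          # time-shifted pair
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
    eigvals, _ = np.linalg.eig(A_tilde)
    return eigvals   # DMD eigenvalues of the agglomerated flow
```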

Mini-batch sample selection strategies for deep learning based speech recognition

Yesim Dokuz, Zekeriya Tufekci
2021 Applied Acoustics  
The experiments show that the proposed strategies perform better than the standard mini-batch sample selection strategy.  ...  For this purpose, gender- and accent-adjusted strategies are proposed for selecting mini-batch samples.  ...  Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper  ...
doi:10.1016/j.apacoust.2020.107573 fatcat:fml6wyg345a3jh67ze7uwxe26y
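
A sketch of an attribute-adjusted mini-batch selector: each batch takes an equal share of samples per group (e.g., gender or accent classes) instead of sampling uniformly; the paper's exact strategies may differ:

```python
# Yield mini-batches balanced across attribute groups.
import numpy as np

def balanced_batches(indices_by_group, batch_size, seed=0):
    """indices_by_group: dict mapping group -> array of sample indices.
    Assumes every group holds at least batch_size // n_groups samples."""
    rng = np.random.default_rng(seed)
    groups = list(indices_by_group.values())
    per_group = batch_size // len(groups)
    while True:
        batch = np.concatenate([rng.choice(g, size=per_group, replace=False)
                                for g in groups])
        rng.shuffle(batch)
        yield batch
```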

Learning from Data with Noisy Labels Using Temporal Self-Ensemble [article]

Jun Ho Lee, Jae Soon Baik, Tae Hwan Hwang, Jun Won Choi
2022 arXiv   pre-print
During training, the proposed method generates temporal self-ensemble by sampling intermediate network parameters from the weight trajectory formed by stochastic gradient descent optimization.  ...  By combining the aforementioned metrics, we present the proposed self-ensemble-based robust training (SRT) method, which can filter the samples with noisy labels to reduce their influence on training.  ...  Initialized with the different weights, two networks can be co-trained in a decoupled manner, thereby reducing the effect of memorization.  ... 
arXiv:2207.10354v1 fatcat:4tlb2rtknfhrjfzzkf2yx4flh4
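
A sketch of using a temporal self-ensemble to flag noisy labels: average predictions from checkpoints sampled along the SGD weight trajectory and keep samples whose labels agree with the ensemble. The agreement test is my simplification of the paper's combined metrics:

```python
# Ensemble checkpoints from the weight trajectory to flag suspect labels.
import torch

@torch.no_grad()
def ensemble_clean_mask(checkpoints, model, x, labels):
    """checkpoints: non-empty list of state_dicts saved during training."""
    probs = 0
    for state in checkpoints:
        model.load_state_dict(state)
        model.eval()
        probs = probs + torch.softmax(model(x), dim=1)
    probs = probs / len(checkpoints)
    return probs.argmax(dim=1) == labels   # True where the label looks clean
```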

Weight perturbation learning outperforms node perturbation on broad classes of temporally extended tasks [article]

Paul Züge, Christian Klos, Raoul-Martin Memmesheimer
2021 bioRxiv   pre-print
gathering batches of subtasks in a trial decreases the number of trials required.  ...  Further, we find qualitative features of the weight and error dynamics that allow one to distinguish which of the rules underlies a learning process: in WP, but not NP, weights mediating zero input diffuse and  ...  When single trials capture only a small part of the full task, this slows down WP learning. Training in batches reduces the disadvantage.  ...
doi:10.1101/2021.10.04.463055 fatcat:ll5nxzgllvgvjbsbtdtltmbguq
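
A minimal weight-perturbation learner for a temporally extended task: one global weight perturbation per trial, reinforced by the change in trial reward. The explicit two-trial baseline comparison is my simplification of the rule:

```python
# Weight perturbation (WP): perturb all weights once per trial and update
# in proportion to the reward change the perturbation caused.
import numpy as np

def wp_train(run_trial, w, sigma=0.05, lr=0.1, trials=1000, seed=0):
    """run_trial(w) -> scalar reward for one full trial using weights w."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        baseline = run_trial(w)                      # unperturbed trial
        xi = sigma * rng.standard_normal(w.shape)    # one perturbation per trial
        delta = run_trial(w + xi) - baseline         # reward change it caused
        w = w + lr * delta / sigma**2 * xi           # reinforce helpful noise
    return w
```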

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos [article]

Ji Lin, Chuang Gan, Song Han
2019 arXiv   pre-print
In this paper, we study the factors that impact the training scalability of video networks.  ...  With these guidelines, we designed a new operator, the Temporal Shift Module (TSM), that is efficient and scalable for distributed training.  ...  The initial learning rate is set to 0.00125 for every 8 samples, and we apply the linear scaling rule [5] to scale up the learning rate with larger batch sizes. The total learning rate is η = 0.00125k.  ...
arXiv:1910.00932v2 fatcat:fy6mmiospbbbnozoj5kpx4nmlu
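
The quoted linear scaling rule in code form: 0.00125 per 8 samples, so the total learning rate η grows linearly with the global batch size:

```python
def scaled_lr(global_batch_size, base_lr=0.00125, base_batch=8):
    # eta = base_lr * k, where k = global_batch_size / base_batch
    return base_lr * (global_batch_size / base_batch)

assert scaled_lr(8) == 0.00125   # the reference setting from the snippet
print(scaled_lr(2048))           # 0.32 for a 2048-sample global batch
```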

Predicting Length of Stay in the Intensive Care Unit with Temporal Pointwise Convolutional Networks [article]

Emma Rocheteau, Pietro Liò, Stephanie Hyland
2020 arXiv   pre-print
In this work, we propose a new deep learning model based on the combination of temporal convolution and pointwise (1x1) convolution to solve the length of stay prediction task on the eICU critical care  ...  The model - which we refer to as Temporal Pointwise Convolution (TPC) - is specifically designed to mitigate common challenges with Electronic Health Records, such as skewness, irregular sampling and  ...  We would also like to thank Louis-Pascal Xhonneux, Cătălina Cangea and Nikola Simidjievski for their help in reviewing the manuscript.  ...
arXiv:2006.16109v2 fatcat:kzfpdmqi65cfvdsky5gpezmmei
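
A sketch of the temporal-plus-pointwise combination the abstract names: a depthwise temporal convolution alongside a 1x1 convolution that mixes features at each time step; layer sizes and the summed outputs are illustrative, not the authors' reference architecture:

```python
# One temporal + pointwise block over (batch, channels, time) tensors.
import torch.nn as nn

class TempPointBlock(nn.Module):
    def __init__(self, channels, temp_kernel=3, dilation=1):
        super().__init__()
        # Depthwise conv along time: each feature channel filtered separately.
        self.temporal = nn.Conv1d(channels, channels, temp_kernel,
                                  dilation=dilation, groups=channels,
                                  padding=dilation * (temp_kernel - 1) // 2)
        # Pointwise (1x1) conv: mixes features within each time step.
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):                     # x: (batch, channels, time)
        return self.act(self.temporal(x) + self.pointwise(x))
```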

Machine-learning-based reduced order modeling for unsteady flows around bluff bodies of various shapes [article]

Kazuto Hasegawa, Kai Fukami, Takaaki Murata, Koji Fukagata
2020 arXiv   pre-print
We propose a method to construct a reduced order model with machine learning for unsteady flows.  ...  The present machine-learned reduced order model (ML-ROM) is constructed by combining a convolutional neural network autoencoder (CNN-AE) and a long short-term memory (LSTM), which are trained in a sequential  ...  Machine-learning based reduced order model (ML-ROM): As illustrated in figure 10, the proposed machine-learning based reduced order model (ML-ROM) is a combination of the MS-CNN-AE model and the LSTM  ...
arXiv:2003.07548v1 fatcat:uyoit4kzrbfznhut7ssoios5hi
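
A compact sketch of the two-stage ML-ROM the abstract describes: a CNN autoencoder compresses each snapshot to a latent code and an LSTM advances the code in time, trained sequentially (autoencoder first). Sizes assume 32x32 single-channel fields and are placeholders:

```python
# Stage 1: CNN autoencoder; Stage 2: LSTM on the frozen latent trajectories.
import torch.nn as nn

class CNNAE(nn.Module):
    def __init__(self, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1),
                                 nn.ReLU(), nn.Flatten(),
                                 nn.LazyLinear(latent))
        self.dec = nn.Sequential(nn.Linear(latent, 16 * 16 * 16), nn.ReLU(),
                                 nn.Unflatten(1, (16, 16, 16)),
                                 nn.ConvTranspose2d(16, 1, 4, stride=2,
                                                    padding=1))

    def forward(self, x):                      # x: (batch, 1, 32, 32)
        z = self.enc(x)
        return self.dec(z), z                  # reconstruction and code

class LatentLSTM(nn.Module):
    def __init__(self, latent=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(latent, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent)

    def forward(self, z_seq):                  # z_seq: (batch, time, latent)
        h, _ = self.lstm(z_seq)
        return self.head(h)                    # next-step latent prediction
```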
Showing results 1 — 15 out of 43,447 results