Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence
[article]
2014
arXiv
pre-print
Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training the weights of Restricted Boltzmann Machines. ...
In this paper, however, we show empirically that CD has a lower stochastic gradient estimate variance than exact sampling, while the mean of subsequent PCD estimates has a higher variance than exact sampling ...
Contrastive Divergence is claimed to benefit from low variance of the gradient estimates when using stochastic gradients. ...
arXiv:1312.6002v3
fatcat:otkjsotvnnbgzjfcdd4seversm
Population-Contrastive-Divergence: Does Consistency help with RBM training?
[article]
2017
arXiv
pre-print
However, when the RBM distribution has many hidden neurons, the consistent estimate of pop-CD may still have a considerable bias and the variance of the gradient estimate requires a smaller learning rate ...
However, the variance of the gradient estimate increases. We experimentally show that pop-CD can significantly outperform CD. ...
ACKNOWLEDGEMENTS: Christian and Oswin acknowledge support from the Danish National Advanced Technology Foundation through project "Personalized breast cancer screening". ...
arXiv:1510.01624v4
fatcat:3krjlyohznd3jawzu4ykwjexya
Gamma Markov Random Fields for Audio Source Modeling
2010
IEEE Transactions on Audio, Speech, and Language Processing
In this paper, we optimize the hyperparameters of our GMRF-based audio model using contrastive divergence and compare this method to alternatives such as score matching and pseudolikelihood maximization ...
We present the performance of the GMRF models in denoising and single-channel source separation problems in completely blind scenarios, where all the hyperparameters are jointly estimated given only audio ...
... Turner for fruitful discussions and suggestions. ...
doi:10.1109/tasl.2009.2031778
fatcat:valhskewfvd7xjrn2i67m2vpvu
Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models
[article]
2020
arXiv
pre-print
... and lower variance of gradient estimates. ...
In this paper, we show that the annoying difficulty of obtaining reliable stochastic gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully ...
Ou is also affiliated with Beijing National Research Center for Information Science and Technology. This work was supported by NSFC Grant 61976122, China MOE-Mobile Grant MCM20170301. ...
arXiv:2005.14001v1
fatcat:x24apzi7svau3fqmphkaxlyypq
Using fast weights to improve persistent contrastive divergence
2009
Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09
The most commonly used learning algorithm for restricted Boltzmann machines is contrastive divergence, which starts a Markov chain at a data point and runs the chain for only a few iterations to get a cheap, low variance estimate of the sufficient statistics under the model. ...
Acknowledgements: We thank Ruslan Salakhutdinov for many useful discussions and suggestions. This research was financially supported by NSERC and Microsoft. ...
doi:10.1145/1553374.1553506
dblp:conf/icml/TielemanH09
fatcat:wb6pi4hicvfu5lybx2jvu22hla
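The snippet above describes the contrastive divergence recipe: start a Gibbs chain at a training example, run it for only a few steps, and use the resulting sample for the negative phase of the gradient. As a purely illustrative sketch (not code from any of the listed papers), CD-1 for a binary RBM might look as follows; the names W, b, c, sigmoid, and cd1_gradient are assumptions introduced here, not identifiers from the cited work.

import numpy as np

def sigmoid(x):
    # Logistic activation used for both conditional distributions of a binary RBM.
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradient(v_data, W, b, c, rng):
    """CD-1 gradient estimate for a binary RBM (illustrative sketch).

    v_data : (batch, n_visible) binary training batch
    W      : (n_visible, n_hidden) weight matrix
    b, c   : visible and hidden bias vectors
    """
    # Positive phase: hidden unit probabilities given the data.
    ph_data = sigmoid(v_data @ W + c)

    # Start the Markov chain at the data point and run a single Gibbs step.
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    pv_model = sigmoid(h_sample @ W.T + b)
    v_model = (rng.random(pv_model.shape) < pv_model).astype(float)
    ph_model = sigmoid(v_model @ W + c)

    n = v_data.shape[0]
    # <v h>_data - <v h>_model, averaged over the batch.
    dW = (v_data.T @ ph_data - v_model.T @ ph_model) / n
    db = (v_data - v_model).mean(axis=0)
    dc = (ph_data - ph_model).mean(axis=0)
    return dW, db, dc

Example usage (hypothetical shapes): rng = np.random.default_rng(0); v = (rng.random((32, 784)) < 0.5).astype(float); dW, db, dc = cd1_gradient(v, 0.01 * rng.standard_normal((784, 128)), np.zeros(784), np.zeros(128), rng).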
A Practical Guide to Training Restricted Boltzmann Machines
[chapter]
2012
Lecture Notes in Computer Science
Many of my past and present graduate students and postdocs have made valuable contributions to the body of practical knowledge described in this technical report. ...
I have tried to acknowledge particularly valuable contributions in the report, but I cannot always recall who suggested what. ...
A more radical departure from CD1 is called "persistent contrastive divergence" (Tieleman, 2008). ...
doi:10.1007/978-3-642-35289-8_32
fatcat:tfonx6lyajbmfbb6sz6cxqi6ou
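The snippet above contrasts CD1 with persistent contrastive divergence (PCD; Tieleman, 2008), which keeps the negative-phase Markov chains alive across parameter updates instead of restarting them at the data. A rough sketch of that difference, reusing the hypothetical sigmoid, W, b, and c names from the CD-1 example shown earlier (again an assumption-laden illustration, not any paper's implementation):

def pcd_gradient(v_data, v_persistent, W, b, c, rng):
    # PCD gradient estimate: the negative chain continues from v_persistent
    # (the state left over from the previous update) rather than from the data.
    ph_data = sigmoid(v_data @ W + c)

    # Advance the persistent chain by one Gibbs step.
    ph_chain = sigmoid(v_persistent @ W + c)
    h_sample = (rng.random(ph_chain.shape) < ph_chain).astype(float)
    pv_chain = sigmoid(h_sample @ W.T + b)
    v_persistent = (rng.random(pv_chain.shape) < pv_chain).astype(float)
    ph_model = sigmoid(v_persistent @ W + c)

    n = v_data.shape[0]
    dW = (v_data.T @ ph_data - v_persistent.T @ ph_model) / n
    db = (v_data - v_persistent).mean(axis=0)
    dc = (ph_data - ph_model).mean(axis=0)
    # The updated chain state is returned so the caller can carry it forward
    # to the next parameter update.
    return dW, db, dc, v_persistent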
A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling
[article]
2019
arXiv
pre-print
This document aims to provide a review on learning with deep generative models (DGMs), which is a highly active area in machine learning and, more generally, artificial intelligence. ...
We thus separate model definition and model learning, with more emphasis on reviewing, differentiating and connecting different learning algorithms. ...
In contrast, the REINFORCE φ-gradient estimate only depends on ∇_φ log q_φ(h|x). Improving the φ-gradient estimator beyond simple reparameterization: Roeder et al. (2017). ...
arXiv:1808.01630v4
fatcat:t4ktiiaoqnasteqjyt6t7w6v54
Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy
2015
International journal of pattern recognition and artificial intelligence
We argue that these samples can more accurately compute the gradient of the log probability of the training data. According to the results, an error rate of 0.99% was achieved on the MNIST test set. ...
This result shows that the proposed method outperforms the method presented in the first paper introducing DBNs (1.25% error rate) and general classification methods such as SVM (1.4% error rate) and KNN ...
... Contrastive Divergence or Persistent Contrastive Divergence) is in sampling in their negative phase [15]. To compute ⟨v_i h_j⟩_model, the Gibbs sampling method may be used. ...
doi:10.1142/s0218001415510064
fatcat:tvwzhaitujbgxhyzmcpetbja2a
Learning Non-deterministic Representations with Energy-based Ensembles
[article]
2015
arXiv
pre-print
We propose an algorithm similar to contrastive divergence for training restricted Boltzmann stochastic ensembles. ...
Inspired by the stochasticity of the synaptic connections in the brain, we introduce Energy-based Stochastic Ensembles. ...
E.N. and G.C. were supported in part by the National Science Foundation (NSF EFRI-1137279) and the Office of Naval Research (ONR MURI 14-13-1-0205). M.A. was supported by KAUST Graduate Fellowship. ...
arXiv:1412.7272v2
fatcat:4t4emfk52vfdnmflgdww7h3kq4
Justifying and Generalizing Contrastive Divergence
2009
Neural Computation
We show that its residual term converges to zero, justifying the use of a truncation, i.e. running only a short Gibbs chain, which is the main idea behind the Contrastive Divergence (CD) estimator of the ...
We are particularly interested in estimators of the gradient of the log-likelihood obtained through this expansion. ...
If the contrastive divergence update is considered as a biased and noisy estimator of the true log-likelihood gradient, it can be shown that stochastic gradient descent converges (to a local minimum), ...
doi:10.1162/neco.2008.11-07-647
pmid:19018704
fatcat:cmm4n7r65fhmtbmohf2yxnqdfi
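For context on the expansion this entry refers to: the exact log-likelihood gradient of an energy-based model, and the CD-k update that truncates the Gibbs chain after k steps, are conventionally written as below (standard textbook notation, not reproduced from the paper itself):

$$
\frac{\partial \log p(v)}{\partial \theta}
  = -\,\mathbb{E}_{p(h \mid v)}\!\left[\frac{\partial E(v,h)}{\partial \theta}\right]
  + \mathbb{E}_{p(v',h')}\!\left[\frac{\partial E(v',h')}{\partial \theta}\right]
$$

$$
\Delta\theta_{\mathrm{CD}\text{-}k}
  \propto -\,\mathbb{E}_{p(h \mid v^{(0)})}\!\left[\frac{\partial E(v^{(0)},h)}{\partial \theta}\right]
  + \mathbb{E}_{p(h \mid v^{(k)})}\!\left[\frac{\partial E(v^{(k)},h)}{\partial \theta}\right]
$$

where $v^{(0)}$ is a training example and $v^{(k)}$ is the state, after $k$ steps, of a Gibbs chain started at $v^{(0)}$; roughly speaking, the residual term discussed in the abstract is the error introduced by replacing the model expectation with the $k$-step sample.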
Variational Noise-Contrastive Estimation
[article]
2019
arXiv
pre-print
To increase the number of techniques in our arsenal, we propose variational noise-contrastive estimation (VNCE), building on NCE which is a method that only applies to unnormalised models. ...
However, learning their parameters from data is intractable, and few estimation techniques are currently available for such models. ...
Benjamin Rhodes was supported in part by the EPSRC Centre for Doctoral Training in Data Science, funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016427/1) and the University ...
arXiv:1810.08010v3
fatcat:nhrcte3vfraxppnamnd235pwfq
An Efficient Learning Procedure for Deep Boltzmann Machines
2012
Neural Computation
Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. ...
The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden ...
Acknowledgments: This research was supported by NSERC and by gifts from Google and Microsoft. ...
doi:10.1162/neco_a_00311
pmid:22509963
fatcat:fqz2deyygjglzk4wquyfwvcw5y
Unsupervised learning for MRFs on bipartite graphs
2013
Procedings of the British Machine Vision Conference 2013
We show that besides the widely used stochastic gradient approximation (a.k.a. Persistent Contrastive Divergence) there is an alternative learning approach - a modified EM algorithm which is tractable because of the bipartiteness of the model graph. ...
A third option is a stochastic gradient method which is often used in the context of RBMs and is designated as Persistent Contrastive Divergence [13, 14]. ...
doi:10.5244/c.27.72
dblp:conf/bmvc/FlachS13
fatcat:iaheg3sclzbbfkrh46nahldneu
Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift
2005
Evolutionary Computation
This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and ...
This leads to an update rule that is related to and compared with previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering ...
The adaptation considered minimizes the Kullback-Leibler divergence between Q(x) and P (x) using a stochastic gradient descent. ...
doi:10.1162/1063656053583478
pmid:15901425
fatcat:qfrjl2sw7rfqlgddfzk24jq7tq
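The abstract above describes a stochastic gradient descent on the KL divergence between the model density $Q_\theta$ and a target density $P$. For reference, the score-function form of that gradient, which underlies such population-based updates, is the following general identity (not the specific update rule derived in the paper):

$$
\nabla_\theta \, \mathrm{KL}\!\left(Q_\theta \,\|\, P\right)
  = \mathbb{E}_{x \sim Q_\theta}\!\left[\nabla_\theta \log Q_\theta(x)\,\bigl(\log Q_\theta(x) - \log P(x)\bigr)\right]
$$

which can be estimated stochastically by averaging the bracketed term over a population of samples drawn from $Q_\theta$.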
Weighted Contrastive Divergence
[article]
2018
arXiv
pre-print
This is the case of Restricted Boltzmann Machines (RBM) and its learning algorithm Contrastive Divergence (CD). ...
However small these modifications may be, experimental work reported in this paper suggests that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational ...
In this paper we propose an alternative approximation to the CD gradient called Weighted Contrastive Divergence (WCD). ...
arXiv:1801.02567v2
fatcat:rm45ogags5elvgcr27jp62fspa
Showing results 1 — 15 out of 7,343 results