Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence [article]

Mathias Berglund, Tapani Raiko
2014 arXiv   pre-print
Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training the weights of Restricted Boltzmann Machines.  ...  In this paper we however show empirically that CD has a lower stochastic gradient estimate variance than exact sampling, while the mean of subsequent PCD estimates has a higher variance than exact sampling  ...  Contrastive Divergence is claimed to benefit from low variance of the gradient estimates when using stochastic gradients.  ... 
arXiv:1312.6002v3 fatcat:otkjsotvnnbgzjfcdd4seversm

Population-Contrastive-Divergence: Does Consistency help with RBM training? [article]

Oswin Krause, Asja Fischer, Christian Igel
2017 arXiv   pre-print
However, when the RBM distribution has many hidden neurons, the consistent estimate of pop-CD may still have a considerable bias and the variance of the gradient estimate requires a smaller learning rate  ...  However, the variance of the gradient estimate increases. We experimentally show that pop-CD can significantly outperform CD.  ...  ACKNOWLEDGEMENTS Christian and Oswin acknowledge support from the Danish National Advanced Technology Foundation through project "Personalized breast cancer screening".  ... 
arXiv:1510.01624v4 fatcat:3krjlyohznd3jawzu4ykwjexya

Gamma Markov Random Fields for Audio Source Modeling

O. Dikmen, A.T. Cemgil
2010 IEEE Transactions on Audio, Speech, and Language Processing  
In this paper, we optimize the hyperparameters of our GMRF-based audio model using contrastive divergence and compare this method to alternatives such as score matching and pseudolikelihood maximization  ...  We present the performance of the GMRF models in denoising and single-channel source separation problems in completely blind scenarios, where all the hyperparameters are jointly estimated given only audio  ...  Turner for fruitful discussions and suggestions.  ... 
doi:10.1109/tasl.2009.2031778 fatcat:valhskewfvd7xjrn2i67m2vpvu

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [article]

Zhijian Ou, Yunfu Song
2020 arXiv   pre-print
, and lower variance of gradient estimates.  ...  In this paper, we show that the annoying difficulty of obtaining reliable stochastic gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully  ...  Ou is also affiliated with Beijing National Research Center for Information Science and Technology. This work was supported by NSFC Grant 61976122, China MOE-Mobile Grant MCM20170301.  ... 
arXiv:2005.14001v1 fatcat:x24apzi7svau3fqmphkaxlyypq

Using fast weights to improve persistent contrastive divergence

Tijmen Tieleman, Geoffrey Hinton
2009 Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09  
, low variance estimate of the sufficient statistics under the model.  ...  The most commonly used learning algorithm for restricted Boltzmann machines is contrastive divergence which starts a Markov chain at a data point and runs the chain for only a few iterations to get a cheap  ...  Acknowledgements We thank Ruslan Salakhutdinov for many useful discussions and suggestions. This research was financially supported by NSERC and Microsoft.  ... 
doi:10.1145/1553374.1553506 dblp:conf/icml/TielemanH09 fatcat:wb6pi4hicvfu5lybx2jvu22hla

A Practical Guide to Training Restricted Boltzmann Machines [chapter]

Geoffrey E. Hinton
2012 Lecture Notes in Computer Science  
Many of my past and present graduate students and postdocs have made valuable contributions to the body of practical knowledge described in this technical report.  ...  I have tried to acknowledge particularly valuable contributions in the report, but I cannot always recall who suggested what.  ...  A more radical departure from CD1 is called "persistent contrastive divergence" (Tieleman, 2008).  ... 
doi:10.1007/978-3-642-35289-8_32 fatcat:tfonx6lyajbmfbb6sz6cxqi6ou

A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling [article]

Zhijian Ou
2019 arXiv   pre-print
This document aims to provide a review on learning with deep generative models (DGMs), which is a highly active area in machine learning and more generally, artificial intelligence.  ...  We thus separate model definition and model learning, with more emphasis on reviewing, differentiating and connecting different learning algorithms.  ...  In contrast, the REINFORCE φ-gradient estimate only depends on involving ∇_φ log q_φ(h|x). Improving φ-gradient estimator beyond simple reparameterization Roeder et al. (2017).  ... 
arXiv:1808.01630v4 fatcat:t4ktiiaoqnasteqjyt6t7w6v54

Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy

Mohammad Ali Keyvanrad, Mohammad Mehdi Homayounpour
2015 International journal of pattern recognition and artificial intelligence  
This result shows that the proposed method outperforms the method presented in the first paper introducing DBN (1.25% error rate) and general classification methods such as SVM (1.4% error rate) and KNN  ...  We argue that these samples can more accurately compute gradient of log probability of training data. According to the results, an error rate of 0.99% was achieved on MNIST test set.  ...  Contrastive Divergence or Persistent Contrastive Divergence) is in sampling in their negative phase [15]. To compute ⟨v_i h_j⟩_model, Gibbs sampling method may be used.  ... 
doi:10.1142/s0218001415510064 fatcat:tvwzhaitujbgxhyzmcpetbja2a

Learning Non-deterministic Representations with Energy-based Ensembles [article]

Maruan Al-Shedivat, Emre Neftci, Gert Cauwenberghs
2015 arXiv   pre-print
We propose an algorithm similar to contrastive divergence for training restricted Boltzmann stochastic ensembles.  ...  Inspired by the stochasticity of the synaptic connections in the brain, we introduce Energy-based Stochastic Ensembles.  ...  E.N. and G.C. were supported in part by the National Science Foundation (NSF EFRI-1137279) and the Office of Naval Research (ONR MURI 14-13-1-0205). M.A. was supported by KAUST Graduate Fellowship.  ... 
arXiv:1412.7272v2 fatcat:4t4emfk52vfdnmflgdww7h3kq4

Justifying and Generalizing Contrastive Divergence

Yoshua Bengio, Olivier Delalleau
2009 Neural Computation  
We show that its residual term converges to zero, justifying the use of a truncation, i.e. running only a short Gibbs chain, which is the main idea behind the Contrastive Divergence (CD) estimator of the  ...  We are particularly interested in estimators of the gradient of the log-likelihood obtained through this expansion.  ...  If the contrastive divergence update is considered like a biased and noisy estimator of the true loglikelihood gradient, it can be shown that stochastic gradient descent converges (to a local minimum),  ... 
doi:10.1162/neco.2008.11-07-647 pmid:19018704 fatcat:cmm4n7r65fhmtbmohf2yxnqdfi

Variational Noise-Contrastive Estimation [article]

Benjamin Rhodes, Michael Gutmann
2019 arXiv   pre-print
To increase the number of techniques in our arsenal, we propose variational noise-contrastive estimation (VNCE), building on NCE which is a method that only applies to unnormalised models.  ...  However, learning their parameters from data is intractable, and few estimation techniques are currently available for such models.  ...  Benjamin Rhodes was supported in part by the EPSRC Centre for Doctoral Training in Data Science, funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016427/1) and the University  ... 
arXiv:1810.08010v3 fatcat:nhrcte3vfraxppnamnd235pwfq

An Efficient Learning Procedure for Deep Boltzmann Machines

Ruslan Salakhutdinov, Geoffrey Hinton
2012 Neural Computation  
Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains.  ...  The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden  ...  Acknowledgments This research was supported by NSERC and by gifts from Google and Microsoft.  ... 
doi:10.1162/neco_a_00311 pmid:22509963 fatcat:fqz2deyygjglzk4wquyfwvcw5y

Unsupervised learning for MRFs on bipartite graphs

Boris Flach, Tomas Sixta
2013 Proceedings of the British Machine Vision Conference 2013  
Persistent Contrastive Divergence) there is an alternative learning approach: a modified EM algorithm which is tractable because of the bipartiteness of the model graph.  ...  We show that besides the widely used stochastic gradient approximation (a.k.a.  ...  A third option is a stochastic gradient method which is often used in the context of RBMs and is designated as Persistent Contrastive Divergence [13, 14].  ... 
doi:10.5244/c.27.72 dblp:conf/bmvc/FlachS13 fatcat:iaheg3sclzbbfkrh46nahldneu

Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift

Marcus Gallagher, Marcus Frean
2005 Evolutionary Computation  
This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and  ...  This leads to an update rule that is related and compared with previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering  ...  The adaptation considered minimizes the Kullback-Leibler divergence between Q(x) and P (x) using a stochastic gradient descent.  ... 
doi:10.1162/1063656053583478 pmid:15901425 fatcat:qfrjl2sw7rfqlgddfzk24jq7tq

Weighted Contrastive Divergence [article]

Enrique Romero Merino, Ferran Mazzanti Castrillejo, Jordi Delgado Pin, David Buchaca Prats
2018 arXiv   pre-print
This is the case of Restricted Boltzmann Machines (RBM) and its learning algorithm Contrastive Divergence (CD).  ...  However small these modifications may be, experimental work reported in this paper suggests that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational  ...  In this paper we propose an alternative approximation to the CD gradient called Weighted Contrastive Divergence (WCD).  ... 
arXiv:1801.02567v2 fatcat:rm45ogags5elvgcr27jp62fspa