26,081 Hits in 8.2 sec

Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines [article]

Guillaume Desjardins, Razvan Pascanu, Aaron Courville, Yoshua Bengio
2013 arXiv   pre-print
This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines.  ...  the natural gradient metric L.  ...  our Metric-Free Natural Gradient (MFNG) algorithm is suitable for joint-training of complex Boltzmann Machines.  ... 
arXiv:1301.3545v2 fatcat:3g5h55fg7bbjldmckgzff56jwu
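
To make the "metric-free" idea concrete, here is a minimal sketch in the spirit of Hessian-free optimization: the natural gradient direction F⁻¹g is obtained by running conjugate gradient using only metric-vector products, so the metric is never formed explicitly. The function names and the covariance-based metric estimate are illustrative assumptions, not code from the paper.

```python
import numpy as np

def conjugate_gradient(mvp, g, n_iters=20, tol=1e-8):
    """Solve F x = g using only matrix-vector products mvp(v) = F v."""
    x = np.zeros_like(g)
    r = g.copy()          # residual g - F x (x starts at 0)
    p = r.copy()
    rs = r @ r
    for _ in range(n_iters):
        Fp = mvp(p)
        alpha = rs / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def fisher_vector_product(stats, v):
    """F v where F = Cov(s(x)), estimated from sampled sufficient statistics."""
    centered = stats - stats.mean(axis=0)
    return centered.T @ (centered @ v) / len(stats)

def mfng_step(theta, grad, stats, lr=0.1):
    """One natural gradient ascent step: theta += lr * F^{-1} grad."""
    nat_grad = conjugate_gradient(lambda v: fisher_vector_product(stats, v), grad)
    return theta + lr * nat_grad
```

Keeping everything in matrix-vector products keeps the per-step cost linear in the number of parameters times CG iterations, which is what makes this style of natural gradient feasible for joint-training of larger models.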

Natural gradient, fitness modelling and model selection: A unifying perspective

Luigi Malago, Matteo Matteucci, Giovanni Pistone
2013 2013 IEEE Congress on Evolutionary Computation  
This equivalence allows one to simultaneously perform model selection and robust estimation of the natural gradient.  ...  In this paper Stochastic Relaxation is used to provide theoretical results on Estimation of Distribution Algorithms (EDAs).  ...  This leads to two different algorithms: Stochastic Gradient Descent (SGD), based on the regular gradient, and Stochastic Natural Gradient Descent (SNGD), based on the natural gradient, reported in Algorithm  ... 
doi:10.1109/cec.2013.6557608 dblp:conf/cec/MalagoMP13 fatcat:e3smb2pcojhmvi7ootvlszxg2y
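
As a rough illustration of the two algorithms named in the abstract, the sketch below contrasts a plain SGD update with an SNGD update, assuming an exponential-family model where the Fisher information equals the covariance of the sufficient statistics; the sample-based estimate and the damping term are our assumptions, not details from the paper.

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    # Stochastic Gradient Descent: follow the regular gradient.
    return theta + lr * grad

def sngd_step(theta, grad, stats, lr=0.1, damping=1e-4):
    # Stochastic Natural Gradient Descent: precondition by the inverse
    # Fisher matrix. For an exponential family, F is the covariance of
    # the sufficient statistics, estimated here from model samples.
    centered = stats - stats.mean(axis=0)
    F = centered.T @ centered / len(stats)
    F += damping * np.eye(len(theta))   # regularize the noisy estimate
    return theta + lr * np.linalg.solve(F, grad)
```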

Training Restricted Boltzmann Machines

Asja Fischer
2015 Künstliche Intelligenz  
Therefore, training algorithms such as Contrastive Divergence (CD) and learning based on Parallel Tempering (PT) rely on Markov chain Monte Carlo methods to approximate the gradient.  ...  It is based on likelihood maximization, but the likelihood and its gradient are computationally intractable.  ...  An Analysis of Centered RBMs: An undesired property of training RBMs based on the log-likelihood gradient is that the learning procedure is not invariant to the data representation.  ... 
doi:10.1007/s13218-015-0371-2 fatcat:pi56jeamzbartjca2aj6zybwlu
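
For reference, a minimal sketch of CD-k for a binary RBM, approximating the intractable model expectation with k steps of block Gibbs sampling started at the data; parameter names and conventions are generic, not taken from the article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k(v0, W, b, c, k=1, rng=np.random.default_rng()):
    """One CD-k gradient estimate for a binary RBM with weights W,
    visible bias b, hidden bias c. v0 is a mini-batch of data rows."""
    ph0 = sigmoid(v0 @ W + c)            # positive phase: p(h|v_data)
    v = v0
    for _ in range(k):                   # k steps of block Gibbs sampling
        h = (rng.random(ph0.shape) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(v0.shape) < sigmoid(h @ W.T + b)).astype(float)
    phk = sigmoid(v @ W + c)             # negative phase: p(h|v_model)
    n = len(v0)
    dW = (v0.T @ ph0 - v.T @ phk) / n
    db = (v0 - v).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    return dW, db, dc                    # ascend: W += lr * dW, etc.
```

The bias in CD comes from truncating the chain after k steps; PT-based learning replaces this short chain with swaps among tempered chains (see the parallel tempering entry below).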

An Overview of Restricted Boltzmann Machines

Vidyadhar Upadhya, P. S. Sastry
2019 Journal of the Indian Institute of Science  
The aim of this article is to give a tutorial introduction to the restricted Boltzmann machines and to review the evolution of this model.  ...  A Boltzmann machine (BM) is a model of pairwise interacting units where each unit updates its state over time in a probabilistic way depending on the states of the neighboring units.  ...  The H-F method is used to design natural gradient descent for learning Boltzmann machines [11].  ... 
doi:10.1007/s41745-019-0102-z fatcat:fmg33s4tzvdonjqopkt7td2gny
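
The probabilistic unit update described above can be written in a few lines. This sketch assumes binary {0,1} units, a symmetric weight matrix with zero diagonal, and a bias vector, which is one common parameterization.

```python
import numpy as np

def update_unit(s, W, b, i, rng=np.random.default_rng()):
    """Resample unit i of a binary Boltzmann machine given its neighbours.
    W must be symmetric with zero diagonal; b is the bias vector."""
    activation = W[i] @ s + b[i]
    p_on = 1.0 / (1.0 + np.exp(-activation))   # p(s_i = 1 | rest)
    s[i] = 1.0 if rng.random() < p_on else 0.0
    return s
```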

Unsupervised Learning of Distributions of Binary Vectors Using 2-Layer Networks

Yoav Freund, David Haussler
1991 Neural Information Processing Systems  
The first learning algorithm is the standard gradient ascent heuristic for computing maximum likelihood estimates for the parameters (i.e. weights and thresholds) of the model. Here we give a closed form  ...  We give experimental results for these learning methods on synthetic data and natural data from the domain of handwritten digits.  ...  The process of learning an unknown distribution from examples is usually called density estimation or parameter estimation in statistics, depending on the nature of the class of distributions used as models  ... 
dblp:conf/nips/FreundH91 fatcat:ozaqjdzylfaidbow4hvbgrfzxu

Restricted Boltzmann Machines for galaxy morphology classification with a quantum annealer [article]

João Caldeira, Joshua Job, Steven H. Adachi, Brian Nord, Gabriel N. Perdue
2020 arXiv   pre-print
During this exploration, we analyzed the steps required for Boltzmann sampling with the D-Wave 2000Q, including a study of temperature estimation, and examined the impact of qubit noise by comparing and  ...  The methods we compare include Quantum Annealing (QA), Markov Chain Monte Carlo (MCMC) Gibbs Sampling, and Simulated Annealing (SA) as well as machine learning algorithms like gradient boosted decision  ...  However, outside of these rather special training scenarios, RBMs (regardless of the classical or quantum nature of the training algorithm) did not outperform the gradient boosted tree algorithm.  ... 
arXiv:1911.06259v2 fatcat:jehwnvv4xbgullsvys73gqo6jq
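
One common way to do the kind of temperature estimation mentioned above (not necessarily the authors' method) is to assume hardware samples follow exp(-βE(s))/Z at some effective inverse temperature β and fit β from the log-frequencies of observed states. The sketch below takes this approach; it is only feasible when states recur often enough for their frequencies to be estimated.

```python
import numpy as np
from collections import Counter

def estimate_beta(samples, energy):
    """Estimate an effective inverse temperature from hardware samples,
    assuming they follow exp(-beta * E(s)) / Z. For frequently observed
    states, log freq(s) = -beta * E(s) - log Z, so beta is the negated
    slope of a linear fit of log-frequency against energy."""
    counts = Counter(map(tuple, samples))
    states = list(counts)
    log_freq = np.log([counts[s] / len(samples) for s in states])
    energies = np.array([energy(np.array(s)) for s in states])
    slope, _ = np.polyfit(energies, log_freq, 1)
    return -slope
```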

A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines [chapter]

KyungHyun Cho, Tapani Raiko, Alexander Ilin, Juha Karhunen
2013 Lecture Notes in Computer Science  
In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model and maximizing the variational  ...  It has been shown empirically that it is difficult to train a DBM with approximate maximum-likelihood learning using the stochastic gradient unlike its simpler special case, restricted Boltzmann machine  ...  The proposed scheme is based on an observation that training DBMs consists of two separate stages: approximating a posterior distribution over hidden units and updating parameters to maximize the lower-bound  ... 
doi:10.1007/978-3-642-40728-4_14 fatcat:63wyviu3z5fdxce25zzb5givxe
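
The bound maximized in the second stage is the standard variational lower bound, stated here for reference in generic notation (not lifted from the paper):

$$\log p(v;\theta) \;\ge\; \mathbb{E}_{q(h\mid v)}\big[\log p(v,h;\theta)\big] + \mathcal{H}\big(q(h\mid v)\big),$$

where $q(h\mid v)$ is the approximate posterior obtained from the simpler model in the first stage and $\mathcal{H}$ denotes entropy.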

Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines [article]

Dmytro Korenkevych, Yanbo Xue, Zhengbing Bian, Fabian Chudak, William G. Macready, Jason Rolfe, Evgeny Andriyash
2016 arXiv   pre-print
Here we report on the ability of recent QA hardware to accelerate training of fully visible Boltzmann machines.  ...  We characterize the sampling distribution of QA hardware, and show that in many cases, the quantum distributions differ significantly from classical Boltzmann distributions.  ...  Training and test sets had size $5 \times 10^5$ points each, generated by an exact Boltzmann sampler. On these problems, $P_k(\mathbf{s} \mid \boldsymbol{\theta}_{\mathrm{learn}})$ provides better fits than $B(\mathbf{s} \mid \boldsymbol{\theta}_{\mathrm{learn}})$.  ... 
arXiv:1611.04528v1 fatcat:6rt5auq4b5capafe5qjc5hsvai
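
Since the entry compares hardware samples against an exact Boltzmann sampler, note that for small fully visible BMs the distribution $B(\mathbf{s}\mid\boldsymbol{\theta})$ can be enumerated directly. The sketch below assumes a {0,1} parameterization with energy $E(s) = -\tfrac{1}{2}s^\top W s - b^\top s$; the paper's spin convention may differ.

```python
import numpy as np
from itertools import product

def boltzmann_distribution(W, b):
    """Exact Boltzmann distribution for a small fully visible binary BM
    with energy E(s) = -0.5 s^T W s - b^T s (W symmetric, zero diagonal).
    Enumeration is feasible only for roughly 20 units or fewer."""
    n = len(b)
    states = np.array(list(product([0, 1], repeat=n)), dtype=float)
    energies = -0.5 * np.einsum('si,ij,sj->s', states, W, states) - states @ b
    p = np.exp(-energies)
    return states, p / p.sum()   # division by p.sum() normalizes by Z
```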

A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning [article]

Iris A. M. Huijben, Wouter Kool, Max B. Paulus, Ruud J. G. van Sloun
2022 arXiv   pre-print
The goal of this survey article is to present background about the Gumbel-max trick, and to provide a structured overview of its extensions to ease algorithm selection.  ...  Over the past years, the machine learning community has proposed several extensions of this trick to facilitate, e.g., drawing multiple samples, sampling from structured domains, or gradient estimation  ...  Huijben gratefully acknowledges support from Onera Health and the project 'OP-SLEEP'. The project 'OP-SLEEP' is made possible by the European Regional Development Fund, in the context of OPZuid. W.  ... 
arXiv:2110.01515v2 fatcat:ixtrm2va3vfthmwq76wmw6k5bm
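
The trick itself is a one-liner: adding independent Gumbel(0,1) noise to unnormalized log-probabilities and taking the argmax yields an exact categorical sample. A minimal sketch:

```python
import numpy as np

def gumbel_max_sample(logits, rng=np.random.default_rng()):
    """Draw one sample from Categorical(softmax(logits)) via the
    Gumbel-max trick: argmax_i (logits_i + G_i), with G_i ~ Gumbel(0, 1)."""
    gumbel_noise = -np.log(-np.log(rng.random(len(logits))))
    return int(np.argmax(logits + gumbel_noise))
```

Because the noise enters additively, relaxing the argmax to a temperature-controlled softmax yields the Gumbel-softmax estimator, one of the gradient-estimation extensions this survey covers.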

Quantum Inspired Training for Boltzmann Machines [article]

Nathan Wiebe, Ashish Kapoor, Christopher Granade, Krysta M Svore
2015 arXiv   pre-print
We present an efficient classical algorithm for training deep Boltzmann machines (DBMs) that uses rejection sampling in concert with variational approximations to estimate the gradients of the training  ...  Finally, our algorithm can train full Boltzmann machines and scales more favorably with the number of layers in a DBM than greedy contrastive divergence training.  ...  The algorithm utilizes subroutines Q and Q that provide an estimate of the probability of a given configuration of hidden and visible units.  ... 
arXiv:1507.02642v1 fatcat:wl3lgye55jdyplefpngpewiymu

Efficient Learning of Restricted Boltzmann Machines Using Covariance Estimates [article]

Vidyadhar Upadhya, P.S. Sastry
2019 arXiv   pre-print
One of the terms in the gradient, which involves expectation w.r.t. the model distribution, is intractable and is obtained through an MCMC estimate.  ...  Learning RBMs using standard algorithms such as CD(k) involves gradient descent on the negative log-likelihood.  ...  In (Desjardins et al., 2013) the H-F algorithm is used to design natural gradient descent for learning Boltzmann machines.  ... 
arXiv:1810.10777v2 fatcat:4vgfgjqh7rhcdc23m7ooy2afem
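
The two terms in question are those of the standard RBM log-likelihood gradient (generic notation):

$$\frac{\partial \log p(v)}{\partial w_{ij}} \;=\; \mathbb{E}_{\mathrm{data}}\big[v_i h_j\big] \;-\; \mathbb{E}_{\mathrm{model}}\big[v_i h_j\big].$$

The first expectation is cheap given training data; the second is over the model distribution, depends on the partition function, and is the term approximated by MCMC (or, in this paper, via covariance estimates).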

Parallel tempering is efficient for learning restricted Boltzmann machines

KyungHyun Cho, Tapani Raiko, Alexander Ilin
2010 The 2010 International Joint Conference on Neural Networks (IJCNN)  
We propose to use an advanced Monte Carlo method called parallel tempering instead, and show experimentally that it works efficiently.  ...  While contrastive divergence learning has been considered an efficient way to learn an RBM, it has a drawback due to a biased approximation in the learning gradient.  ...  Acknowledgements This work was supported by the honours programme of the department, by the Academy of Finland and by the IST Program of the European Community, under the PASCAL2 Network of Excellence.  ... 
doi:10.1109/ijcnn.2010.5596837 dblp:conf/ijcnn/ChoRI10 fatcat:yuimhaah4jemzncxnlk5rgqawm
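
A minimal sketch of the replica-swap step at the heart of parallel tempering; chain construction and the within-chain Gibbs updates are omitted, and all names are illustrative:

```python
import numpy as np

def pt_swap(states, energies, betas, rng=np.random.default_rng()):
    """One round of swaps between adjacent tempered chains.
    states: list of configurations, one per chain; energies: their
    energies; betas: inverse temperatures. A swap between neighbours
    i and i+1 is accepted with probability
    min(1, exp((beta_i - beta_{i+1}) * (E_i - E_{i+1})))."""
    for i in range(len(betas) - 1):
        log_accept = (betas[i] - betas[i + 1]) * (energies[i] - energies[i + 1])
        if np.log(rng.random()) < log_accept:
            states[i], states[i + 1] = states[i + 1], states[i]
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return states, energies
```

Hot chains mix freely and hand well-decorrelated samples down to the target chain at beta = 1, which is what removes the bias of short CD chains.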

How to Pretrain Deep Boltzmann Machines in Two Stages [chapter]

Kyunghyun Cho, Tapani Raiko, Alexander Ilin, Juha Karhunen
2015 Springer Series in Bio-/Neuroinformatics  
In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model and maximizing the variational  ...  It has been shown empirically that it is difficult to train a DBM with approximate maximum-likelihood learning using the stochastic gradient unlike its simpler special case, restricted Boltzmann machine  ...  Acknowledgements This work was supported by the Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, 251170).  ... 
doi:10.1007/978-3-319-09903-3_10 fatcat:rnh2f3meevhdroawotiigdnvfm

Modeling spectral envelopes using deep conditional restricted Boltzmann machines for statistical parametric speech synthesis

Xiang Yin, Zhen-Hua Ling, Ya-Jun Hu, Li-Rong Dai
2016 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Experimental results show that our proposed method can produce more natural speech sounds than the hidden Markov model (HMM)-based, DNN-based, and DMDN-based synthesis methods.  ...  At the training stage, the DNN part and the CRBM part of the DCRBM are pre-trained successively and then a unified fine-tuning of all model parameters is conducted.  ...  After random initialization, the model parameters of both systems were estimated by backpropagation with a mini-batch-based stochastic gradient descent (SGD) algorithm. When building the system using  ... 
doi:10.1109/icassp.2016.7472654 dblp:conf/icassp/YinLHD16 fatcat:qxl6ptwssvdkrolg47tiim7kfm

A hybrid strategy for Gilbert's channel characterization using gradient and annealing techniques

Tan-Hsu Tan, Wen-Whei Chang
1998 International Journal of Systems Science  
This paper presents a hybrid algorithm for estimating Gilbert's channel model parameters from an experimental error-gap distribution.  ...  A stochastic simulated annealing algorithm is applied to determine automatically a set of good starting points, which are then used by the deterministic gradient algorithm for faster convergence to the  ...  Acknowledgments This work was supported by the National Science Council, Taiwan, under grant NSC85-2221-E009-029.  ... 
doi:10.1080/00207729808929549 fatcat:iwsvlevsyvdqbow3pxar6wuisu
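
A sketch of the same hybrid pattern using off-the-shelf SciPy routines, with dual_annealing standing in for the paper's stochastic simulated annealing stage and L-BFGS-B for its deterministic gradient stage; the loss function and its parameterization are assumptions, not the paper's formulation:

```python
import numpy as np
from scipy.optimize import minimize, dual_annealing

def hybrid_fit(loss, bounds, rng_seed=0):
    """Global annealing pass locates a good starting point, then a
    deterministic gradient method refines it. `loss` maps a parameter
    vector to the fitting error (e.g. against an error-gap distribution)."""
    coarse = dual_annealing(loss, bounds=bounds, seed=rng_seed, maxiter=200)
    fine = minimize(loss, x0=coarse.x, method='L-BFGS-B', bounds=bounds)
    return fine.x
```

The division of labour mirrors the paper's motivation: annealing escapes poor local minima of the fitting error, while the gradient method supplies the fast final convergence that annealing alone lacks.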
Showing results 1 — 15 out of 26,081 results