62 Hits in 1.4 sec

Justifying and Generalizing Contrastive Divergence

Yoshua Bengio, Olivier Delalleau
2009 Neural Computation  
We study an expansion of the log-likelihood in undirected graphical models such as the Restricted Boltzmann Machine (RBM), where each term in the expansion is associated with a sample in a Gibbs chain alternating between two random variables (the visible vector and the hidden vector, in RBMs). We are particularly interested in estimators of the gradient of the log-likelihood obtained through this expansion. We show that its residual term converges to zero, justifying the use of a truncation, i.e. running only a short Gibbs chain, which is the main idea behind the Contrastive Divergence (CD) estimator of the log-likelihood gradient. By truncating even more, we obtain a stochastic reconstruction error, related through a mean-field approximation to the reconstruction error often used to train auto-associators and stacked auto-associators. The derivation is not specific to the particular parametric forms used in RBMs, and only requires convergence of the Gibbs chain. We present theoretical and empirical evidence linking the number of Gibbs steps k and the magnitude of the RBM parameters to the bias in the CD estimator. These experiments also suggest that the sign of the CD estimator is ...
doi:10.1162/neco.2008.11-07-647 pmid:19018704 fatcat:cmm4n7r65fhmtbmohf2yxnqdfi
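
The CD-k estimator described above truncates the Gibbs chain after k alternating steps and uses the resulting sample as the negative phase of the gradient. The following is a minimal illustrative sketch for a binary RBM in NumPy (function and variable names are my own, not from the paper); it shows the structure of the estimator, not a tuned implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p):
    return (rng.random(p.shape) < p).astype(float)

def cd_k_gradients(v0, W, b, c, k=1):
    """CD-k gradient estimate for a binary RBM with energy
    E(v, h) = -v.W.h - b.v - c.h, for a single training vector v0."""
    ph0 = sigmoid(v0 @ W + c)          # positive phase: p(h=1 | data)
    v, ph = v0, ph0
    for _ in range(k):                 # truncated Gibbs chain (k steps)
        h = sample_bernoulli(ph)
        v = sample_bernoulli(sigmoid(h @ W.T + b))
        ph = sigmoid(v @ W + c)
    # negative phase uses the k-step sample instead of an exact expectation
    dW = np.outer(v0, ph0) - np.outer(v, ph)
    db = v0 - v
    dc = ph0 - ph
    return dW, db, dc
```

Parameters would then be updated by ascending these approximate gradients; the abstract's point is that the bias introduced by the truncation vanishes as the Gibbs chain converges.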

Label Propagation and Quadratic Criterion [chapter]

Bengio Yoshua, Delalleau Olivier, Roux Nicolas Le
2006 Semi-Supervised Learning  
This observation leads to a more general cost criterion involving a trade-off between (11.9) and (11.10) (Belkin et al. [2004], Delalleau et al. [2005]).  ...  regularization, which naturally leads to a regularization term based on the graph Laplacian (Belkin and Niyogi [2003], Joachims [2003], Zhou et al. [2004], Zhu et al. [2003], Belkin et al. [2004], Delalleau  ...
doi:10.7551/mitpress/9780262033589.003.0011 fatcat:hsu674c5d5fhnmmk2xfsx5dv5a
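
The "trade-off" mentioned in this snippet is, in general form, a balance between fitting the labeled points and a graph-Laplacian smoothness penalty. As a hedged illustration (the chapter's own equations (11.9) and (11.10) are not reproduced here, and the notation below is assumed):

```latex
\[
  C(\hat{Y}) \;=\; \|\hat{Y}_l - Y_l\|^2 \;+\; \mu\, \hat{Y}^{\top} L \hat{Y},
  \qquad
  \hat{Y}^{\top} L \hat{Y} \;=\; \tfrac{1}{2} \sum_{i,j} W_{ij}\, (\hat{y}_i - \hat{y}_j)^2 ,
\]
```

where $W$ is the affinity matrix over labeled and unlabeled points, $D$ is its diagonal row-sum matrix, and $L = D - W$ is the graph Laplacian referred to in the snippet.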

Shallow vs. Deep Sum-Product Networks

Olivier Delalleau, Yoshua Bengio
2011 Neural Information Processing Systems  
We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums), through a theoretical analysis that compares deep (multiple hidden layers) vs. shallow (one hidden layer) architectures. We prove there exist families of functions that can be represented much more efficiently with a deep network than with a shallow one, i.e. with substantially fewer hidden units. Such results were not available until now, and contribute to motivate recent research involving learning of deep sum-product networks, and more generally motivate research in Deep Learning.
dblp:conf/nips/DelalleauB11 fatcat:tqgz3xj54ngnhmbw4v5z6tigfq
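
To make the objects of study concrete: a sum-product network alternates layers of product units and weighted-sum units. The toy sketch below is my own illustration of such a computation on four inputs, not one of the constructions analyzed in the paper.

```python
import numpy as np

def product_layer(x, groups):
    """Each unit multiplies the inputs listed in its group."""
    return np.array([np.prod(x[list(g)]) for g in groups])

def sum_layer(x, weights):
    """Each unit computes a weighted sum of its inputs."""
    return weights @ x

x = np.array([1.0, 2.0, 3.0, 4.0])
h1 = product_layer(x, groups=[(0, 1), (2, 3)])       # two product units
out = sum_layer(h1, weights=np.array([[0.5, 0.5]]))  # one weighted-sum unit
print(out)  # [7.] : 0.5*(1*2) + 0.5*(3*4)
```

The paper's result is that some function families computed by such deep alternations require exponentially many hidden units when flattened into a single hidden layer.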

Decision Trees Do Not Generalize to New Variations

Yoshua Bengio, Olivier Delalleau, Clarence Simard
2010 Computational intelligence  
This article is inspired by previous work that has shown such limitations in the case of kernel methods with a local kernel (Bengio, Delalleau, and Le Roux 2006a) as well as in the case of so-called  ... 
doi:10.1111/j.1467-8640.2010.00366.x fatcat:tqyvj6kr2bhm7ggra6hg7e3om4

Convex Neural Networks

Yoshua Bengio, Nicolas Le Roux, Pascal Vincent, Olivier Delalleau, Patrice Marcotte
2005 Neural Information Processing Systems  
Experimental Results: We performed experiments on the 2-D double moon toy dataset (as used in (Delalleau, Bengio and Le Roux, 2005)), to be able to compare with the exact version of the algorithm.  ...
dblp:conf/nips/BengioRVDM05 fatcat:7szjzlln3vdgjcddgjf2q44j6y

On the Expressive Power of Deep Architectures [chapter]

Yoshua Bengio, Olivier Delalleau
2011 Lecture Notes in Computer Science  
doi:10.1007/978-3-642-24412-4_3 fatcat:dvr4psguy5gl5nze2gn2v3soni

Discrete and Continuous Action Representation for Practical RL in Video Games [article]

Olivier Delalleau, Maxim Peter, Eloi Alonso, Adrien Logut
2019 arXiv   pre-print
While most current research in Reinforcement Learning (RL) focuses on improving the performance of the algorithms in controlled environments, the use of RL under constraints like those met in the video game industry is rarely studied. Operating under such constraints, we propose Hybrid SAC, an extension of the Soft Actor-Critic algorithm able to handle discrete, continuous and parameterized actions in a principled way. We show that Hybrid SAC can successfully solve a high-speed driving task in one of our games, and is competitive with the state-of-the-art on parameterized actions benchmark tasks. We also explore the impact of using normalizing flows to enrich the expressiveness of the policy at minimal computational cost, and identify a potential undesired effect of SAC when used with normalizing flows, that may be addressed by optimizing a different objective.
arXiv:1912.11077v1 fatcat:znyoi6kog5fhfmoyso5f2yd6ka
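
To illustrate what a hybrid (discrete plus continuous) action looks like in practice, here is a minimal NumPy sketch of sampling one: a categorical choice over action types plus a tanh-squashed Gaussian for the continuous parameters. This only sketches the action representation under assumed names; it is not the Hybrid SAC training algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(discrete_logits, cont_mean, cont_log_std):
    """Sample one discrete action type and its continuous parameters."""
    # Discrete part: categorical distribution over the available action types.
    z = discrete_logits - discrete_logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    a_disc = rng.choice(len(probs), p=probs)
    # Continuous part: reparameterized Gaussian, squashed to [-1, 1] as in SAC.
    eps = rng.standard_normal(cont_mean.shape)
    a_cont = np.tanh(cont_mean + np.exp(cont_log_std) * eps)
    return a_disc, a_cont

# e.g. 3 discrete action types, each carrying 2 continuous parameters
a_disc, a_cont = sample_hybrid_action(
    discrete_logits=np.array([0.1, 1.2, -0.3]),
    cont_mean=np.zeros(2),
    cont_log_std=np.full(2, -1.0),
)
```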

Spectral Dimensionality Reduction [chapter]

Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-François Paiement, Pascal Vincent, Marie Ouimet
2006 Studies in Fuzziness and Soft Computing  
doi:10.1007/978-3-540-35488-8_28 fatcat:evyhhvxqdjgjvhp5k33isoqqwa

The Curse of Highly Variable Functions for Local Kernel Machines

Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux
2005 Neural Information Processing Systems  
., 2004; Belkin, Matveeva and Niyogi, 2004; Delalleau, Bengio and Le Roux, 2005), which also fall in this category, and share many ideas with manifold learning algorithms.  ...  The graph-based algorithms we consider here can be seen as minimizing the following cost function, as shown in (Delalleau, Bengio and Le Roux, 2005): $C(\hat{Y}) = \|\hat{Y}_l - Y_l\|^2 + \mu\, \hat{Y}^{\top} L \hat{Y} + \mu \|\hat{Y}\|^2$ (9), with $\hat{Y}$  ...
dblp:conf/nips/BengioDR05 fatcat:vpk5xvbjr5dd5pzriwfjn6cm44
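
A minimal sketch of minimizing the quadratic criterion quoted in the snippet (assuming the cleaned-up form above, with a labeled fit term, a Laplacian smoothness term, and a norm regularizer; variable names are my own):

```python
import numpy as np

def minimize_graph_criterion(W, y, labeled, mu=1.0):
    """Closed-form minimizer of
    ||Y_hat_l - Y_l||^2 + mu * Y_hat' L Y_hat + mu * ||Y_hat||^2,
    where L = D - W is the graph Laplacian and `labeled` is a boolean mask."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    S = np.diag(labeled.astype(float))      # selects the labeled entries
    A = S + mu * L + mu * np.eye(n)         # set the gradient of the criterion to zero
    return np.linalg.solve(A, S @ y)
```

Setting the gradient to zero gives the linear system (S + µL + µI) Ŷ = S Y, the costly O(n^3) solve that motivates the approximations studied in this line of work.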

Efficient EM Training of Gaussian Mixtures with Missing Data [article]

Olivier Delalleau and Aaron Courville and Yoshua Bengio
2018 arXiv   pre-print
One should keep in mind that the EM algorithm assumes the missing ...  Olivier Delalleau, Aaron Courville and Yoshua Bengio are with the Department of Computer Science and Operations Research, University of  ...
arXiv:1209.0521v2 fatcat:4jn3cfkukfdw5pky3u3xu5tjym
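
The key computation when running EM with missing inputs is Gaussian conditioning: for each component, the missing coordinates of a point are imputed by their conditional mean (and covariance) given the observed ones. A minimal sketch for a single Gaussian follows, with assumed names and no claim to match the paper's efficient update scheme.

```python
import numpy as np

def conditional_missing(x, miss, mu, Sigma):
    """Conditional mean and covariance of the missing coordinates of x
    given the observed ones, under N(mu, Sigma); `miss` is a boolean mask."""
    obs = ~miss
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(miss, obs)]
    S_mm = Sigma[np.ix_(miss, miss)]
    K = S_mo @ np.linalg.inv(S_oo)                # regression coefficients
    mean_m = mu[miss] + K @ (x[obs] - mu[obs])    # conditional mean
    cov_m = S_mm - K @ S_mo.T                     # conditional covariance
    return mean_m, cov_m
```

In a mixture, these per-component conditionals are weighted by the posterior responsibilities in the E-step; the paper focuses on making such EM updates efficient.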

Efficient Non-Parametric Function Induction in Semi-Supervised Learning

Olivier Delalleau, Yoshua Bengio, Nicolas Le Roux
2005 International Conference on Artificial Intelligence and Statistics  
There has been an increase of interest in semi-supervised learning recently, because of the many datasets with large amounts of unlabeled examples and only a few labeled ones. This paper follows up on proposed non-parametric algorithms which provide an estimated continuous label for the given unlabeled examples. First, it extends them to function induction algorithms that minimize a regularization criterion applied to an out-of-sample example, and happen to have the form of Parzen windows regressors. This allows one to predict test labels without solving again a linear system of dimension n (the number of unlabeled and labeled training examples), which can cost O(n^3). Second, this function induction procedure gives rise to an efficient approximation of the training process, reducing the linear system to be solved to m ≪ n unknowns, using only a subset of m examples. An improvement of O(n^2/m^2) in time can thus be obtained. Comparative experiments are presented, showing the good performance of the induction formula and approximation algorithm.
dblp:conf/aistats/DelalleauBR05 fatcat:yej3f3zde5aozdniyi4jh4uuxy
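
Schematically, a Parzen-windows form of the induction formula mentioned in the abstract is a kernel-weighted average of the estimated labels; the expression below is the generic shape under assumed notation, not the paper's exact equation:

```latex
\[
  \hat{f}(x) \;=\; \frac{\sum_{i} W(x, x_i)\, \hat{y}_i}{\sum_{j} W(x, x_j)},
\]
```

where $W(\cdot,\cdot)$ is the similarity kernel used to build the graph and the $\hat{y}_i$ are the labels estimated on the (subset of $m$) training examples, so that predicting a new point requires no additional linear solve.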

A Closer Look at Codistillation for Distributed Training [article]

Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat
2021 arXiv   pre-print
Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous data-parallel stochastic gradient descent methods, where different model replicas average their gradients (or parameters) at every iteration and thus maintain identical parameters. We investigate codistillation in a distributed training setup, complementing previous work which focused on extremely large batch sizes. Surprisingly, we find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods, despite using a much weaker synchronization mechanism. These findings hold across a range of batch sizes and learning rate schedules, as well as different kinds of models and datasets. Obtaining this level of accuracy, however, requires properly accounting for the regularization effect of codistillation, which we highlight through several empirical observations. Overall, this work contributes to a better understanding of codistillation and how to best take advantage of it in a distributed computing environment.
arXiv:2010.02838v2 fatcat:s7ugrgic2rdfdik5rm2neuwwxe
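
The "auxiliary loss" in codistillation is typically the task loss plus a term pulling each replica's predictions toward a peer's (possibly stale) predictions, instead of averaging gradients every step. A minimal NumPy sketch, with assumed names and weighting:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def codistillation_loss(logits, labels, peer_logits, alpha=0.5):
    """Cross-entropy on the true labels plus a distillation term toward a peer
    replica's predictions (the peer acts as a soft teacher)."""
    p = softmax(logits)
    n = len(labels)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    q = softmax(peer_logits)                   # peer predictions, possibly stale
    distill = -(q * np.log(p + 1e-12)).sum(axis=-1).mean()
    return ce + alpha * distill
```

Because the peer's logits only need to be exchanged occasionally, this is the "much weaker synchronization mechanism" that the abstract contrasts with per-iteration gradient averaging.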

Learning Eigenfunctions Links Spectral Embedding and Kernel PCA

Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-François Paiement, Pascal Vincent, Marie Ouimet
2004 Neural Computation  
In this paper, we show a direct relation between spectral embedding methods and kernel PCA, and how both are special cases of a more general learning problem, that of learning the principal eigenfunctions of an operator defined from a kernel and the unknown data generating density. Whereas spectral embedding methods only provided coordinates for the training points, the analysis justifies a simple extension to out-of-sample examples (the Nyström formula) for Multi-Dimensional Scaling, spectral clustering, Laplacian eigenmaps, Locally Linear Embedding (LLE) and Isomap. The analysis provides, for all such spectral embedding methods, the definition of a loss function, whose empirical average is minimized by the traditional algorithms. The asymptotic expected value of that loss defines a generalization performance and clarifies what these algorithms are trying to learn. Experiments with LLE, Isomap, spectral clustering and MDS show that this out-of-sample embedding formula generalizes well, with a level of error comparable to the effect of small perturbations of the training set on the embedding.
doi:10.1162/0899766041732396 pmid:15333211 fatcat:uzc4rzlorzfczdclqyeq7hohxy
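
The Nyström out-of-sample extension mentioned in the abstract has the following generic shape (normalization conventions differ between MDS, spectral clustering, Laplacian eigenmaps, LLE and Isomap, so this is a schematic form rather than the paper's exact statement):

```latex
\[
  f_k(x) \;=\; \frac{\sqrt{n}}{\lambda_k} \sum_{i=1}^{n} v_{k i}\, \tilde{K}(x, x_i),
\]
```

where $\tilde{K}$ is the data-dependent (e.g. centered or normalized) kernel associated with the embedding method, and $(\lambda_k, v_k)$ is the $k$-th eigenvalue/eigenvector pair of the $n \times n$ Gram matrix of $\tilde{K}$ on the training points, with $v_{ki}$ the $i$-th component of $v_k$.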

Detonation Classification from Acoustic Signature with the Restricted Boltzmann Machine

Yoshua Bengio, Nicolas Chapados, Olivier Delalleau, Hugo Larochelle, Xavier Saint-Mleux, Christian Hudon, Jérôme Louradour
2012 Computational intelligence  
We compare the recently proposed Discriminative Restricted Boltzmann Machine to the classical Support Vector Machine on a challenging classification task consisting in identifying weapon classes from audio signals. The three weapon classes considered in this work (mortar, rocket and rocket-propelled grenade) are difficult to reliably classify with standard techniques since they tend to have similar acoustic signatures. In addition, specificities of the data available in this study make it challenging to rigorously compare classifiers, and we address methodological issues arising from this situation. Experiments show good classification accuracy that could make these techniques suitable for fielding on autonomous devices. Discriminative Restricted Boltzmann Machines appear to yield better accuracy than Support Vector Machines, and are less sensitive to the choice of signal preprocessing and model hyperparameters. This last property is especially appealing in such a task where the lack of data makes model validation difficult.
doi:10.1111/j.1467-8640.2012.00419.x fatcat:65o2c3meo5gsnngzenbh4w3flu

Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines

Guillaume Desjardins, Aaron C. Courville, Yoshua Bengio, Pascal Vincent, Olivier Delalleau
2010 Journal of machine learning research  
Despite CD's popularity, it does not yield the best approximation of the log-likelihood gradient (Carreira-Perpiñan & Hinton, 2005; Bengio & Delalleau, 2009).  ...
dblp:journals/jmlr/DesjardinsCBVD10 fatcat:i7oamsfwvbcqrfvfryt6hlg2si
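
For context on the "tempered" part of the title: a standard tempered-MCMC ingredient is the parallel-tempering swap move between chains run at different inverse temperatures. Stated generically (this is the textbook rule, not the paper's specific schedule or its application to RBM training):

```latex
\[
  P(\text{swap } x_i \leftrightarrow x_j) \;=\;
  \min\!\Big(1,\; \exp\big[(\beta_i - \beta_j)\,\big(E(x_i) - E(x_j)\big)\big]\Big),
\]
```

where $x_i$ and $x_j$ are the current states of two chains with inverse temperatures $\beta_i$ and $\beta_j$ targeting the same energy $E$. High-temperature chains mix quickly and pass better negative-phase samples down to the low-temperature chain, addressing the CD limitation quoted in the snippet.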
Showing results 1 — 15 out of 62 results