Restricted Boltzmann Machines: Introduction and Review [chapter]

Guido Montúfar
2018, Springer Proceedings in Mathematics & Statistics
The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is
attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.

[…] subject, and lets us advertise some of the interesting and challenging problems that still remain to be addressed.

Brief overview

A Boltzmann machine is a model of pairwise interacting units that update their states over time in a probabilistic way depending on the states of the adjacent units. Boltzmann machines have been motivated as models for parallel distributed computing [36, 1, 37]. They can be regarded as stochastic versions of Hopfield networks [38], which serve as associative memories. They are closely related to mathematical models of interacting particles studied in statistical physics, especially the Ising model [39, Chapter 14]. For each fixed choice of interaction strengths and biases in the network, the collective of units assumes different states at relative frequencies that depend on their associated energy, in what is known as a Gibbs-Boltzmann probability distribution [30]. As pair interaction models, Boltzmann machines define special types of hierarchical log-linear models, which are special types of exponential family models [14] closely related to undirected graphical models [42, 40]. In contrast to the standard discussion of exponential families, Boltzmann machines usually involve hidden variables. Hierarchical log-linear models are widely used in statistics. Their geometric properties are studied especially in information geometry [5, 8, 6, 11] and algebraic statistics [21, 72]. The information geometry of the Boltzmann machine was first studied by Amari, Kurata, and Nagaoka [7].
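The unit-wise stochastic updates described above can be sketched concretely. The following is a minimal illustration, not code from the article: each binary unit is resampled in turn given the states of the others, with a sigmoid of its local field as the activation probability; the weight matrix W (symmetric, zero diagonal) and bias vector b are illustrative placeholders.

```python
import numpy as np

def gibbs_sweep(x, W, b, rng):
    """One Gibbs sampling sweep over a Boltzmann machine with binary units.

    W is a symmetric weight matrix with zero diagonal, b is the bias vector;
    each unit is resampled conditioned on the current states of the others.
    """
    for i in range(len(x)):
        field = W[i] @ x + b[i]              # total input to unit i from neighbours plus bias
        p_on = 1.0 / (1.0 + np.exp(-field))  # sigmoid of the local field
        x[i] = 1 if rng.random() < p_on else 0
    return x
```

Iterating such sweeps produces states whose relative frequencies converge to the Gibbs-Boltzmann distribution p(x) proportional to exp(-E(x)), with energy E(x) = -(1/2) x^T W x - b^T x.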
A restricted Boltzmann machine (RBM) is a special type of Boltzmann machine in which the pair interactions are restricted to run between an observed set of units and an unobserved set of units. These models were introduced in the context of harmony theory [70] and unsupervised two-layer networks [27]. RBMs played a key role in the development of greedy layer-wise learning algorithms for deep layered architectures [35, 12]. A recommended introduction to RBMs is [24]. RBMs have been studied intensively, with tools from optimization, algebraic geometry, combinatorics, coding theory, polyhedral geometry, and information geometry, among others. Some of the advances over the past few years include results on their approximation properties [77, 43, 58, 57], dimension [17, 53, 55], semialgebraic description [18, 68], efficiency of representation [45, 54], sequential optimization [23, 26], statistical complexity [10], sampling and training [64, 22, 23, 26], and information geometry [7, 6, 41].

Organization

This article is organized as follows. In Section 2 we introduce Boltzmann machines, Gibbs sampling, and the associated probability models. In Section 3 we introduce restricted Boltzmann machines and discuss various perspectives, viewing the probability models as marginals of exponential families with Kronecker factoring suffi-
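The bipartite restriction has a concrete computational payoff that the later sections build on: because hidden units do not interact with each other, the sum over hidden states in the marginal of the Gibbs distribution factorizes into a product over hidden units. The sketch below (illustrative W, b, c, not taken from the article) checks this factorization against a brute-force sum.

```python
import itertools
import numpy as np

def marginal_brute_force(v, W, b, c):
    # Unnormalized marginal of visible state v: sum the Gibbs weights
    # exp(b.v + c.h + v.W.h) over all 2^m joint hidden states h.
    m = len(c)
    return sum(
        np.exp(b @ v + c @ np.array(h) + v @ W @ np.array(h))
        for h in itertools.product([0, 1], repeat=m)
    )

def marginal_factorized(v, W, b, c):
    # Hidden units are conditionally independent given v, so the sum
    # over h factorizes: exp(b.v) * prod_j (1 + exp((W^T v + c)_j)).
    return np.exp(b @ v) * np.prod(1.0 + np.exp(W.T @ v + c))
```

Both functions compute the same unnormalized marginal probability of a visible state; the factorized form costs O(nm) operations per state instead of O(n·2^m) for m hidden units.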
doi:10.1007/978-3-319-97798-0_4