Structured Belief Propagation for NLP

Matthew R. Gormley, Jason Eisner
ACL 2015 Tutorials
Tutorial Overview

Statistical natural language processing relies on probabilistic models of linguistic structure. More complex models can help capture our intuitions about language, by adding linguistically meaningful interactions and latent variables. However, inference and learning in the models we want often pose a serious computational challenge. Belief propagation (BP) and its variants provide an attractive approximate solution, especially using recent training methods. These approaches can handle joint models of interacting components, are computationally efficient, and have extended the state of the art on a number of common NLP tasks, including dependency parsing, modeling of morphological paradigms, CCG parsing, phrase extraction, semantic role labeling, and information extraction (Smith and Eisner, 2008; Dreyer and Eisner, 2009; Auli and Lopez, 2011; Burkett and Klein, 2012; Naradowsky et al., 2012; Stoyanov and Eisner, 2012).

This tutorial delves into BP with an emphasis on recent advances that enable state-of-the-art performance in a variety of tasks. Our goal is to elucidate how these approaches can easily be applied to new problems. We also cover the theory underlying them. Our target audience is researchers in human language technologies; we do not assume familiarity with BP.

In the first three sections, we discuss applications of BP to NLP problems, the basics of modeling with factor graphs and message passing, and the theoretical underpinnings of "what BP is doing" and how it relates to other variational inference techniques. In the second three sections, we cover key extensions to the standard BP algorithm to enable modeling of linguistic structure, efficient inference, and approximation-aware training. We survey a variety of software tools and introduce a new software framework that incorporates many of the modern approaches covered in this tutorial.

Outline

1. Applications [15 min., Eisner]
   • Intro: Modeling with factor graphs
   • Morphological paradigms
   • Dependency and constituency parsing
   • Alignment; Phrase extraction
   • Relation extraction; Semantic role labeling
   • Targeted sentiment
   • Joint models for NLP

2. Belief Propagation Basics [40 min., Eisner]
   • Messages and beliefs (see the sum-product sketch after this outline)
   • Sum-product, max-product, and deterministic annealing
   • Relation to forward-backward and inside-outside
   • Acyclic vs. loopy graphs
   • Synchronous vs. asynchronous propagation

3. Theory [25 min., Gormley]
   • From arc consistency to BP
   • From Gibbs sampling to particle BP to BP
   • Other message-passing algorithms
   • Bethe free energy
   • Connection to PCFGs and FSMs

4. Incorporating Structure into Factors and Variables [30 min., Gormley]
   • Embedding dynamic programs (e.g., inside-outside) within factors
   • String-valued and tree-valued variables

5. Message Approximation and Scheduling [20 min., Eisner]
   • Pruning messages
   • Variational approximations
   • Residual BP and new variants

6. Approximation-aware Training [30 min., Gormley]
   • Empirical risk minimization under approximations (ERMA)
   • BP as a computational expression graph
   • Automatic differentiation (AD)

7. Software [10 min., Gormley]
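To make the "messages and beliefs" bullet above concrete, here is a minimal sketch of sum-product belief propagation on a toy acyclic factor graph with two binary variables and one pairwise factor. The factor tables, variable names, and use of NumPy are illustrative assumptions for this sketch; they are not code from the tutorial or its accompanying software framework.

```python
# Minimal sum-product BP sketch on a tiny acyclic factor graph:
# two binary variables X1 and X2 with unary factors f1, f2 and one
# pairwise factor g, so p(x1, x2) ∝ f1(x1) * f2(x2) * g(x1, x2).
# All numbers below are made up for illustration.
import numpy as np

f1 = np.array([0.7, 0.3])          # unary factor over X1
f2 = np.array([0.4, 0.6])          # unary factor over X2
g  = np.array([[0.9, 0.1],
               [0.2, 0.8]])        # pairwise factor over (X1, X2)

# On an acyclic graph, one pass of messages in each direction suffices.
# A variable-to-factor message is the product of the variable's other
# incoming messages; a factor-to-variable message sums out the other
# variable, weighted by the message arriving from that side.
msg_x1_to_g = f1                   # X1 -> g (only other neighbor is f1)
msg_g_to_x2 = msg_x1_to_g @ g      # g -> X2: sum over x1
msg_x2_to_g = f2                   # X2 -> g
msg_g_to_x1 = g @ msg_x2_to_g      # g -> X1: sum over x2

# Beliefs: product of all incoming messages at each variable, normalized.
belief_x1 = f1 * msg_g_to_x1
belief_x1 /= belief_x1.sum()
belief_x2 = f2 * msg_g_to_x2
belief_x2 /= belief_x2.sum()

# Sanity check against brute-force enumeration of the joint distribution.
joint = f1[:, None] * f2[None, :] * g
joint /= joint.sum()
assert np.allclose(belief_x1, joint.sum(axis=1))
assert np.allclose(belief_x2, joint.sum(axis=0))
print("p(X1):", belief_x1, " p(X2):", belief_x2)
```

On an acyclic graph like this one, a single forward and backward sweep of messages yields exact marginals. On loopy graphs the same updates are simply iterated and the resulting beliefs become approximate, which is the regime that the theory, structured-factor, and approximation-aware training sections of the outline address.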
doi:10.3115/v1/p15-5002