An Overview of Probabilistic Tree Transducers for Natural Language Processing [chapter]

Kevin Knight, Jonathan Graehl
2005 Lecture Notes in Computer Science  
Probabilistic finite-state string transducers (FSTs) are extremely popular in natural language processing, due to powerful generic methods for applying, composing, and learning them. Unfortunately, FSTs are not a good fit for much of the current work on probabilistic modeling for machine translation, summarization, paraphrasing, and language modeling. These methods operate directly on trees, rather than strings. We show that tree acceptors and tree transducers subsume most of this work, and we discuss algorithms for realizing the same benefits found in probabilistic string transduction.

1 In the noisy-channel framework, we look for the output string that maximizes P(output | input), which is equivalent (by Bayes' Rule) to maximizing P(output) × P(input | output). The first term of the product is often captured by a probabilistic FSA, the second term by a probabilistic FST (or a cascade of them). Indeed, Knight & Al-Onaizan [8] describe how to use generic finite-state tools to implement the statistical machine translation models of [9]. Their scheme is shown in Figure 2, and [8] gives constructions for the transducers. Likewise, Kumar & Byrne [10] do this job for the phrase-based translation model of [11].

FSA  Mary did not slap the green witch
FST  Mary not slap slap slap the green witch
FST  Mary not slap slap slap NULL the green witch
FST  Mary no dió una bofetada a la verde bruja
FST  Mary no dió una bofetada a la bruja verde

Fig. 2. The statistical machine translation model of [9] implemented with a cascade of standard finite-state transducers [8]. In this model, an observed Spanish sentence is "decoded" back into English through several layers of word substitution, insertion, deletion, and permutation.

Trees

Despite the attractive computational properties of finite-state tools, no one is particularly fond of their representational power for natural language. They are not adequate for the long-distance reordering of symbols needed for machine translation, and they cannot implement the trees, graphs, and variable bindings that have proven useful for describing natural language processes. So over the past several years, researchers have been developing probabilistic tree-based models for
- machine translation (e.g., [13, 14, 12, 15, 16, 17])
- summarization (e.g., [18])
- paraphrasing (e.g., [19])
doi:10.1007/978-3-540-30586-6_1