Rigorous approximated determinization of weighted automata

Benjamin Aminof, Orna Kupferman, Robby Lampert
2013 Theoretical Computer Science  
A nondeterministic weighted finite automaton (WFA) maps an input word to a numerical value. Applications of weighted automata include formal verification of quantitative properties, as well as text, speech, and image processing. Many of these applications require the WFAs to be deterministic, or work substantially better when the WFAs are deterministic. Unlike NFAs, which can always be determinized, not all WFAs have an equivalent deterministic weighted automaton (DWFA). In [1], Mohri describes a determinization construction for a subclass of WFA. He also describes a property of WFAs (the twins property), such that all WFAs that satisfy the twins property are determinizable and the algorithm terminates on them. Unfortunately, many natural WFAs cannot be determinized. In this paper we study approximated determinization of WFAs. We describe an algorithm that, given a WFA A and an approximation factor t ≥ 1, constructs a DWFA A′ that t-determinizes A. Formally, for all words w ∈ Σ*, the value of w in A′ is at least its value in A and at most t times its value in A. Our construction involves two new ideas: attributing states in the subset construction by both upper and lower residues, and collapsing attributed subsets whose residues can be tightened. The larger the approximation factor is, the more attributed subsets we can collapse. Thus, t-determinization is helpful not only for WFAs that cannot be determinized, but also in cases where determinization is possible but results in automata that are too big to handle. In addition, t-determinization is useful for reasoning about the competitive ratio of online algorithms. We also describe a property (the t-twins property) and use it in order to characterize t-determinizable WFAs. Finally, we describe a polynomial algorithm for deciding whether a given WFA has the t-twins property.
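The t-determinization condition stated in the abstract (value in A ≤ value in A′ ≤ t times the value in A) can be illustrated with a small sketch. This is not the paper's construction. It assumes min-sum semantics (the value of a word is the cost of its cheapest accepting run), which the abstract does not spell out, and the dictionary encoding, the function names `word_value` and `t_approximates`, and the toy automata `A` and `D` are all hypothetical. Checking words up to a bounded length is only a sanity test, not a proof that the condition holds for all of Σ*.

```python
from itertools import product

INF = float("inf")


def word_value(wfa, word):
    """Min-sum value of `word`: cost of the cheapest accepting run (INF if none).

    Assumed encoding (hypothetical, for illustration only):
      wfa["initial"]: {state: initial cost}
      wfa["delta"]:   {(state, letter): [(next_state, weight), ...]}
      wfa["final"]:   {state: final cost}
    """
    # cost[q] = cheapest cost of reaching q after reading the prefix so far
    cost = dict(wfa["initial"])
    for letter in word:
        nxt = {}
        for q, c in cost.items():
            for q2, w in wfa["delta"].get((q, letter), []):
                if c + w < nxt.get(q2, INF):
                    nxt[q2] = c + w
        cost = nxt
    return min(
        (c + wfa["final"][q] for q, c in cost.items() if q in wfa["final"]),
        default=INF,
    )


def t_approximates(det, nondet, t, alphabet, max_len):
    """Check value(nondet) <= value(det) <= t * value(nondet) on short words."""
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            lo = word_value(nondet, word)
            hi = word_value(det, word)
            if not (lo <= hi <= t * lo):
                return False
    return True


# Toy nondeterministic WFA: state p charges a=1, b=2; state q charges a=2, b=1.
A = {
    "initial": {"p": 0, "q": 0},
    "delta": {
        ("p", "a"): [("p", 1)], ("p", "b"): [("p", 2)],
        ("q", "a"): [("q", 2)], ("q", "b"): [("q", 1)],
    },
    "final": {"p": 0, "q": 0},
}

# Toy one-state deterministic candidate: every letter costs 1.5.
D = {
    "initial": {"s": 0},
    "delta": {("s", "a"): [("s", 1.5)], ("s", "b"): [("s", 1.5)]},
    "final": {"s": 0},
}

print(t_approximates(D, A, 1.5, "ab", 6))
print(t_approximates(D, A, 1.2, "ab", 6))
```

In this toy pair, a word with x occurrences of a and y occurrences of b has value min(x + 2y, 2x + y) in `A` and 1.5(x + y) in `D`, so the check passes for t = 1.5 and fails for t = 1.2 (the word "a" already has value 1 in `A` and 1.5 in `D`).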
doi:10.1016/j.tcs.2013.02.005 fatcat:3no3eoyglnah3fon5yi3ccewzq