Efficient determinization of tagged word lattices using categorial and lexicographic semirings

Izhak Shafran, Richard Sproat, Mahsa Yarmohammadi, Brian Roark
2011 2011 IEEE Workshop on Automatic Speech Recognition & Understanding  
Speech and language processing systems routinely need to apply finite-state operations (e.g., POS tagging) to results from intermediate stages (e.g., ASR output) that are naturally represented in a compact lattice form. Currently, such needs are met by converting the lattices into linear sequences (n-best scoring sequences) before and after applying the finite-state operations. In this paper, we eliminate the need for this unnecessary conversion by addressing the problem of picking only the single-best scoring output labels for every input sequence. For this purpose, we define a categorial semiring that allows determinization over strings and incorporate it into a ⟨Tropical, Categorial⟩ lexicographic semiring. Through examples and empirical evaluations, we show how determinization in this lexicographic semiring produces the desired output. The proposed solution is general in nature and can be applied to multi-tape weighted transducers that arise in many applications.
doi:10.1109/asru.2011.6163945 dblp:conf/asru/ShafranSYR11 fatcat:gkczddt64nh3jkpk6ag3ttc3ui
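
The abstract's goal of keeping only the single-best scoring output labels for every input sequence can be illustrated with a small sketch. It is not the paper's algorithm. It assumes a simplified lexicographic weight made of a tropical cost (minimum as "plus", addition as "times") paired with a tag sequence (concatenation as "times", ordinary tuple ordering as the tie-break). It also enumerates paths explicitly, which is the conversion the paper avoids, and it omits the categorial division that determinization over the lattice itself would need. All names (`plus`, `times`, `best_tags`, the toy paths) are hypothetical.

```python
from typing import Dict, List, Tuple

# A weight pairs a tropical cost with the output tag sequence seen so far.
Weight = Tuple[float, Tuple[str, ...]]
Words = Tuple[str, ...]

ONE: Weight = (0.0, ())  # identity for `times`: zero cost, empty tag sequence


def plus(a: Weight, b: Weight) -> Weight:
    """Lexicographic 'plus': the lower cost wins; equal costs fall back
    to comparing the tag sequences (assumed tie-break order)."""
    return min(a, b)


def times(a: Weight, b: Weight) -> Weight:
    """'times': add the costs and concatenate the tag sequences."""
    return (a[0] + b[0], a[1] + b[1])


def best_tags(paths: List[Tuple[Words, List[Weight]]]) -> Dict[Words, Weight]:
    """Keep, for each input word sequence, the single best (cost, tags) weight."""
    best: Dict[Words, Weight] = {}
    for words, arcs in paths:
        total = ONE
        for arc in arcs:
            total = times(total, arc)  # extend along the path
        # Combine alternative taggings of the same word sequence with `plus`.
        best[words] = plus(best[words], total) if words in best else total
    return best


# Toy data: two taggings of one word sequence, one tagging of another.
paths = [
    (("time", "flies"), [(1.0, ("N",)), (2.0, ("V",))]),
    (("time", "flies"), [(1.5, ("V",)), (2.5, ("N",))]),
    (("thyme", "flies"), [(2.0, ("N",)), (2.0, ("V",))]),
]
result = best_tags(paths)
```

Here `result` keeps one tagging per word sequence: the cheaper `("N", "V")` tagging survives for `("time", "flies")`, while `("thyme", "flies")` keeps its only tagging.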