Construction of weighted finite state transducers for very wide context-dependent acoustic models

M. Schuster, T. Hori
2005 IEEE Workshop on Automatic Speech Recognition and Understanding, 2005.  
A previous paper by the authors described an algorithm for efficient construction of Weighted Finite State Transducers (WFSTs) for speech recognition when high-order context-dependent models of order K > 3 (beyond triphones) with tied state observation distributions are used, and showed practical application of the algorithm up to K = 5 (quinphones). In this paper we give additional details of the improved implementation and analyze the algorithm's practical runtime requirements and memory footprint for ... t-orders up to K = 13 (±6 phones of context) when building fully cross-word capable WFSTs for large vocabulary speech recognition tasks. We show that for typical systems it is possible to use any practical context-order K ≤ 13 without having to fear an exponential explosion of the search space, since the necessary state ID to phone transducer (resembling a phone-loop observing all possible K-phone constraints) can be built in a few minutes at most. The paper also gives some implementation details of how we efficiently collect context statistics and build phonetic decision trees for very wide context-dependent acoustic models.
0-7803-9479-8/05/$20.00 © 2005 IEEE
doi:10.1109/asru.2005.1566482 fatcat:phjeljpfkndj5djv5igj5adaoq
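
To make the "exponential explosion" mentioned in the abstract concrete, the following minimal sketch counts how many distinct K-phone contexts exist if every combination of phones is enumerated naively. It is not the authors' algorithm, only an illustration of the growth they avoid. The phone inventory size of 40 and the function names are assumptions chosen for illustration; the abstract does not state them.

```python
def context_width(k: int) -> int:
    """Phones of context on each side of the center phone for an odd order k.

    A context order k covers the center phone plus (k - 1) / 2 phones on
    each side, so k = 13 corresponds to +/-6 phones of context.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("context order must be a positive odd integer")
    return (k - 1) // 2


def naive_context_count(num_phones: int, k: int) -> int:
    """Number of distinct K-phone sequences over an inventory of num_phones.

    This is the upper bound reached when every phone combination is
    enumerated, with no state tying and no phonotactic restrictions.
    """
    return num_phones ** k


if __name__ == "__main__":
    NUM_PHONES = 40  # assumed inventory size, not taken from the paper
    for k in (3, 5, 13):
        width = context_width(k)
        count = naive_context_count(NUM_PHONES, k)
        print(f"K = {k:2d} (+/-{width} phones): {count:.3e} naive contexts")
```

Under this assumed inventory, each increase of K by 2 multiplies the naive count by 1600, which is why a construction that works from the tied state observation distributions rather than from explicit enumeration matters for wide contexts.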