Sorting by Sound: Arbitrary Lexical Ordering for Transcribed Thai Text

Doug Cooper
1995 Pacific Asia Conference on Language, Information and Computation  
When either Thai or transcribed (Romanized) Thai is sorted alphabetically, words that sound very much alike usually end up far apart. For example, maay and may are thrown to opposite ends of the m entries, even though confusing one with the other causes problems both for foreign students who cannot speak clearly and for Thais who can't spell. This paper explains how and why the difficulty occurs, and shows why both Thai script and its transcription are inherently difficult to sort by sound. It introduces a preprocessing method (deriving phonemic signatures) that lets us define improved lexical, or dictionary, orders, yet requires nothing but standard sorting code. The method can be applied to other languages (Lao, Khmer, and Burmese) that, like Thai, distinguish words on the basis of vowel length and/or tone.
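The abstract does not give the signature format, so the following is only a minimal sketch of the general idea, not the paper's scheme. It assumes a hypothetical transcription in which a doubled vowel letter marks a long vowel and an optional trailing digit 0-4 marks tone. The function name `phonemic_signature` and the key layout (collapsed skeleton, then vowel-length pattern, then tone, then the original spelling as a tiebreaker) are illustrative assumptions. The point it shows is that once each word is mapped to such a key, Python's ordinary `sorted` does the rest.

```python
import re

# Hypothetical, simplified vowel inventory for the assumed transcription.
VOWELS = set("aeiou")


def phonemic_signature(word: str) -> tuple:
    """Hypothetical sort key: (skeleton, vowel-length pattern, tone, original word).

    Assumes doubled vowel letters mark length and an optional trailing
    digit 0-4 marks tone; this is an illustration, not the paper's format.
    """
    m = re.fullmatch(r"([a-z]+)([0-4]?)", word)
    if m is None:
        raise ValueError(f"unsupported transcription: {word!r}")
    letters, tone = m.group(1), m.group(2)

    skeleton = []  # letters with long vowels collapsed to one letter
    lengths = []   # 1 for each long vowel, 0 for each short vowel
    i = 0
    while i < len(letters):
        ch = letters[i]
        if ch in VOWELS and i + 1 < len(letters) and letters[i + 1] == ch:
            skeleton.append(ch)
            lengths.append(1)
            i += 2
        else:
            skeleton.append(ch)
            if ch in VOWELS:
                lengths.append(0)
            i += 1

    # Comparing skeletons first keeps length and tone variants adjacent;
    # the original spelling makes the order fully deterministic.
    return ("".join(skeleton), tuple(lengths), int(tone) if tone else 0, word)


if __name__ == "__main__":
    words = ["maay", "mak", "may3", "may", "mii", "maak"]
    print(sorted(words))                          # ['maak', 'maay', 'mak', 'may', 'may3', 'mii']
    print(sorted(words, key=phonemic_signature))  # ['mak', 'maak', 'may', 'may3', 'maay', 'mii']
```

In plain alphabetical order, mak falls between maay and may; with the signature key, maay lands next to may and maak next to mak. Because the key is an ordinary tuple, no custom comparison or sorting routine is needed.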
dblp:conf/paclic/Cooper95 fatcat:a2h7zgoqnbevtj57mu4xqki64q