IEEE International Symposium on Communications and Information Technology, 2005. ISCIT 2005.
In this paper, we propose a novel approach for automatically identifying the language of a given text, based on the concept of string kernels. Our approach identifies the language directly from the text, regardless of its coding system. In particular, we view the text at a finer granularity, as a string of bytes. The similarity between two strings can be computed implicitly through an efficient dynamic alignment using suffix trees. We provide empirical evidence that applying the [...]
doi:10.1109/iscit.2005.1567018 fatcat:6bycoajo7zbw7aw6kv5fbezklm
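The byte-level view described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract mentions a dynamic alignment over suffix trees, whereas the code below swaps in a plain p-spectrum kernel that counts shared byte n-grams with hash tables. The function names, the n-gram length of 3, and the UTF-8 default are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def byte_ngrams(text: str, n: int = 3, encoding: str = "utf-8") -> Counter:
    """Count every contiguous n-byte substring of the encoded text."""
    data = text.encode(encoding)  # treat the text as a raw byte string
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def spectrum_kernel(a: str, b: str, n: int = 3) -> float:
    """p-spectrum kernel: inner product of the two byte n-gram count vectors."""
    ca, cb = byte_ngrams(a, n), byte_ngrams(b, n)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # iterate over the smaller table
    return float(sum(count * cb[gram] for gram, count in ca.items()))


def normalized_kernel(a: str, b: str, n: int = 3) -> float:
    """Cosine-normalized kernel, so that text length does not dominate."""
    kab = spectrum_kernel(a, b, n)
    if kab == 0.0:
        return 0.0  # no shared n-grams, or a text shorter than n bytes
    return kab / sqrt(spectrum_kernel(a, a, n) * spectrum_kernel(b, b, n))
```

Because the kernel operates on bytes, it never needs to decode characters, which is the property the abstract relies on to stay independent of the coding system. A kernel of this kind could then be passed to any kernel-based classifier trained on labeled text samples per language.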