Creating the Thai National Corpus
2007
MANUSYA
This paper reports on the progress of Thai National Corpus (TNC) development. The TNC is designed as a general corpus of standard Thai. Only written texts are collected in the first phase. The corpus aims to include at least eighty million words. Texts of various types by various authors are included so that the TNC closely represents written language in general. Texts are word-segmented and tagged following the Text Encoding Initiative (TEI) guidelines on text encoding. The TNC was designed
doi:10.1163/26659077-01003001
fatcat:7uit4kogxjcitmgzbbgiotybb4