Thai National Corpus

Wirote Aroonmanakun, Kachen Tansiri, Pairit Nittayanuparp
2009 Proceedings of the 7th Workshop on Asian Language Resources (ALR7), unpublished
This paper presents the problems and solutions in developing the Thai National Corpus (TNC). The TNC is designed to be a corpus comparable to the British National Corpus (BNC). The project aims to collect eighty million words, but since 2006 it has been able to collect only fourteen million. The data are accessible from the TNC Web. The delay in creating the TNC is mainly caused by the need to obtain authorization for copyrighted texts. The methods used for collecting data, and their results, are discussed. Errors that arise during data encoding, and how to handle them, are also described.
doi:10.3115/1690299.1690321