XML standard for Indic online handwritten database

Swapnil Belhe, Srinivasa Chakravarthy, A. G. Ramakrishnan
2009 Proceedings of the International Workshop on Multilingual OCR - MOCR '09  
This article proposes an improved XML standard for storing online handwritten data in Indian languages. The standard has evolved over a period of two years and is currently used by the Consortium for online handwritten recognition of Indian languages to annotate about 100,000 handwritten words in each of six Indian languages: Tamil, Kannada, Telugu, Malayalam, Hindi and Bangla. So that the large amount of data being collected is usable by future researchers, it is preferable to store the data in a format that is unambiguous and easy to read. What distinguishes this refined standard is that it assigns quality labels to the data at different levels and has provision to annotate all the peculiarities of writing the scripts of the various Indian languages included in the current consortium project. The current format allows the use of automated and semi-automated annotation tools.
doi:10.1145/1577802.1577823 dblp:conf/icdar/BelheCR09 fatcat:pzllnngdtjavrmxrhitbdi3qli