Issues in digital text representation, on-line dissemination, sharing and re-use for African minority languages

Emmanuel Ngué Um
2017 Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages   unpublished
In the tone languages of Africa, tones may encode meaning either as separate linguistic units or in association with segmental morphemes. In mainstream text representation models, however, the linguistic autonomy of tones is often overridden by the graphical layout of characters: the accents that mark tones cannot easily be parsed for their linguistic information apart from the segments that bear them. This paper proposes a model of representation based on TEI-XML in which both tones and segments can be represented as a single string of characters, making the text information easily parsable.
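The parsing problem described in the abstract can be seen at the character level. A precomposed letter such as "é" (U+00E9) is one code point, so the tone it carries is invisible to a naive character scan. The sketch below is not the paper's TEI-XML model. It only illustrates the underlying issue using Unicode canonical decomposition (NFD), which splits a base letter from its combining diacritics so that tone marks can be read off separately. The `TONE_MARKS` mapping and the `split_tones` helper are hypothetical: which diacritic marks which tone varies by orthography.

```python
import unicodedata
from typing import List, Optional, Tuple

# Hypothetical diacritic-to-tone mapping (an assumption; real orthographies differ).
TONE_MARKS = {
    "\u0301": "H",   # combining acute accent -> high
    "\u0300": "L",   # combining grave accent -> low
    "\u0304": "M",   # combining macron -> mid
    "\u0302": "HL",  # combining circumflex -> falling
    "\u030C": "LH",  # combining caron -> rising
}


def split_tones(word: str) -> List[Tuple[str, Optional[str]]]:
    """Return (segment, tone) pairs; tone is None where no tone mark is present.

    Non-tonal combining marks (e.g. dot below) stay attached to their segment.
    If a segment carries several tone marks, the last one wins (a simplification).
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    # NFD turns precomposed letters into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and pairs:
            segment, tone = pairs[-1]
            if ch in TONE_MARKS:
                pairs[-1] = (segment, TONE_MARKS[ch])  # record tone separately
            else:
                pairs[-1] = (segment + ch, tone)       # keep other diacritics on segment
        else:
            pairs.append((ch, None))
    return pairs


if __name__ == "__main__":
    print(len("\u00e9"))        # precomposed é: a single code point
    print(split_tones("b\u00e9"))
    print(split_tones("m\u025b\u0300"))
```

Once tones are separated from segments like this, they can be stored or queried as independent units, which is the kind of parsability the abstract argues a representation model should provide.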
doi:10.18653/v1/w17-0104 fatcat:xz6ktjyz4zb5jnk76u43ljgwgq