Seeing the Heiltsuk orthography from font encoding through to Unicode: A case study using convertextract [chapter]

Aidan Pine, Mark Turin
2018 Zenodo  
Across the world's languages and cultures, most writing systems predate the use of computers. In the early years of ICT, standards and protocols for encoding and rendering the majority of the world's writing systems were not in place. The opportunity to deploy less commonly used orthographies in cross-platform digital contexts has steadily increased since Unicode became the most widely used encoding on the web in late 2007 (Davis, 2008). But what happens to resources that were developed before Unicode standards became widespread? While many tools have been created to address this problem and other issues related to transliteration and character-level substitutions, this paper describes the process undertaken for the Indigenous and endangered Heiltsuk (Wakashan) language, and outlines a tool (Convertextract) that was designed to convert not only plain text, but also Microsoft Office (pptx, xlsx, docx) documents, with the goals of updating and upgrading pre-existing digital textual resources to Unicode standards, and thus preserving the knowledge they contain for both the present and the future.
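The character-level substitution mentioned above can be illustrated with a minimal Python sketch. It is not Convertextract's implementation or API. The mapping table is entirely hypothetical and does not reflect the actual Heiltsuk font encoding; the names `FONT_TO_UNICODE` and `build_converter` are assumptions made for illustration. The sketch shows one common approach: match the longest font-encoded sequence first, substitute its Unicode equivalent, and normalize the result.

```python
import re
import unicodedata

# Hypothetical legacy-font mapping, for illustration only.
# These pairs are NOT the real Heiltsuk font table.
FONT_TO_UNICODE = {
    "@": "\u0259",         # '@' keystroke shown as schwa in the legacy font
    "L": "\u019b",         # 'L' keystroke shown as barred lambda
    "L'": "\u019b\u0313",  # barred lambda plus combining comma above
}


def build_converter(mapping):
    """Return a function that rewrites font-encoded text as Unicode."""
    # Longest keys first, so "L'" is tried before its prefix "L".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def convert(text):
        # Replace each matched sequence; unmapped characters pass through.
        replaced = pattern.sub(lambda m: mapping[m.group(0)], text)
        # NFC gives one canonical form for base + combining sequences.
        return unicodedata.normalize("NFC", replaced)

    return convert


convert = build_converter(FONT_TO_UNICODE)
```

Ordering the keys longest-first matters whenever one font-encoded sequence is a prefix of another; without it, `L'` could be converted as `L` followed by an unconverted apostrophe.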
doi:10.5281/zenodo.3941269 fatcat:bazng4j3x5geloieqzb6vx4qs4