Cataloging for a billion word library of Greek and Latin

Gregory Crane, Bridget Almas, Alison Babeu, Lisa Cerrato, Anna Krohn, Frederik Baumgart, Monica Berti, Greta Franzini, Simona Stoyanova
2014 Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage - DATeCH '14  
This paper reports work on a catalog that includes not only standard metadata but also a complete reference transcription for each work, so that users can explicitly cite not only every version but also every word in every version of a work. The Functional Requirements for Bibliographic Records (FRBR) conceptual model allows us to move beyond printed books and to track the logical units within (and often across) printed books: works (e.g., the Iliad) and expressions (e.g., versions such as the ... th century Venetus A manuscript or Butler's English translation). The Canonical Text Services (CTS) Data Model builds upon FRBR, allowing us to cite each word in any version of a text and to do so by building upon established citation schemes inherited from print (e.g., the chapter/verse citation scheme in the Bible). This paper describes a concrete implementation of such a catalog of 3,679 Greek and Latin works that includes FRBR-inspired metadata and TEI XML transcriptions that were revised to facilitate implementing a CTS API. It also describes how all the different versions of a work can be serialized as variations on the reference version. The FRBR+CTS catalog provides data by which text reuse and alignment services can automatically detect different versions of and quotations from the reference text, aligning all discovered instances according to a canonical citation scheme.
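
To make the citation model concrete, the following is a minimal sketch of how a CTS URN can be split into its FRBR-like levels (text group, work, version) and a passage reference with an optional word-level subreference. It is an illustration only, not the catalog's implementation: the class `CtsUrn` and the function `parse_cts_urn` are hypothetical names, the version label `perseus-grc1` is used purely as an example, and the sketch assumes a simplified URN of the form `urn:cts:namespace:textgroup.work.version:passage@subreference`, ignoring exemplars and passage ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Hypothetical container for the parts of a simplified CTS URN."""
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]
    subreference: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a simplified CTS URN into its components.

    Assumed shape: urn:cts:NAMESPACE:TEXTGROUP[.WORK[.VERSION]][:PASSAGE[@SUBREF]]
    Exemplar identifiers and passage ranges are not handled in this sketch.
    """
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")

    namespace = parts[2]
    hierarchy = parts[3].split(".")
    if not hierarchy[0] or len(hierarchy) > 3:
        raise ValueError(f"unsupported work hierarchy: {parts[3]!r}")

    # Work hierarchy: text group, then optionally work and version.
    textgroup = hierarchy[0]
    work = hierarchy[1] if len(hierarchy) > 1 else None
    version = hierarchy[2] if len(hierarchy) > 2 else None

    # Passage component: canonical citation, optionally narrowed to a word.
    passage = None
    subreference = None
    if len(parts) == 5 and parts[4]:
        passage, _, sub = parts[4].partition("@")
        subreference = sub or None

    return CtsUrn(namespace, textgroup, work, version, passage, subreference)


if __name__ == "__main__":
    # Book 1, line 1 of the Iliad in one (illustrative) Greek version,
    # narrowed to a single word of that line.
    print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1@μῆνιν"))
```

A URN without a version component (e.g. `urn:cts:greekLit:tlg0012.tlg001`) addresses the work as a whole, which corresponds to the FRBR distinction between a work and its individual versions.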
doi:10.1145/2595188.2595190 dblp:conf/datech/CraneABCKBBFS14 fatcat:rwiipdep3fbqpd37qv6c4lj3wi