Cataloging for a billion word library of Greek and Latin
Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage - DATeCH '14
This paper reports work on a catalog that includes not only standard metadata but also a complete reference transcription of each work, so that users can explicitly cite every version of a work and every word within each version. The Functional Requirements for Bibliographic Records (FRBR) conceptual model allows us to move beyond printed books and to track the logical units within (and often across) them: works (e.g., the Iliad) and expressions (e.g., versions such as the
... th century Venetus A manuscript or Butler's English translation). The Canonical Text Services (CTS) Data Model builds on FRBR. It lets us cite each word in any version of a text by extending established citation schemes inherited from print (e.g., the chapter/verse scheme of the Bible). This paper describes a concrete implementation of such a catalog of 3,679 Greek and Latin works, comprising FRBR-inspired metadata and TEI XML transcriptions revised to support a CTS API. It also describes how all versions of a work can be serialized as variations on the reference version. The FRBR+CTS catalog provides data with which text-reuse and alignment services can automatically detect different versions of, and quotations from, the reference text, and align every discovered instance according to a canonical citation scheme.
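The word-level citation that CTS enables can be illustrated with a minimal Python sketch. It assumes the colon-delimited CTS URN layout `urn:cts:namespace:textgroup.work.version.exemplar:passage`, with an optional `@token[n]` subreference naming the n-th occurrence of a word in the cited passage. The names `CtsUrn`, `parse_cts_urn` and `locate_subref` are hypothetical. The version identifier `msA` is used only for illustration, and passage ranges are deliberately not handled. This is not the implementation described in the paper.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class CtsUrn:
    namespace: str               # e.g. "greekLit"
    textgroup: str               # e.g. "tlg0012" (Homer)
    work: Optional[str]          # e.g. "tlg001" (Iliad)
    version: Optional[str]       # one expression of the work (illustrative id)
    exemplar: Optional[str]
    passage: Optional[str]       # citation in the work's own scheme, e.g. "1.1"
    subref_token: Optional[str]  # cited word inside the passage
    subref_index: Optional[int]  # which occurrence of that word (1-based)


def parse_cts_urn(urn: str) -> CtsUrn:
    """Parse a single-passage CTS URN (simplified sketch; ranges are rejected)."""
    parts = urn.split(":", 4)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_part = parts[2], parts[3]
    ids = work_part.split(".")
    if len(ids) > 4 or any(not x for x in ids):
        raise ValueError(f"bad work component: {work_part!r}")
    # Pad the dotted hierarchy so missing levels (work, version, exemplar) are None.
    textgroup, work, version, exemplar = ids + [None] * (4 - len(ids))
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    token, index = None, None
    if passage is not None:
        if "-" in passage:
            raise ValueError("passage ranges are not handled in this sketch")
        if "@" in passage:
            passage, subref = passage.split("@", 1)
            token, index = subref, 1  # assumption: no "[n]" means first occurrence
            if subref.endswith("]") and "[" in subref:
                token, _, n = subref[:-1].rpartition("[")
                index = int(n)
    return CtsUrn(namespace, textgroup, work, version, exemplar,
                  passage, token, index)


def locate_subref(passage_text: str, token: str, index: int = 1) -> int:
    """Offset of the index-th occurrence of token in passage_text, or -1.

    Both strings are NFC-normalized so precomposed and decomposed Greek
    accents compare equal; the returned offset refers to the normalized text.
    """
    text = unicodedata.normalize("NFC", passage_text)
    needle = unicodedata.normalize("NFC", token)
    pos = -1
    for _ in range(index):
        pos = text.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos


if __name__ == "__main__":
    urn = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1@μῆνιν[1]")
    print(urn.passage, urn.subref_token, urn.subref_index)
    print(locate_subref("μῆνιν ἄειδε θεὰ", urn.subref_token, urn.subref_index))
```

Because the passage is expressed in the work's own citation scheme (here book 1, line 1), the same passage citation with a different version identifier addresses the corresponding passage in another version of the work. That shared address is what allows versions and quotations to be aligned against the reference text.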