Documents to Data: the evolution of approaches to a library archive

Rebecca Sutton Koeser, Rebecca Munson, Joshua Kotin, Elspeth Green
2019 Zenodo  
In Digital Humanities we speak of moving from "documents to data." In many projects, this is literal, a process of extracting information or turning text into tokens suitable for computational analysis. For the Shakespeare and Company Project, it entailed a conceptual shift from thinking of archival materials as texts to be encoded and described, to thinking of them as data to be managed in a relational database. This project is based on the Sylvia Beach papers, held at Princeton University,
which document the privately owned lending library in Paris frequented by notable writers of the Lost Generation. Materials include logbooks with membership information and lending cards for a subset of members with addresses and borrowing histories. This poster will present the history of a multi-year project in three phases, each with benefits, difficulties, and stakes. The evolution of the project demonstrates the development of our thinking as a team as we moved toward a public-facing site designed for a broad audience. In the first phase, we encoded content from the library using TEI/XML, an approach commonly employed for documentary editing. The choice of TEI/XML fit the initial aims of the project, but even rich transcription did not offer the opportunity to fully connect the people, places, and books referenced. Consequently, the second phase was dedicated to designing a custom relational database to model the world of the library by explicitly surfacing different types of connections. The third phase required migrating data from the TEI/XML to the relational database, a lengthy process that exposed inconsistencies in the encoding, but also gave us an opportunity to eliminate redundant and unsynchronized information. The conversion process highlighted the benefits and the difficulties of both systems in pursuing similar research questions. A TEI corpus and a relational database both support querying and making connections, but a database is designed for explicit connections, which makes it easier to identify and group [...]
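To make the contrast between the two models concrete, the following is a minimal sketch of the kind of migration the abstract describes: reading borrowing events out of TEI-style XML and loading them into relational tables where people and books become explicit, deduplicated records. Everything specific here is an assumption made for illustration, not the project's actual encoding or schema: the `ab type="borrowingEvent"` markup, the `ref` identifiers, the table names `person`, `book`, and `borrow`, and the sample names and dates are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment. Element usage, attribute values, and the
# sample people, titles, and dates are illustrative assumptions only.
TEI_SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <ab type="borrowingEvent">
      <persName ref="#member-1">Example Member</persName>
      <bibl ref="#book-1"><title>Example Title</title></bibl>
      <date when="1925-03-02"/>
    </ab>
    <ab type="borrowingEvent">
      <persName ref="#member-1">Example Member</persName>
      <bibl ref="#book-2"><title>Another Title</title></bibl>
      <date when="1925-04-10"/>
    </ab>
  </body></text>
</TEI>
"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def load(conn, tei_xml):
    """Create a small (assumed) schema and load borrowing events into it."""
    conn.executescript("""
        CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE book (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE borrow (
            person_id TEXT NOT NULL REFERENCES person(id),
            book_id TEXT NOT NULL REFERENCES book(id),
            borrowed_on TEXT NOT NULL
        );
    """)
    root = ET.fromstring(tei_xml)
    for event in root.iterfind(".//tei:ab[@type='borrowingEvent']", NS):
        pers = event.find("tei:persName", NS)
        bibl = event.find("tei:bibl", NS)
        date = event.find("tei:date", NS)
        person_id = pers.get("ref").lstrip("#")
        book_id = bibl.get("ref").lstrip("#")
        title = bibl.find("tei:title", NS).text
        # A person or book repeated across events is stored only once;
        # the borrow row carries the connection between them.
        conn.execute("INSERT OR IGNORE INTO person VALUES (?, ?)",
                     (person_id, pers.text))
        conn.execute("INSERT OR IGNORE INTO book VALUES (?, ?)",
                     (book_id, title))
        conn.execute("INSERT INTO borrow VALUES (?, ?, ?)",
                     (person_id, book_id, date.get("when")))
    conn.commit()


def borrows_per_person(conn):
    """Group borrowing events by member through an explicit join."""
    return conn.execute("""
        SELECT person.name, COUNT(*)
        FROM borrow JOIN person ON person.id = borrow.person_id
        GROUP BY person.id
        ORDER BY person.name
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load(connection, TEI_SAMPLE)
    print(borrows_per_person(connection))
```

In this sketch the member's name appears twice in the XML but once in the `person` table, which is one way a migration of this kind can surface and remove redundant information. The grouping query then relies on the stored relationship rather than on matching repeated strings in the transcription.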
doi:10.5281/zenodo.3277320 fatcat:2zdejfrmvvg3lhmyusox2clzx4