Source Code for the 'Corpus of Decisions: Permanent Court of International Justice' (CD-PCIJ-Source) [article]

Sean Fobbe
2022 Zenodo  
Overview

This code in the R programming language downloads and processes the full set of decisions and appended opinions rendered by the Permanent Court of International Justice (PCIJ), as published in Series A, B and AB on the website of the International Court of Justice (ICJ), into a rich and structured human- and machine-readable data set. It is the basis for the Corpus of Decisions: Permanent Court of International Justice (CD-PCIJ).

All data sets created with this script will always be hosted permanently, open access and freely available, at Zenodo, the scientific repository of CERN. Each version is uniquely identified with a persistent Digital Object Identifier (DOI), the Version DOI. The newest version of the data set will always be available via the link of the Concept DOI: https://doi.org/10.5281/zenodo.3840479

An academic paper describing the construction and relevance of the data set has been accepted by the Journal of Empirical Legal Studies (JELS) and is forthcoming this year.

Functionality

This script will produce 17 ZIP archives:

- 2 archives of CSV files containing the full machine-readable data set (English/French)
- 2 archives of CSV files containing the full machine-readable metadata (English/French)
- 2 archives of TXT files containing all machine-readable texts with a reduced set of metadata encoded in the filenames (English/French)
- 2 archives of PDF files containing all human-readable texts with enhanced OCR (English/French)
- 2 archives of PDF files containing all human-readable majority opinions with enhanced OCR (English/French)
- 2 archives of PDF files containing original documents split into monolingual documents (English/French)
- 2 archives of TXT files containing extracted text from the original documents (English/French)
- 1 archive of PDF files as originally published by the PCIJ/ICJ (multilingual)
- 1 archive of analysis data and diagrams
- 1 archive containing all source files

The integrity and veracity of each ZIP archive is docume [...]
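As a rough illustration of how one of the CSV archives listed under Functionality might be consumed, the following minimal Python sketch reads the first CSV file found inside a ZIP archive into a list of rows. It is not part of the published R script. The helper names (`read_csv_from_zip`, `demo_archive`), the member name `example.csv`, the column names and the UTF-8 encoding are all assumptions made for illustration; the actual file names, columns and encoding should be taken from the documentation of the data set.

```python
import csv
import io
import zipfile


def read_csv_from_zip(archive, member=None, encoding="utf-8"):
    """Return the rows of one CSV file inside a ZIP archive as a list of dicts.

    `archive` is a path or a binary file-like object. If `member` is None,
    the first entry whose name ends in ".csv" is used. The UTF-8 default
    is an assumption, not a documented property of the data set.
    """
    with zipfile.ZipFile(archive) as zf:
        if member is None:
            csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("no CSV file found in archive")
            member = csv_names[0]
        with zf.open(member) as raw:
            # Wrap the binary stream so the csv module receives text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))


def demo_archive():
    """Build a tiny in-memory ZIP with made-up columns, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("example.csv", "doc_id,text\n1,first\n2,second\n")
    buffer.seek(0)
    return buffer


rows = read_csv_from_zip(demo_archive())
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

With a downloaded archive, the same helper would be called with the local path of the ZIP file in place of `demo_archive()`.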
doi:10.5281/zenodo.4136956 fatcat:kdshphslnraonlmu4rxkay6hpm