Enhancing PID services towards a more fine-grained granularity level as a base for a FAIR data infrastructure

Janete Saldanha Bach, Claus-Peter Klas, Peter Mutschke
2022 Zenodo  
Assigning a PID to a whole dataset, as is common practice in research data management, is not enough to unambiguously identify the specific piece of information used, to ensure proper data citation and, consequently, to promote the accreditation of research results. Research data are increasingly available in data repositories, which increases their visibility and intensifies re-use and reproducibility approaches. Data per se has various levels of granularity. In the Social Sciences, for instance, the variable level is the construct that provides evidence for research results and enables further inferences and analyses. In Social Sciences research data, a variable is a unit of quantitative data, commonly obtained through survey questionnaires or experiments and represented in a tabular dataset format. When it comes to re-use, researchers are much more interested in the concepts behind those variables. When re-used, variables are currently cited "in the text" without a unique identifier; usually, only the study or parts of the questions are cited. These non-standard practices make it inefficient for service providers to identify critical variables and for researchers to re-use them. They also hinder automated access to variables, making harmonization a costly and time-consuming task.

To solve this problem, we propose assigning PIDs at the variable level so that variables can be cited unambiguously. A Persistent Identifier (PID) is a persistent, unique, and globally resolvable identifier based on an openly specified PID scheme (EOSC, 2020, see doi: 10.2777/926037). PIDs are the established means of identifying data, whichever scheme is used (DOI, Handle, URN, ARK). A study that re-uses data (variables) relies on the analysis of those variables to provide results and recommendations, make inferences, and produce outcomes through secondary data analysis. The approach means that identifying the variables must assure a [...]
doi:10.5281/zenodo.6760992
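The abstract does not say how variable-level PIDs would be minted or resolved. As a minimal sketch only, assuming a hypothetical resolver at `https://pid.example.org` and a hash-based suffix scheme (neither taken from the authors' proposal, and not the API of any real PID service), the idea of a unique, resolvable identifier per variable that points back to its parent dataset could look like this. `VariableRecord`, `VariableRegistry`, `mint`, and `resolve` are all hypothetical names.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableRecord:
    """Hypothetical metadata a variable-level PID could resolve to."""
    dataset_pid: str    # PID of the whole dataset (e.g. its DOI)
    variable_name: str  # column name in the tabular dataset
    label: str          # question text or concept behind the variable


class VariableRegistry:
    """Toy in-memory registry that mints and resolves variable-level PIDs.

    Hypothetical illustration: real schemes (DOI, Handle, URN, ARK) rely on
    their own registration agencies and resolver infrastructure.
    """

    def __init__(self, resolver_base: str) -> None:
        self.resolver_base = resolver_base.rstrip("/")
        self._records: dict[str, VariableRecord] = {}

    def mint(self, record: VariableRecord) -> str:
        # Derive the suffix from dataset PID + variable name, so the same
        # variable always receives the same identifier (idempotent minting).
        key = f"{record.dataset_pid}|{record.variable_name}".encode("utf-8")
        suffix = hashlib.sha256(key).hexdigest()[:16]
        pid = f"{self.resolver_base}/var/{suffix}"
        existing = self._records.get(pid)
        if existing is None:
            self._records[pid] = record
        elif existing != record:
            # Never silently rebind a minted PID: persistence means a PID
            # keeps pointing at the same variable description.
            raise ValueError(f"{pid} is already bound to different metadata")
        return pid

    def resolve(self, pid: str) -> VariableRecord:
        # Raises KeyError for identifiers this registry never minted.
        return self._records[pid]


if __name__ == "__main__":
    registry = VariableRegistry("https://pid.example.org/")
    trust = VariableRecord(
        dataset_pid="https://pid.example.org/dataset/hypothetical-survey",
        variable_name="v42",
        label="Trust in parliament (0-10 scale)",
    )
    pid = registry.mint(trust)
    print(pid)                         # citable identifier for one variable
    print(registry.resolve(pid).label)
```

Hashing the dataset PID together with the variable name keeps each identifier stable and opaque; a real service might instead use human-readable suffixes, and would also need to handle versioning of the underlying dataset, which this sketch ignores.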