Genomics data curation roles, skills and perception of data quality

Hong Huang, Corinne Jörgensen, Besiki Stvilia
2015 Library & Information Science Research  
Compared to a decade ago, genomics scientists, driven by technical changes and the availability of massive genomic data, are performing a wider range of curation roles, including those of end-users, curators, or dual-role users. Scientists with different curation roles (including that of end-user) may focus on different data quality aspects and skill requirements in a community curation environment. This study examines how genomics scientists' perceived priorities for data quality and data quality skills differ when they assume different roles in genomics data curation work. The analysis of survey data collected from 147 genomics scientists found that curators of genomic data placed a higher value on quality criteria that can be assessed through direct examination of the data, while end-users placed a high value on quality criteria that can be assessed only indirectly, such as believability. With regard to data quality skills, curators appeared to care more than end-users about understanding users' requirements and about specific data management skills, while end-users placed a higher value on the skills needed to deal with information overload, that is, those needed to identify useful, relevant information from large amounts of data. The study found that scientists with different curation roles, given common curation tasks with the same skill requirements, prioritized different data quality criteria. The data quality criteria, skill priorities, and tradeoffs identified by this study can inform the development of effective data curation mandates and policies, data quality assurance planning and training, and the design of curation-role-specific tool dashboards and visualization interfaces for genomics data.
The widespread use of information and communication technologies has globalized genomic research, and vastly increased both the number of collaborators on projects and the size of the genomics data involved in a project (Özdemir et al.
doi:10.1016/j.lisr.2014.08.003 fatcat:qnekvrl44zbpnmembzkegdacsu