A new data science research program: evaluation, metrology, standards, and community outreach

Bonnie J. Dorr, Craig S. Greenberg, Peter Fontana, Mark Przybocki, Marion Le Bras, Cathryn Ploehn, Oleg Aulov, Martial Michel, E. Jim Golden, Wo Chang
2016 International Journal of Data Science and Analytics  
This article examines foundational issues in data science including current challenges, basic research questions, and expected advances, as the basis for a new data science research program (DSRP) and associated data science evaluation (DSE) series, introduced by the National Institute of Standards and Technology (NIST) in the fall of 2015. The DSRP is designed to facilitate and accelerate research progress in the field of data science and consists of four components: evaluation and metrology, standards, compute infrastructure, and community outreach. A key part of the evaluation and measurement component is the DSE. The DSE series aims to address logistical and evaluation design challenges while providing rigorous measurement methods and an emphasis on generalizability rather than domain- and application-specific approaches. Toward that end, each year the DSE will consist of multiple research tracks and will encourage the application of tasks that span these tracks. The evaluations are intended to facilitate research efforts and collaboration, leverage shared infrastructure, and effectively address crosscutting challenges faced by diverse data science communities. Multiple research tracks will be championed by members of the data science community with the goal of enabling rigorous comparison of approaches through common tasks, datasets, metrics, and shared research challenges. The tracks will permit us to measure several different data science technologies in a wide range of fields and will address computing infrastructure, standards for an interoperability framework, and domain-specific examples. This article also summarizes lessons learned from the data science evaluation series pre-pilot that was held in fall of 2015.
This article extends a paper that was presented at the IEEE Data Science and Advanced Analytics conference in the fall of 2015 [1] and also expands upon content from a poster paper at IEEE BigData 2015 [2].
doi:10.1007/s41060-016-0016-z dblp:journals/ijdsa/DorrGFPBPAMGC16 fatcat:ngrq7wupyfbs5ozna4sgthcjma