Collaborative Research: Elements: Advancing Data Science and Analytics for Water (DSAW)

Jeffery S. Horsburgh, Brian Crookston, Alfonso Torres-Rua, Tianfang Xu, Anthony Castronova
2022 Zenodo  
Scientific and related management challenges in the water domain are inherently multidisciplinary, requiring synthesis of multiple types of data from multiple domains. Many data manipulation, visualization, and analysis tasks performed by water scientists are difficult because (1) datasets are becoming larger and more complex; (2) standard data formats for common data types are not always agreed upon, and, when they are, they are not always mapped to an efficient structure for visualization and/or analysis within an analytical environment; and (3) water scientists generally lack training in data-intensive scientific methods that would enable them to use new and existing tools to efficiently tackle large and complex datasets.

This project advances Data Science and Analytics for Water (DSAW) by developing: (1) an advanced object data model that maps common water-related data types to performant data structures within the object-oriented Python language and analytical environment, based upon standard file, data, and content types established by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) HydroShare system; and (2) two new Python packages that enable users to write Python code that automates retrieval of required water data, loads it into performant in-memory objects specified by the object data model, and performs analysis in a reproducible way that can be shared, collaborated on, and formally published for reuse. We have developed domain-specific data science applications to demonstrate how the new Python packages can be paired with the powerful data science capabilities of existing Python packages such as pandas, NumPy, and scikit-learn to build advanced analytical workflows within a Python environment. In doing so, we are extending the data access, collaboration, and archival capabilities of the HydroShare data and model repository and promoting its use as a platform for reproducible water data science.
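The object data model idea, mapping a standard content type to a performant in-memory Python object, can be illustrated with a small registry of loaders. This is a minimal sketch under stated assumptions, not the API of the DSAW packages: the `TimeSeries` class, the `LOADERS` registry, the `register` and `load` helpers, the content-type key `"TimeSeries"`, and the four-column CSV layout are all hypothetical. It uses only the standard library so it runs anywhere; in practice a pandas `DataFrame` would be a more natural in-memory target.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List


# Hypothetical in-memory object for a single-site, single-variable time series.
@dataclass
class TimeSeries:
    site_code: str
    variable: str
    times: List[datetime] = field(default_factory=list)
    values: List[float] = field(default_factory=list)

    def mean(self) -> float:
        # Arithmetic mean of the observed values.
        return sum(self.values) / len(self.values)


# Hypothetical registry mapping a content-type name to a loader function.
LOADERS: Dict[str, Callable[[str], object]] = {}


def register(content_type: str):
    # Decorator that records a loader under the given content-type key.
    def decorator(func):
        LOADERS[content_type] = func
        return func
    return decorator


@register("TimeSeries")
def load_time_series(text: str) -> TimeSeries:
    # Assumed CSV layout: site_code, variable, datetime (ISO 8601), value.
    reader = csv.DictReader(io.StringIO(text))
    ts = None
    for row in reader:
        if ts is None:
            ts = TimeSeries(row["site_code"], row["variable"])
        ts.times.append(datetime.fromisoformat(row["datetime"]))
        ts.values.append(float(row["value"]))
    if ts is None:
        raise ValueError("time series file contains no rows")
    return ts


def load(content_type: str, text: str):
    # Dispatch to the loader registered for this content type.
    try:
        loader = LOADERS[content_type]
    except KeyError:
        raise ValueError(f"unsupported content type: {content_type}") from None
    return loader(text)


# Hypothetical sample data, for illustration only.
sample = """site_code,variable,datetime,value
site_001,discharge,2022-06-01T00:00,12.5
site_001,discharge,2022-06-01T01:00,13.5
"""

ts = load("TimeSeries", sample)
print(ts.site_code, len(ts.values), ts.mean())  # prints: site_001 2 13.0
```

Keying loaders by content type keeps the mapping in one place: supporting another data type (for example, a raster) would mean registering one more loader rather than changing the calling analysis code.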
doi:10.5281/zenodo.6851401 fatcat:skf7gvn7wvgdfldseig4g52mx4