USID and Pycroscopy – Open Source Frameworks for Storing and Analyzing Imaging and Spectroscopy Data

Suhas Somnath, Chris R. Smith, Nouamane Laanait, Rama K. Vasudevan, Stephen Jesse
2019 Microscopy and Microanalysis  
Over the past two decades, continued improvements in instrumentation hardware [1], increased accessibility to high-performance computing (HPC) resources [2], and more sophisticated computer algorithms [3] have enabled profound breakthroughs in microscopy and microanalysis. These advancements have led to an unprecedented proliferation of microscopy datasets in both dimensionality and size. However, in many cases the software to analyze and process the data has not kept pace with the data explosion or with advancements in instrumentation, computing, and analysis techniques. This challenge is compounded by the lack of consensus within the scientific community, with each commercial instrument writing measured data into proprietary file formats that impede access to data and metadata, sharing, correlation, and long-term archival of data. Therefore, ushering in the promise of data-intensive microscopy and microanalysis research requires general and robust data storage and analysis platforms that are HPC-ready and open source. Toward this end, we have developed the Universal Spectroscopic and Imaging Data (USID) model, which can represent data of any dimensionality, shape, size, precision, and instrument of origin in a standardized manner. USID data stored in hierarchical data format (HDF5) files facilitate the storage of very large data, access via any programming language, and compatibility with HPC and cloud computing architectures. More crucially, USID in HDF5 is curation-ready and therefore both meets the guidelines for data sharing and satisfies the implementation of digital data management issued to federally funded agencies. Correspondingly, we have developed a pair of free and open-source Python software packages, pyUSID (https://pycroscopy.github.io/pyUSID/about.html) and pycroscopy (https://pycroscopy.github.io/pycroscopy/about.html), that facilitate access to and scientific analysis of USID datasets, respectively. pyUSID provides tools that simplify reading, writing, reshaping, slicing, reducing, and interactively visualizing USID datasets. In addition, pyUSID provides a framework that helps scientists easily translate scientific problems into computational problems while handling memory management, and that seamlessly scales computations over multiple cores in a computer or multiple computers in a compute cluster or HPC system.
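To make the idea of a dimensionality-agnostic representation concrete, the sketch below flattens an N-dimensional measurement into a two-dimensional table (one row per position, one column per spectroscopic point) together with index tables that record where each row and column came from. This mirrors the general approach described in the USID documentation, but it is only a toy illustration under stated assumptions: it uses plain Python lists rather than HDF5, it is not the pyUSID API, and the names `get_item`, `flatten_to_2d`, and `lookup` are hypothetical helpers introduced here.

```python
from itertools import product


def get_item(nested, index):
    """Walk a nested list along a multi-dimensional index."""
    for i in index:
        nested = nested[i]
    return nested


def flatten_to_2d(nested, pos_shape, spec_shape):
    """Flatten an N-D nested list into a 2D table plus index tables.

    The leading axes (pos_shape) are treated as position dimensions and
    the trailing axes (spec_shape) as spectroscopic dimensions. This axis
    split is an assumption of the sketch, chosen by the caller.
    """
    # One tuple per position / per spectroscopic point, last axis fastest.
    pos_indices = list(product(*(range(n) for n in pos_shape)))
    spec_indices = list(product(*(range(n) for n in spec_shape)))
    # Row = one position, column = one spectroscopic point.
    main = [[get_item(nested, p + s) for s in spec_indices]
            for p in pos_indices]
    return main, pos_indices, spec_indices


def lookup(main, pos_indices, spec_indices, pos, spec):
    """Recover a single value from the flattened table by its N-D index."""
    row = pos_indices.index(tuple(pos))
    col = spec_indices.index(tuple(spec))
    return main[row][col]


# Synthetic example: a 2 x 3 grid of positions, each with a 4-point spectrum.
# The value encodes its own index so the mapping is easy to check by eye.
data = [[[100 * r + 10 * c + k for k in range(4)]
         for c in range(3)]
        for r in range(2)]

main, pos_idx, spec_idx = flatten_to_2d(data, pos_shape=(2, 3), spec_shape=(4,))
print(len(main), len(main[0]))  # number of positions, spectrum length
print(lookup(main, pos_idx, spec_idx, pos=(1, 2), spec=(3,)))
```

Because the index tables travel with the flattened table, the same code path handles a 2D image, a 3D spectral map, or a higher-dimensional dataset: only `pos_shape` and `spec_shape` change.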
The engineering-focused pyUSID forms the foundation for the pure-science package, pycroscopy, which focuses on the scientific analysis of nanoscale imaging and spectroscopic modalities. Although there are many open-source software packages, most are instrument- or mode-specific, are limited to 2D images or specific kinds of 3D data, are not fundamentally designed to handle datasets of large size or dimensionality, do not support scalable computation from laptops to HPC systems, are challenging to install, or lack comprehensive documentation. Pycroscopy offers scientists an array of Translators that extract metadata and data from many proprietary file formats and write all information into instrument- and vendor-agnostic USID HDF5 files. The general nature of USID allows data processing and analysis algorithms in pycroscopy to be generalized in turn, thereby allowing a single version of the algorithm to be applied to data collected from ...
doi:10.1017/s1431927619001831 fatcat:qhj6a2fenjbbrflmityo6gr5vy