Trident: scalable compute archives: workflows, visualization, and analysis

Arvind Gopu, Soichi Hayashi, Michael D. Young, Ralf Kotulla, Robert Henschel, Daniel Harbeck, Gianluca Chiozzi, Juan C. Guzman
2016 Software and Cyberinfrastructure for Astronomy IV  
The Astronomy scientific community has embraced Big Data processing challenges, e.g. those associated with time-domain astronomy, and has come up with a variety of novel and efficient data processing solutions. However, data processing is only a small part of the Big Data challenge. Efficient knowledge discovery and scientific advancement in the Big Data era require new and equally efficient tools: modern user interfaces for searching, identifying and viewing data online without direct access to the [...] tracking of data provenance; searching, plotting and analyzing metadata; interactive visual analysis, especially of (time-dependent) image data; and the ability to execute pipelines on supercomputing and cloud resources with minimal user overhead or expertise, even for novice computing users. The Trident project at Indiana University offers a comprehensive web- and cloud-based microservice software suite that enables the straightforward deployment of highly customized Scalable Compute Archive (SCA) systems, including extensive visualization and analysis capabilities, with a minimal amount of additional coding. Trident seamlessly scales up or down in terms of data volumes and computational needs, and allows feature sets within a web user interface to be quickly adapted to meet individual project requirements. Domain experts only have to provide code or business logic for handling and visualizing their domain's data products and for executing their pipelines and application workflows. Trident's microservices architecture is made up of lightweight services connected by a REST API and/or a message bus; web interface elements are built using the NodeJS, AngularJS, and HighCharts JavaScript libraries, among others, while backend services are written in NodeJS, PHP/Zend, and Python. The software suite currently consists of (1) a simple Workflow execution framework to integrate, deploy, and execute pipelines and applications, (2) a Progress service to monitor workflows and sub-workflows, (3) ImageX, an interactive image visualization service, (4) an authentication and authorization service, (5) a Data service that handles archival, staging and serving of data products, and (6) a Notification service that serves the statistical collation and reporting needs of various projects. Several additional components are under development.
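As a rough illustration of the architecture described above, the following minimal Python sketch shows lightweight services communicating over a message bus: a workflow service publishes task state changes, and a progress service subscribes to them. This is not Trident's code or API. The class names (MessageBus, WorkflowService, ProgressService), the "progress" topic, and the message fields are all hypothetical, and the in-process bus merely stands in for whatever messaging system a real deployment would use.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBus:
    """Hypothetical in-process publish/subscribe bus (stand-in for a real broker)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message synchronously to every handler on this topic.
        for handler in self._subscribers[topic]:
            handler(message)


class ProgressService:
    """Hypothetical monitor: records the latest state of each workflow task."""

    def __init__(self, bus: MessageBus) -> None:
        self.status: Dict[str, Dict[str, str]] = defaultdict(dict)
        bus.subscribe("progress", self._on_progress)

    def _on_progress(self, message: dict) -> None:
        self.status[message["workflow"]][message["task"]] = message["state"]

    def summary(self, workflow: str) -> str:
        tasks = self.status[workflow]
        done = sum(1 for state in tasks.values() if state == "finished")
        return f"{done}/{len(tasks)} tasks finished"


class WorkflowService:
    """Hypothetical executor: runs tasks in order and reports state on the bus."""

    def __init__(self, bus: MessageBus) -> None:
        self._bus = bus

    def _report(self, workflow: str, task: str, state: str) -> None:
        self._bus.publish(
            "progress", {"workflow": workflow, "task": task, "state": state}
        )

    def run(self, workflow: str, tasks: Dict[str, Callable[[], None]]) -> None:
        for name, task in tasks.items():
            self._report(workflow, name, "running")
            try:
                task()  # domain experts would supply this pipeline step
                self._report(workflow, name, "finished")
            except Exception:
                self._report(workflow, name, "failed")


if __name__ == "__main__":
    bus = MessageBus()
    progress = ProgressService(bus)
    WorkflowService(bus).run(
        "demo", {"stage-data": lambda: None, "reduce-images": lambda: None}
    )
    print(progress.summary("demo"))
```

In this sketch the two services never call each other directly; they share only the bus and a message format, which is the property that lets such components be deployed, scaled, or replaced independently.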
Trident is an umbrella project that evolved from the One Degree Imager - Portal, Pipeline, and Archive (ODI-PPA) project, which we had initially refactored toward (1) a powerful analysis/visualization portal for Globular Cluster System (GCS) survey data collected by IU researchers, (2) a data search and download portal for the IU Electron Microscopy Center's data (EMC-SCA), and (3) a prototype archive for the Ludwig Maximilian University's Wide Field Imager. The new Trident software has been used to deploy (1) a metadata quality control and analytics portal (RADY-SCA) for DICOM-formatted medical imaging data produced by the IU Radiology Center, (2) several prototype workflows for different domains, (3) a snapshot tool within IU's Karst Desktop environment, and (4) a limited component set to serve GIS data within the IU GIS web portal. Trident SCA systems leverage supercomputing and storage resources at Indiana University but can be configured to make use of any cloud/grid resource, from local workstations/servers to (inter)national supercomputing facilities such as XSEDE.
Proc. of SPIE Vol. 9913 99131H-1 Downloaded From: http://proceedings.spiedigitallibrary.org/ on 08/24/2016 Terms of Use: http://spiedigitallibrary.org/ss/termsofuse.aspx
doi:10.1117/12.2233111 fatcat:gafbjj3agngfxeir5lc7cod25e