Better Data Discoverability in Science Gateways

Science gateways that primarily focus on remote job execution management generate domain-specific output data that is mainly readable by application-specific parsers and post-processing utilities. For example, computational chemistry outputs encode molecule information, the convergence of the simulation, and energy values. Such domain-specific information is non-trivial to search in a generic fashion. It is thus desirable to add a wide range of application-specific and user-specific post-processing features, which may include remote execution of scripts and smaller applications that don’t require scheduling on clusters. It is also desirable to support integration with the searching, indexing, and general-purpose data analysis and mining tools provided by the Apache “big data” software stack. As gateways become tenants of general-purpose platform services, providing a general-purpose infrastructure that enables these application-specific post-processing steps is an interesting architectural challenge. Furthermore, it is desirable to share the results of post-processing and indexing. In this paper, we discuss a new automated application output indexing system that we have incorporated into the SEAGrid Science Gateway using Apache Airavata; it parses and indexes generated output for easy querying. We also examine data sharing and automated data publication, so that another user can reuse results without rerunning an already executed experiment, thereby reducing resource utilization.
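The parse-then-index idea can be illustrated with a minimal sketch. This is not SEAGrid's or Airavata's implementation. The regular expressions are hypothetical patterns loosely modeled on Gaussian-style log lines, `parse_output` and `OutputIndex` are illustrative names, and an in-memory dictionary stands in for a real search engine from the Apache stack. The point is that once application-specific fields (formula, final energy, convergence) are extracted, outputs become searchable by generic queries.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns, loosely modeled on Gaussian-style output lines.
# A real gateway would need a dedicated parser per application.
ENERGY_RE = re.compile(r"SCF Done:\s+E\((?P<method>[^)]+)\)\s+=\s+(?P<energy>-?\d+\.\d+)")
CONVERGED_RE = re.compile(r"Normal termination")
FORMULA_RE = re.compile(r"Stoichiometry\s+(?P<formula>\S+)")


def parse_output(experiment_id: str, text: str) -> dict:
    """Extract searchable metadata from one application output file."""
    doc = {"experiment_id": experiment_id, "converged": bool(CONVERGED_RE.search(text))}
    energies = [float(m.group("energy")) for m in ENERGY_RE.finditer(text)]
    if energies:
        doc["final_energy"] = energies[-1]  # keep the last reported SCF energy
    m = FORMULA_RE.search(text)
    if m:
        doc["formula"] = m.group("formula")
    return doc


@dataclass
class OutputIndex:
    """In-memory stand-in for a real search index (hypothetical)."""
    docs: dict = field(default_factory=dict)
    by_formula: dict = field(default_factory=dict)

    def add(self, doc: dict) -> None:
        # Store the document and maintain a simple inverted index on formula.
        self.docs[doc["experiment_id"]] = doc
        if "formula" in doc:
            self.by_formula.setdefault(doc["formula"], set()).add(doc["experiment_id"])

    def query(self, formula=None, converged=None, max_energy=None) -> list:
        # Narrow by formula first, then filter on the remaining fields.
        ids = self.by_formula.get(formula, set()) if formula else set(self.docs)
        hits = []
        for eid in sorted(ids):
            d = self.docs[eid]
            if converged is not None and d["converged"] != converged:
                continue
            if max_energy is not None and d.get("final_energy", float("inf")) > max_energy:
                continue
            hits.append(eid)
        return hits


index = OutputIndex()
index.add(parse_output("exp-1", " Stoichiometry    H2O\n SCF Done:  E(RB3LYP) =  -76.4089  A.U.\n Normal termination of Gaussian\n"))
index.add(parse_output("exp-2", " Stoichiometry    H2O\n SCF Done:  E(RHF) =  -76.0107  A.U.\n"))
print(index.query(formula="H2O", converged=True))  # ['exp-1']
```

A user searching for converged water calculations finds `exp-1` without opening either output file. The same lookup could let a gateway offer an existing result instead of rerunning a matching experiment.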

Publication Date:
Jan 27 2017
Date Submitted:
Jun 28 2019

 Record created 2019-06-28, last modified 2019-08-05
