2023 Volume 2023 Issue 13 Pages 23-33
Many researchers and organizations in NARO work on projects and research tasks in the bioresearch field. In these projects and tasks, data are dispersed and held individually because centralized data management and analysis tool development are lacking. To solve these problems, NARO started a new project that aims to maximize research results by optimizing resource utilization: consolidating human resources, funds, and advanced analysis with state-of-the-art devices. As a first step in this project, NARO is constructing a pipeline system that analyzes omics information by linking high-performance computing resources, because genomic data are among the essential data in the bioresearch field. The high-performance computing resources consist of the genome analysis computer used by one organization in NARO and the supercomputer “SHIHO”. Genome data for those resources, and the analyzed data they produce, are stored with suitable metadata in NARO Linked DB, an integrated database for sharing research data across the organizations in NARO. By connecting these resources, the pipeline system can provide functions to analyze and search genome data across the organizations in NARO using genome-specified metadata. To realize this pipeline, we are developing metadata input/output systems on the NARO Linked DB that are based on DRA (DDBJ Sequence Read Archive). In this paper, we describe the details of this metadata input/output system and the current status and issues of the pipeline system development.