Development of a custom metadata infrastructure for the NARO integrated DB and a genome analysis pipeline system for omics information analysis

Akio Kobayashi; Hiroaki Sakai; Tetsuo Katsuragi; Kengo Ito; Motoko Inatomi; Hisashi Eguchi; Takahiro Kawamura

doi:10.34503/naroj.2023.13_23

Review

Development of the infrastructure of the customizable metadata on the NARO Linked DB and genomics analysis processing system for pipeline systems of omics information analysis

Akio KOBAYASHI, Hiroaki SAKAI, Tetsuo KATSURAGI, Kengo ITO, Motoko INATOMI, Hisashi EGUCHI, Takahiro KAWAMURA


Keywords: genome analysis, metadata, system consolidation, research data management

RESEARCH REPORT / TECHNICAL REPORT

2023 Volume 2023 Issue 13 Pages 23-33



Abstract

Many researchers and organizations in NARO are working on projects and research tasks in the bioresearch field. In these projects and tasks, data tend to be scattered and managed individually because there is no centralized system for data management or for developing analysis tools. To solve these problems, NARO has started a new project that aims to maximize research output by optimizing resource use, consolidating human resources, funding, and advanced analysis with state-of-the-art instruments. As a first step, because genome data are among the most essential data in the bioresearch field, the project is building a pipeline system for omics information analysis that links high-performance computing resources. These resources consist of a genome analysis computer operated by one of NARO's organizations and the supercomputer “SHIHO”. In addition, the genome data supplied to these resources and the data analyzed on them are stored, together with suitable metadata, in NARO Linked DB, an integrated database for sharing research data across NARO's organizations. By connecting these resources in this way, the pipeline system makes it possible to analyze and search genome data across NARO's organizations using genome-specific metadata. To realize this pipeline, we developed metadata input/output systems on NARO Linked DB based on the DDBJ Sequence Read Archive (DRA) metadata. In this paper, we describe this metadata input/output system in detail, along with the current status and open issues of the pipeline system development.
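To make the idea of DRA-based metadata input/output and cross-organization search more concrete, the following Python code is a minimal sketch under stated assumptions. DRA organizes metadata into Submission, Experiment, and Run objects linked to BioProject and BioSample entries; here they are flattened into a single record for illustration only. The `SequenceRecord` class, its field names, the identifiers, the JSON exchange format, and the `export_records`/`import_records`/`search` helpers are all hypothetical and do not describe the actual schema or interfaces of NARO Linked DB.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical, simplified DRA-style record. Real DRA metadata is split into
# Submission/Experiment/Run objects linked to BioProject/BioSample; it is
# flattened here purely for illustration.
@dataclass
class SequenceRecord:
    run_id: str              # hypothetical identifier, not a real DRA accession
    bioproject: str          # hypothetical project identifier
    biosample: str           # hypothetical sample identifier
    organism: str
    library_strategy: str    # e.g. "WGS", "RNA-Seq"
    platform: str            # sequencing platform, e.g. "ILLUMINA"
    organization: str        # NARO organization that produced the data (hypothetical)
    compute_resource: str    # where the data are analyzed (hypothetical values)
    extra: dict = field(default_factory=dict)  # custom, user-defined metadata

def export_records(records):
    """Metadata output: serialize records to a JSON string."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False)

def import_records(text):
    """Metadata input: parse a JSON string back into records."""
    return [SequenceRecord(**d) for d in json.loads(text)]

def lookup(record, key):
    """Read a standard field if it exists, otherwise a custom 'extra' key."""
    values = asdict(record)
    return values[key] if key in values else record.extra.get(key)

def search(records, **criteria):
    """Return the records whose metadata match every given criterion."""
    return [r for r in records
            if all(lookup(r, k) == v for k, v in criteria.items())]

records = [
    SequenceRecord("RUN-0001", "PRJ-A", "SAM-1", "Oryza sativa", "WGS",
                   "ILLUMINA", "Org-A", "SHIHO", {"cultivar": "Nipponbare"}),
    SequenceRecord("RUN-0002", "PRJ-B", "SAM-2", "Glycine max", "RNA-Seq",
                   "ILLUMINA", "Org-B", "genome analysis computer"),
]

# Round trip through the (hypothetical) JSON input/output format.
round_tripped = import_records(export_records(records))
hits = search(round_tripped, organism="Oryza sativa", cultivar="Nipponbare")
print([r.run_id for r in hits])  # ['RUN-0001']
```

Keeping standard DRA-like fields alongside a free-form `extra` dictionary is one way to combine a shared genome-specific schema with organization-specific custom metadata; the actual system may structure this differently.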

