Vol. 29 (2014), No. 4, pp. 356-363
Researchers in agriculture, life science, and drug design need to acquire information that combines two or more life science databases for problem solving. Semantic Web technologies have become necessary for data integration between those databases. This study introduces a technique for using RDF (Resource Description Framework) and OWL (Web Ontology Language) data as a data set for developing machine learning predictors in interactomics. We also outline how interactomics LOD (Linked Open Data) was implemented in graph databases so that it can be queried with SPARQL (SPARQL Protocol and RDF Query Language). Since 2013, interactomics LOD has included pairs of protein-protein interactions of tyrosine kinases, pairs of amino acid residues of sugar (carbohydrate) binding proteins, and cross-reference information linking protein chains to entries in major bioscience databases. Finally, we designed three RDF schema models and made the data accessible using AllegroGraph 4.11 and Virtuoso 7. The total number of triples in these databases was 1,824,859,745. They could be combined and searched together with public LOD of the life science domain comprising 28,529,064,366 triples. This research showed that it is realistic to handle large-scale LOD on a comparatively small budget. Using LOD reduced not only expense but also development time. In particular, RDF-SIFTS (Structure Integration with Function, Taxonomy and Sequence), an aggregate of 10 small LOD data sets, was constructed during the short period of BioHackathon 2013, that is, in one week. LOD therefore makes it possible to quickly obtain a data set required for machine learning in interactomics. We have set up the interactomics LOD as a database for application development. SPARQL endpoints of these databases are available on the portal site UTProt (The University of Tokyo Protein, http://utprot.net).
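
As a rough illustration of how a client might address one of these SPARQL endpoints, the following minimal Python sketch builds a SPARQL 1.1 Protocol GET request URL for a triple-count query. The endpoint path `/sparql` and the names `ENDPOINT`, `COUNT_QUERY`, and `build_request_url` are assumptions made for illustration only; the abstract names just the portal site and does not give the endpoint paths or the RDF schema. The sketch only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL: the text names only the portal http://utprot.net,
# so the "/sparql" path is an assumption for illustration.
ENDPOINT = "http://utprot.net/sparql"

# Generic SPARQL 1.1 query that counts the triples matched by ?s ?p ?o.
COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def build_request_url(endpoint: str, query: str) -> str:
    """Return a SPARQL 1.1 Protocol GET URL carrying the query text
    in the standard 'query' parameter (URL-encoded)."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, COUNT_QUERY)
print(url)
```

A count query of this kind is one way totals such as the triple counts quoted above could be checked against a live endpoint, although the abstract does not say how those figures were obtained.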