2023 Volume 2023 Issue SWO-061 Pages 06-
In the life sciences, numerous databases are published as graph-structured data in RDF (Resource Description Framework). Each node in these graphs is identified by a unique URI representing an entity such as a gene or a protein, so these databases contain vast amounts of graph-structured data and a correspondingly large number of URIs. When a database links to URIs in other databases, the databases become interconnected and ultimately form a massive knowledge graph. It is therefore essential to develop a foundational infrastructure that can store a large number of URIs from life-science data in this knowledge graph and search them by database ID and keyword. At present, Front Coding is commonly used for URI compression, but it cannot perform prefix-matching searches. A trie is frequently used as an alternative that does support prefix retrieval. This study compares Front Coding, a trie, and our proposed methods in terms of compression ratio and search efficiency. The evaluation uses a dataset of URIs extracted from life-science RDF datasets.
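To make the two baseline techniques concrete, the following is a minimal Python sketch of plain front coding and a character-level trie. It is an illustration only, not the implementation evaluated in this study and not the proposed methods. The names `front_encode`, `front_decode`, `Trie`, and `with_prefix`, as well as the `example.org` URIs, are hypothetical and chosen for this example.

```python
from os.path import commonprefix


def front_encode(sorted_uris):
    """Front coding: store each string as (shared_prefix_length, suffix)
    relative to the previous string in a sorted list."""
    encoded = []
    previous = ""
    for uri in sorted_uris:
        shared = len(commonprefix([previous, uri]))
        encoded.append((shared, uri[shared:]))
        previous = uri
    return encoded


def front_decode(encoded):
    """Rebuild the strings by replaying the pairs in order.
    Each entry depends on the one before it."""
    decoded = []
    previous = ""
    for shared, suffix in encoded:
        uri = previous[:shared] + suffix
        decoded.append(uri)
        previous = uri
    return decoded


class Trie:
    """Character-level trie; each node is a dict of child nodes."""

    _END = ""  # marker key: a stored string ends at this node

    def __init__(self):
        self.root = {}

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def with_prefix(self, prefix):
        """Return all stored strings that start with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            for ch, child in current.items():
                if ch == self._END:
                    results.append(text)
                else:
                    stack.append((child, text + ch))
        return sorted(results)


# Hypothetical URIs, for illustration only.
uris = sorted([
    "http://example.org/gene/101",
    "http://example.org/gene/102",
    "http://example.org/protein/P1",
])

encoded = front_encode(uris)
assert front_decode(encoded) == uris

trie = Trie()
for uri in uris:
    trie.insert(uri)
gene_uris = trie.with_prefix("http://example.org/gene/")
```

In this sketch the front-coded list stores only the differing suffix of each URI, which is where the compression comes from, but each entry can be recovered only by replaying the entries before it. The trie instead walks directly to the node for a prefix and enumerates the subtree beneath it, at the cost of a per-character node structure.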