In recent years, museums, libraries, and broadcasting companies have increasingly digitized and archived their content. Once digitized content is online, it is expected to be used by many people for many purposes. However, because existing digital archive systems differ in their content formats and APIs, it is difficult to use content across organizations. We propose an open digital archive system. Here, openness has two meanings: the ability to access content managed by different organizations in a uniform way, and the ability for third parties to build new machine-learning-based search systems using any type of feature value. Our system introduces a globally unique identifier for each item of archived content and implements a data format, based on the OAIS model, that can be used for any type of file. In addition, we introduce a mechanism for magnetic tape storage devices. Finally, we confirmed that feature value extraction works well in our system and that the read and write throughput for content in the archive format reaches about 130 MB/s and 200 MB/s, respectively.