Abstract
IT systems are widely used at medical institutions in Japan today. As a result, the volume of medical information has been increasing, and its secondary use has become a significant issue.
Large-scale data processing is considered essential for the secondary use of medical information. Its requirements are as follows: (1) collect and store data from many medical institutions, (2) build highly scalable systems that can accommodate data that continue to grow over long periods of time, and (3) establish security for processing large amounts of data.
Taking these requirements into consideration, we designed and implemented a cloud system architecture that adopts a key-value datastore (Cassandra) and a distributed processing system (Hadoop). With Cassandra, retrieval performance was adequate compared with existing RDBMSs and did not degrade as the volume of data increased. We also reduced initial construction costs by approximately 63% to 66% while preserving scalability in data capacity.
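The abstract does not describe how the datastore scales its capacity. One mechanism generally associated with Cassandra is hash partitioning on a token ring: each row key is hashed to a token, each node owns ranges of the ring, and adding a node moves only the keys in the ranges it takes over. The following is a minimal Python sketch of that idea (consistent hashing with virtual nodes), not the authors' implementation. The class `ConsistentHashRing`, the method `node_for`, the node names, the key format `hospital-NNN/patient-NNNNN`, and the use of MD5 in place of Cassandra's Murmur3 partitioner are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy token ring: a key belongs to the first node token at or after its hash."""

    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes   # tokens per physical node (hypothetical value)
        self._tokens = []      # sorted token positions on the ring
        self._owner = {}       # token -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # 64-bit token from MD5; Cassandra's default partitioner uses Murmur3 instead.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node):
        # Each node places several virtual tokens to spread its ranges around the ring.
        for i in range(self.vnodes):
            token = self._hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def node_for(self, row_key):
        # Walk clockwise to the next token, wrapping around past the last one.
        idx = bisect.bisect_right(self._tokens, self._hash(row_key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


# Hypothetical row keys for 10 institutions x 1000 patients.
keys = [f"hospital-{h:03d}/patient-{p:05d}" for h in range(10) for p in range(1000)]
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

Because new tokens only split existing ranges, every key that changes owner moves to the new node; the remaining keys stay where they were, so capacity can grow without a full redistribution of stored data.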
These results indicate that our implementation achieves an architecture for large-scale data processing that is more sustainable than conventional approaches.