Recent advances in computer technology and information science continue to provide powerful techniques for extracting as much useful information as possible from an available dataset. These techniques, collectively called data-driven analysis, are applied across the natural sciences, engineering, and industry. In this paper, I briefly explain the concepts of Bayesian estimation and sparse modeling, which are regarded as fundamental mathematical frameworks of data-driven analysis, and review application examples in geology studied by the author’s group. For the analysis of rock textures, Bayesian estimation is effective because it allows a priori knowledge of spatiotemporal structure and physicochemical models to be incorporated objectively into inversion analysis. For the analysis of geochemical data, sparse modeling is particularly effective because it can automatically extract the essential combinations of elements and/or the governing earth-scientific processes from a high-dimensional dataset. Collaboration between information scientists and geologists leads to the development of powerful new methods for both specific and general problems in geology, and the importance of data-driven analysis will increase in the future.
A simple geological model is analyzed mathematically in order to understand how accuracy improves as a geological survey progresses. The task is to identify the geologic units along the line segment [0, 1]. At Step 1, the geology at the midpoint of the segment is identified. At Step 2, the segment is divided into three sub-segments, and the geology at the midpoint of each sub-segment is identified. At Step 3, each sub-segment is divided into subordinate sub-segments, each of which is identified in the same way. This process continues to Step K, at which the geology along the initial segment is assumed to be identified with necessary and sufficient accuracy. The cell entropy sk/K and variance Vk/K, calculated at each step k relative to the final step K, indicate how uncertainty decreases as the survey progresses. In particular, the cases with almost binary abundance values suggest that geologists should focus the next step of the survey on cells with high entropy (large variance).
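The subdivision scheme can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: every step divides each cell into exactly three parts (so step k has 3^(k-1) cells), the cell entropy is the Shannon entropy of the unit abundances among the final-step cells inside a step-k cell, and the variance is that of a binary indicator for one chosen unit, p(1 - p). The function names `cells_at_step`, `cell_entropy`, and `cell_variance` are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def cells_at_step(final_units, k, K):
    """Group the final-step cells into the cells of step k.

    Assumption: three-way division at every step, so step k has
    3**(k-1) cells and the final step K has 3**(K-1) cells.
    """
    n_final = 3 ** (K - 1)
    if len(final_units) != n_final:
        raise ValueError("expected 3**(K-1) final-step cells")
    size = 3 ** (K - k)  # number of final-step cells inside one step-k cell
    return [final_units[i:i + size] for i in range(0, n_final, size)]


def cell_entropy(cell):
    """Shannon entropy (natural log) of unit abundances in one cell (assumed definition)."""
    n = len(cell)
    return sum(-(c / n) * math.log(c / n) for c in Counter(cell).values())


def cell_variance(cell, unit):
    """Variance p(1 - p) of the indicator 'this sub-cell is `unit`' (assumed definition)."""
    p = sum(1 for u in cell if u == unit) / len(cell)
    return p * (1.0 - p)


# Hypothetical final-step geology for K = 3: nine cells, two units A and B.
final_units = list("AAAABBBBB")
for cell in cells_at_step(final_units, k=2, K=3):
    print("".join(cell), cell_entropy(cell), cell_variance(cell, "A"))
```

In this hypothetical example, only the middle step-2 cell, which contains the contact between units A and B, has non-zero entropy and variance, so under these assumed definitions it is the cell that the next survey step would target.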