2012 Volume 2012 Issue DOCMAS-B102 Pages 03-
We propose a method of estimating latent structures behind string data in order to apply them to knowledge discovery with giving problems. Our goal is to learn the structures and to represent them as models of data generating processes. For the models of generating processes, we introduce two concepts, components and assemblers with patterns and refinements of patterns, where we adopt patterns to represent templates of patterns and refinements to represent generation of them. A component is defined as a 3-tuple with a pattern, a set of substitutions, and a probability density function on the set. An assembler is a function to construct string data with sub-strings generated by components. In this paper, we propose three sub-problems and algorithms to solve one of them, and also present some simple experimental results towards our goal.