Abstract
With 16S rRNA sequence analysis, which targets the 16S rRNA region of microbial genomes, compositional data on microbial species can now be obtained. The latent Dirichlet allocation (LDA) model has been proposed as a dimension-reduction method for analyzing such data.
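To make the dimension-reduction view of LDA concrete, the following sketch simulates one sample under the standard LDA generative process. It treats a microbiome sample as a "document" and the taxon of each sequencing read as a "word". This mapping, the function names (`dirichlet`, `simulate_lda_sample`) and all parameter values are illustrative assumptions, not details taken from this study. The point is only that a sample's counts over V taxa are summarized by a K-dimensional topic-proportion vector `theta`.

```python
import random
from collections import Counter


def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_lda_sample(n_reads, topic_taxa, alpha, rng):
    """Generate taxon counts for one sample under the LDA generative process.

    Illustrative assumption: the sample plays the role of a document and
    taxa play the role of words.
    topic_taxa: list of K probability vectors over V taxa (one per topic).
    alpha: length-K Dirichlet concentration for the sample's topic proportions.
    Returns (theta, counts): the K-dim topic proportions and a Counter
    mapping taxon index -> number of reads.
    """
    theta = dirichlet(alpha, rng)
    n_taxa = len(topic_taxa[0])
    counts = Counter()
    for _ in range(n_reads):
        # Pick the topic responsible for this read from the sample's proportions.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # Pick the read's taxon from that topic's distribution over taxa.
        taxon = rng.choices(range(n_taxa), weights=topic_taxa[z])[0]
        counts[taxon] += 1
    return theta, counts


if __name__ == "__main__":
    rng = random.Random(0)
    n_topics, n_taxa = 3, 10  # hypothetical sizes
    topic_taxa = [dirichlet([0.5] * n_taxa, rng) for _ in range(n_topics)]
    theta, counts = simulate_lda_sample(1000, topic_taxa, [0.5] * n_topics, rng)
    print("topic proportions:", [round(t, 3) for t in theta])
    print("total reads:", sum(counts.values()))
```

Fitting LDA inverts this process: given observed counts, it estimates `theta` for each sample and the taxon distribution of each topic.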
Microbiome data from 16S rRNA sequence analysis are often collected as time series to track how the microbial environment of a subject changes over time. The dynamic topic model (DTM) is a commonly used LDA-type model for time-series data. Although the DTM requires the number of topics to be specified in advance, extending it to a Bayesian nonparametric model allows the number of topics to be inferred automatically from the data. Therefore, a Bayesian nonparametric topic model for time-series microbiome data was proposed and compared with the DTM using real microbiome data. Under the proposed model, only a few topics had large proportions on average, regardless of the pre-specified number of topics. In addition, the number of topics that had the largest proportion in at least one subject did not change with the pre-specified number of topics. These results suggest that the proposed model can automatically determine the number of topics in microbiome data.
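The abstract does not state which Bayesian nonparametric prior the proposed model uses. As an assumption for illustration only, the sketch below uses the truncated stick-breaking construction of a Dirichlet process, in which the truncation level plays the role of the pre-specified number of topics. The function names `stick_breaking` and `expected_weights` are hypothetical. With concentration 1, the prior mean weights of the first three topics sum to 0.875 for any truncation level of at least four. This mirrors, at the level of the prior and not as a reproduction of the reported results, the behaviour that only a few topics receive large proportions however many topics are allowed.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Sample truncated stick-breaking weights for a Dirichlet-process prior.

    Each v_k ~ Beta(1, concentration) and pi_k = v_k * prod_{j<k} (1 - v_j).
    The last stick takes all remaining mass so the weights sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # final stick absorbs the leftover mass
    return weights


def expected_weights(concentration, truncation):
    """Prior mean of each truncated stick-breaking weight.

    With r = c / (1 + c): E[pi_k] = (1 - r) * r**k for k = 0..K-2,
    and the last stick has mean r**(K - 1).
    """
    r = concentration / (1.0 + concentration)
    means = [(1.0 - r) * r**k for k in range(truncation - 1)]
    means.append(r ** (truncation - 1))
    return means


if __name__ == "__main__":
    for k in (10, 20, 50):  # hypothetical pre-specified numbers of topics
        means = expected_weights(1.0, k)
        print(k, round(sum(means[:3]), 3))  # 0.875 for every truncation level
```

Because the stick lengths shrink geometrically in expectation, raising the truncation level mostly adds topics with negligible weight rather than redistributing mass among the leading ones.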