2019, Volume 39, Issue 2, pp. 55-84
Many methods that use 16S ribosomal RNA gene sequencing data (16S rRNA sequencing data) have recently been proposed for analyzing the composition of the gut microbiota. Statistically, 16S rRNA sequencing data are multivariate count data. When multivariate data analysis methods are used to analyze association with a disease, the data are generally normalized before the analysis models are fitted, because the total number of sequence reads differs between subjects. However, proper methods for normalization have not yet been discussed or proposed. Rarefying is one such normalization method: it equalizes the total counts across subjects by subsampling a fixed number of reads from each subject. We hypothesized that combining rarefying with ensemble learning would improve performance. We therefore proposed an association analysis method that combines rarefying with ensemble learning, and evaluated its performance in a simulation experiment using several multivariate data analysis methods. The proposed method outperformed the other analysis methods in both its ability to identify response-associated variables and its ability to classify the response variable. We also applied each evaluated method to gut microbial data from Japanese people and compared the results.
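The idea of combining rarefying with ensemble learning can be illustrated with a minimal sketch. The abstract does not state the subsampling scheme, the base learner, or the aggregation rule, so everything below is an assumption made for illustration: reads are drawn without replacement, a simple nearest-centroid classifier stands in for the multivariate analysis methods evaluated in the paper, and predictions are combined by majority vote. The function names (`rarefy`, `fit_centroids`, `predict_centroid`, `rarefied_ensemble_predict`) and the toy counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one subject's taxon counts."""
    if sum(counts) < depth:
        raise ValueError("subject has fewer reads than the rarefying depth")
    # Expand the count vector into one entry per read, labelled by taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    sampled = Counter(rng.sample(reads, depth))
    return [sampled.get(taxon, 0) for taxon in range(len(counts))]


def fit_centroids(X, y):
    """Stand-in base learner (an assumption): mean count vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def rarefied_ensemble_predict(X_train, y_train, x_new, depth, n_models=25, seed=0):
    """Fit one model per independently rarefied copy of the data, then majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        # Each ensemble member sees a different random rarefaction of every subject.
        X_rar = [rarefy(row, depth, rng) for row in X_train]
        model = fit_centroids(X_rar, y_train)
        votes[predict_centroid(model, rarefy(x_new, depth, rng))] += 1
    return votes.most_common(1)[0][0]


# Toy data (made up): rows are subjects, columns are taxa, totals differ by subject.
X_toy = [[90, 10, 20], [160, 30, 10], [20, 70, 30], [30, 150, 60]]
y_toy = ["case", "case", "control", "control"]
label = rarefied_ensemble_predict(X_toy, y_toy, [80, 10, 10], depth=100)
```

A single rarefaction discards reads at random, so repeating it and aggregating over the resulting models is one plausible way to reduce the variability introduced by subsampling. The depth must not exceed the smallest total read count among the subjects being rarefied.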