Many methods have recently been proposed for analyzing gut microbial composition using 16S ribosomal RNA (16S rRNA) gene sequencing data. Statistically, 16S rRNA sequencing data are multivariate count data. When multivariate data analysis methods are used to study associations with a disease, the data are generally normalized before the analysis models are fitted, because total sequence read counts differ among subjects. However, proper methods for normalization have not yet been discussed or proposed. Rarefying is one such normalization method: it equalizes the total counts across subjects by subsampling a fixed number of counts from each subject. We hypothesized that combining rarefying with ensemble learning could improve performance. We therefore proposed an association analysis method that combines rarefying with ensemble learning, and evaluated its performance in simulation experiments using several multivariate data analysis methods. The proposed method outperformed the other analysis methods in both its ability to identify response-associated variables and its ability to classify the response variable. We also applied each evaluated method to gut microbial data from Japanese people and compared the results.
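The rarefying step described above can be illustrated with a minimal sketch. It assumes that each subject is a list of per-taxon read counts and that the rarefying depth is the smallest total among subjects. The function names `rarefy` and `rarefied_replicates`, the toy count table, and the choice of depth are all hypothetical, and the sketch is not the authors' implementation. Drawing several rarefied copies of the table is one plausible way to supply the members of an ensemble, with one base learner fitted per copy and their outputs aggregated afterwards.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one subject's taxon counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the subject's total read count")
    # Expand the counts into one label per read, e.g. [3, 1] -> [0, 0, 0, 1].
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(taxon, 0) for taxon in range(len(counts))]


def rarefied_replicates(table, depth, n_replicates, seed=0):
    """Return `n_replicates` independently rarefied copies of a subjects-by-taxa table."""
    rng = random.Random(seed)
    return [[rarefy(row, depth, rng) for row in table] for _ in range(n_replicates)]


# Hypothetical toy data: 3 subjects (rows) by 3 taxa (columns).
table = [[120, 30, 50], [10, 70, 20], [400, 5, 95]]
depth = min(sum(row) for row in table)  # assumed rule: rarefy to the smallest library size
replicates = rarefied_replicates(table, depth, n_replicates=5)
```

After rarefying, every subject in every replicate has the same total count. A subject whose total already equals the depth is returned unchanged, whereas subjects with larger totals vary from one replicate to the next, and an ensemble could exploit that variation.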
In oncology, next-generation sequencing and comprehensive genomic profiling have enabled detailed classification of tumors based on their molecular biology. However, it may be unrealistic to conduct phase I-III trials separately for each subpopulation defined by molecular subtype. Common protocols are therefore required that assess combinations of several molecular markers and their targeted therapies through multiple sub-trials. These protocols are called “master protocols” and are drawing attention as a next-generation clinical trial design. In this review, we provide an overview of clinical trials based on master protocols, including basket, umbrella, and platform trials, along with recent examples. We also discuss the statistical challenges encountered in their application.
A basket trial in oncology typically enrolls patients with a particular molecular status, such as a biomarker or gene alteration shared across cancer types, and evaluates the efficacy and safety of a therapy targeting that molecular characteristic across cancer types. Because only a limited number of patients are enrolled for each cancer type, drawing statistical inferences about treatment effects from such sparse data, and evaluating their homogeneity across cancer types, can be challenging. A hierarchical Bayesian model shrinks the estimated effect of the targeted therapy for each cancer type toward the global effect, under the assumption that the effects among cancer types are inherently exchangeable and correlated. The exchangeability and non-exchangeability model, an extension of the hierarchical Bayesian model, allows each cancer-specific parameter either to be exchangeable with other similar cancer-type parameters or to be non-exchangeable with any of them. Apart from these hierarchical modeling methods, an approach based on Bayesian model averaging has been proposed for testing the effectiveness or ineffectiveness of treatment for each cancer type. In this paper, we provide an overview of these statistical methodologies along with a software application.
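The shrinkage behavior of the hierarchical model can be illustrated with a deliberately simplified sketch. It assumes a normal approximation to each cancer type's empirical log-odds of response, and it treats the global mean `mu` and between-type standard deviation `tau` as fixed, known values. A full hierarchical Bayesian analysis would instead place priors on both and estimate them jointly. The function name `shrink`, the 0.5 continuity correction, and the example counts are hypothetical illustrations rather than the software discussed in the paper.

```python
import math


def shrink(responders, enrolled, mu, tau):
    """Conditional posterior means of per-cancer-type log-odds, given fixed mu and tau."""
    shrunk = []
    for r, n in zip(responders, enrolled):
        # Add 0.5 to each cell so the empirical log-odds stays finite when r is 0 or n.
        y = math.log((r + 0.5) / (n - r + 0.5))
        s2 = 1.0 / (r + 0.5) + 1.0 / (n - r + 0.5)  # approximate sampling variance of y
        w = tau ** 2 / (tau ** 2 + s2)  # weight placed on the cancer type's own data
        shrunk.append(w * y + (1.0 - w) * mu)
    return shrunk


# Hypothetical data: responders and enrolled patients for three cancer types.
responders = [1, 6, 3]
enrolled = [10, 12, 15]
mu = math.log(0.3 / 0.7)  # assumed global response rate of 30% on the log-odds scale
estimates = shrink(responders, enrolled, mu, tau=0.5)
```

Each estimate is a weighted average of the cancer type's own log-odds and the global mean. Cancer types with fewer patients have larger sampling variance and are therefore pulled more strongly toward `mu`, while a larger `tau` weakens the borrowing of information across cancer types.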