2022 Volume 33 Issue 2 Pages 94-103
Gene expression data take the form of a numerical matrix in which rows correspond to genes and columns to samples, with each element storing the measured expression level. In this paper, we use a matrix of 2,949 genes × 9 samples that measures the acid stress response of Lactobacillus rhamnosus GG. We first describe the basic usage of RStudio, an integrated development environment for R. Next, we discuss the significance of clustering samples with similar expression patterns and how to interpret the results. For gene clustering of RNA-seq data, we outline MBCluster.Seq, a representative package for this purpose. Finally, we introduce a modified version of the package, called MBCdeg, that can also be used to detect differentially expressed genes (DEGs). Supplementary materials are available online at https://www.iu.a.u-tokyo.ac.jp/kadota/r_seq2.html.