2013, Volume 33, Issue 2, pp. 125-143
Clustering co-expressed genes or estimating a gene regulatory network from gene expression data requires a definition of similarity. The Pearson correlation coefficient and mutual information are popular measures of similarity between gene expression profiles. To investigate which measure is more appropriate for this purpose, we compared the two using Gene Ontology annotation similarity. Genes with similar Gene Ontology annotations can be interpreted as sharing biological processes or molecular functions. The results showed that the better similarity measure depends on the purpose of the analysis and on the organism from which the data were derived. When evaluating similarities among more than three genes, mutual information was the better measure for data derived from multicellular organisms, whereas the Pearson correlation coefficient was the better measure for data derived from unicellular organisms. When searching for genes whose transcripts have similar functions, or genes that participate in similar processes, the Pearson correlation coefficient was always the better measure.
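To make the two measures concrete, the following minimal sketch computes both for a pair of expression profiles. It is an illustration only: the abstract does not state how mutual information was estimated, so the equal-width binning, the `bins` parameter, the function names, and the toy profiles `gene_a` and `gene_c` are all assumptions introduced here.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant profile
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Histogram (plug-in) estimate of mutual information, in bits.

    Equal-width binning is an assumption of this sketch, not the paper's method.
    """
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    count_x = Counter(bx)
    count_y = Counter(by)
    # sum over observed cells of p(i, j) * log2(p(i, j) / (p(i) * p(j)))
    return sum(
        (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in joint.items()
    )


# Hypothetical toy profiles: gene_c depends on gene_a, but not linearly.
gene_a = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
gene_c = [v * v for v in gene_a]

print("Pearson:", pearson(gene_a, gene_c))
print("Mutual information (bits):", mutual_information(gene_a, gene_c))
```

In this toy case the relationship is symmetric and nonlinear, so the Pearson coefficient is essentially zero while the mutual information estimate is positive. That is the kind of difference between the two measures that a comparison like the one described above is sensitive to; real expression data would also require attention to sample size and bin choice.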