Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Efficient Determination of Cluster Boundaries for Analysis of Gene Expression Profile Data Using Hierarchical Clustering and Wavelet Transform
Harry Amri Moesa, Bahadur K.C. Dukka, Tatsuya Akutsu
JOURNAL FREE ACCESS

2005 Volume 16 Issue 1 Pages 132-141

Abstract

Existing methods for clustering gene expression profile data require either manual inspection and additional biological knowledge or a cut-off value that cannot be calculated directly from the given data set. Thus, the systematic and efficient determination of cluster boundaries in gene expression profile data remains a challenging problem.
In this context, we have developed a procedure for the automatic and systematic determination of cluster boundaries in the hierarchical clustering of gene expression data, based on the ratio of within-class variance to between-class variance, which can be calculated entirely from the given expression data. After the dendrogram is constructed by agglomerative hierarchical clustering, this ratio is used to determine the cluster boundaries. Unlike other existing approaches, our approach requires nothing beyond this ratio: no manual inspection and no biological knowledge. Our results compare favorably with, and in some cases are better than, those of an existing method that does not use prior information or manual inspection. Moreover, gene expression profile data are often contaminated with various types of noise; to reduce this noise, we also applied the discrete wavelet transform, a technique used in image enhancement. We tested a number of mother wavelet functions for smoothing the noise in the gene expression data set and obtained some improvements in the quality of the results.
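The abstract does not state the exact decision rule or the wavelets used, so the following is only a minimal sketch of the general idea, under several assumptions. It builds an average-linkage dendrogram with SciPy, cuts it at each candidate number of clusters, and scores each cut with a Calinski-Harabasz-style normalization of the between-class to within-class variance ratio; that normalization is swapped in for illustration and is not claimed to be the authors' criterion. The one-level Haar transform with hard thresholding stands in for the unspecified mother wavelets. The names `haar_smooth`, `scatter`, `choose_cut`, the threshold value, and the synthetic profiles are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def haar_smooth(X, threshold):
    """One-level Haar DWT per row, hard-threshold the details, then invert.

    Assumes an even number of columns (time points or conditions).
    """
    even, odd = X[:, 0::2], X[:, 1::2]
    approx = (even + odd) / np.sqrt(2)
    detail = (even - odd) / np.sqrt(2)
    # Hard thresholding: small detail coefficients are treated as noise.
    detail = np.where(np.abs(detail) < threshold, 0.0, detail)
    out = np.empty_like(X, dtype=float)
    out[:, 0::2] = (approx + detail) / np.sqrt(2)
    out[:, 1::2] = (approx - detail) / np.sqrt(2)
    return out


def scatter(X, labels):
    """Return (within-class, between-class) sums of squares for a labelling."""
    grand_mean = X.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        within += ((members - centroid) ** 2).sum()
        between += len(members) * ((centroid - grand_mean) ** 2).sum()
    return within, between


def choose_cut(X, max_k=6):
    """Cut the dendrogram where the normalized between/within ratio peaks.

    Assumption: the score is the Calinski-Harabasz-style ratio
    (B / (k - 1)) / (W / (n - k)), not the rule used in the paper.
    """
    n = X.shape[0]
    Z = linkage(X, method="average", metric="euclidean")
    best_score, best_k, best_labels = -np.inf, None, None
    for k in range(2, min(max_k, n - 1) + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")
        k_found = len(np.unique(labels))  # may be < k if merge heights tie
        if k_found < 2:
            continue
        w, b = scatter(X, labels)
        score = np.inf if w == 0 else (b / (k_found - 1)) / (w / (n - k_found))
        if score > best_score:
            best_score, best_k, best_labels = score, k_found, labels
    return best_k, best_labels


# Synthetic data (hypothetical): three expression patterns over 8 time points.
rng = np.random.default_rng(0)
patterns = [np.linspace(0, 4, 8), np.linspace(4, 0, 8), np.full(8, 8.0)]
profiles = np.vstack([p + rng.normal(0, 0.3, size=(20, 8)) for p in patterns])

k_raw, labels_raw = choose_cut(profiles)
k_smooth, labels_smooth = choose_cut(haar_smooth(profiles, threshold=0.5))
print(k_raw, k_smooth)
```

The raw within-class to between-class ratio shrinks as the number of clusters grows, so some normalization or stopping rule is needed before it can select a cut; the degrees-of-freedom correction used here is one common choice. Only quantities computed from the expression matrix enter the score, which matches the abstract's requirement that no external cut-off or biological knowledge be supplied.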

© Japanese Society for Bioinformatics