Abstract
We classified more than 2,300 ATH1 GeneChips and found that co-expression linkage tends to intensify when the data matrix is dimensionally compressed using singular value decomposition (SVD). This suggests that the compression contributes effectively to the functional prediction of unknown genes. Indeed, the data matrix reconstructed from up to 40 singular values was sufficient to reproduce the correlations within the fundamental circadian clock and the regulatory relationship of PMG1. The correspondence between singular values and arrays also supported the importance of different tissue types: arrays from tissues other than leaf (shoot, root, and stamen) contributed to the largest singular values.
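As an illustration of the compression step, the following minimal sketch reconstructs a genes-by-arrays matrix from its k largest singular values and compares the Pearson correlation of one gene pair before and after. The matrix here is synthetic (three hypothetical latent expression programs plus noise), and the function names `reconstruct_rank_k` and `pearson` are chosen for this example only; it is not the pipeline used in this study, where up to 40 singular values were retained.

```python
import numpy as np

def reconstruct_rank_k(X, k):
    """Rebuild X keeping only its k largest singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic stand-in for an expression matrix (rows: genes, columns: arrays).
# Assumption: 3 latent programs, 20 genes each, 300 arrays, unit-variance noise.
rng = np.random.default_rng(0)
n_factors, genes_per_factor, n_arrays = 3, 20, 300
factors = rng.normal(size=(n_factors, n_arrays))
loadings = np.repeat(np.eye(n_factors), genes_per_factor, axis=0)  # shape (60, 3)
noise = rng.normal(size=(n_factors * genes_per_factor, n_arrays))
X = loadings @ factors + noise

# Compress to as many singular values as there are latent programs.
X_k = reconstruct_rank_k(X, k=n_factors)

# Genes 0 and 1 share the same latent program.
r_raw = pearson(X[0], X[1])
r_k = pearson(X_k[0], X_k[1])
print(f"raw correlation:        {r_raw:.3f}")
print(f"compressed correlation: {r_k:.3f}")
```

In this toy setting, discarding the smaller singular values removes much of the gene-specific noise, so the correlation between genes driven by the same program increases. How many singular values to keep for real data is an empirical choice.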
The predictive reliability of a co-expression relationship depends on the quality and diversity of the selected dataset. So far, few evaluation studies have compared online repositories, even though many groups provide co-expressed gene lists based on different datasets and similarity measures. We believe that our evaluation will be helpful for designing transcriptome analyses.