Abstract
We present a computational method for predicting functional modules that can be applied directly to newly sequenced microbial genomes to predict gene functions and the component genes of biological pathways. We first quantify the functional relatedness among genes based on their distribution (i.e., their presence and order) across multiple microbial genomes, and obtain a gene network in which every pair of genes is associated with a score representing their functional relatedness. We then apply a threshold-based clustering algorithm to this gene network to obtain modules in which the number of genes is bounded from above by a pre-specified value and the component genes are more strongly functionally related to each other than to genes in other predicted modules. In particular, when the module size is bounded by 130, we obtain 167 functional modules covering 813 genes for Escherichia coli K12, and 138 functional modules covering 731 genes for Bacillus subtilis subsp. subtilis str. 168. We use Gene Ontology (GO) information to assess the predictions, comparing the GO similarities among genes of the same functional module with the GO similarities among genes that are randomly clustered together. This comparison reveals that our predicted functional modules are statistically and biologically significant, and that genes of the same functional module share more commonality in terms of biological process than in terms of molecular function or cellular component. We have also examined the predicted functional modules that are common to both Escherichia coli K12 and Bacillus subtilis subsp. subtilis str. 168, and provide explanations for some functional modules.
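The clustering step can be illustrated with a minimal sketch. The abstract does not specify the algorithm's details, so the following is one plausible reading and not the authors' implementation: gene pairs are visited in order of decreasing relatedness score, and two groups are merged only if the score clears a threshold and the merged group stays within the size bound. The function name `bounded_modules`, the parameters `max_size` and `min_score`, and the toy scores are all hypothetical.

```python
from collections import defaultdict


def bounded_modules(edges, max_size, min_score=0.0):
    """Greedy, size-bounded clustering of a scored gene network.

    edges: iterable of (gene_a, gene_b, score) tuples (hypothetical format).
    max_size: upper bound on the number of genes per module.
    min_score: pairs scoring below this threshold are ignored.
    Returns a list of sets, one per module with at least two genes.
    """
    parent = {}  # union-find parent pointers
    size = {}    # component size, valid at each root

    def find(gene):
        # Register unseen genes as singleton components.
        parent.setdefault(gene, gene)
        size.setdefault(gene, 1)
        # Walk to the root, halving the path along the way.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    # Visit the most strongly related pairs first.
    for a, b, score in sorted(edges, key=lambda e: e[2], reverse=True):
        if score < min_score:
            break  # all remaining pairs are below the threshold
        ra, rb = find(a), find(b)
        # Merge only if the resulting module respects the size bound.
        if ra != rb and size[ra] + size[rb] <= max_size:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]

    modules = defaultdict(set)
    for gene in list(parent):
        modules[find(gene)].add(gene)
    return [m for m in modules.values() if len(m) > 1]


# Toy network with made-up scores, for illustration only.
edges = [
    ("a", "b", 0.90),
    ("b", "c", 0.80),
    ("c", "d", 0.70),
    ("d", "e", 0.95),
    ("x", "y", 0.10),
]
modules = bounded_modules(edges, max_size=3, min_score=0.5)
```

With these toy inputs, the pair `c`-`d` is not merged because doing so would create a module of five genes, exceeding the bound of three, and `x`-`y` is dropped for falling below the threshold. Raising `max_size` yields fewer, larger modules.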