Abstract
Phylogenetic profiling identifies proteins that are conserved in a specific lineage within a set of sequenced genomes, and thus aids the functional annotation of genomic data. I have developed clustering software called Gclust, which takes the results of an all-against-all BLASTP search as input and outputs protein clusters analogous to the COGs of NCBI. In contrast to COG, Gclust lets the user customize the dataset from which clusters are inferred. In addition, datasets that include both prokaryotes and eukaryotes can be processed. The resulting clusters are used for phylogenetic profiling. The Gclust server (http://gclust.c.u-tokyo.ac.jp/) was designed to make phylogenetic profiling easy for all biologists: one can readily find all protein clusters that are conserved in a selected set of organisms. I will present some examples, such as chloroplast proteins of endosymbiont origin and proteins conserved in nitrogen-fixing symbiotic bacteria.
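The selection step of phylogenetic profiling can be illustrated with a minimal sketch. It is not the Gclust implementation and does not use the server's data format. It assumes that each cluster has already been reduced to the set of organisms in which it has a member. The function name `conserved_clusters`, the cluster IDs, and the toy organism sets are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set


def conserved_clusters(
    profiles: Dict[str, Set[str]],
    present_in: Iterable[str],
    absent_from: Iterable[str] = (),
) -> List[str]:
    """Return IDs of clusters found in every organism of `present_in`
    and in no organism of `absent_from` (hypothetical helper)."""
    required = set(present_in)
    excluded = set(absent_from)
    return sorted(
        cluster_id
        for cluster_id, organisms in profiles.items()
        # keep the cluster if all required organisms are represented
        # and none of the excluded organisms is
        if required <= organisms and not (excluded & organisms)
    )


# Toy phylogenetic profiles (illustrative data, not real Gclust output):
# cluster ID -> organisms that have at least one member in the cluster.
profiles = {
    "cluster_1": {"Synechocystis", "Arabidopsis", "Cyanidioschyzon"},
    "cluster_2": {"Synechocystis", "Arabidopsis", "Escherichia", "Saccharomyces"},
    "cluster_3": {"Arabidopsis", "Saccharomyces"},
}

# Clusters shared by a cyanobacterium and a plant but missing from
# a non-photosynthetic bacterium and a yeast.
result = conserved_clusters(
    profiles,
    present_in={"Synechocystis", "Arabidopsis"},
    absent_from={"Escherichia", "Saccharomyces"},
)
print(result)  # ['cluster_1']
```

Representing each profile as a set makes the query a pair of set operations (subset test and intersection), so the same function covers both "conserved in all selected organisms" and "specific to the selected organisms" by leaving `absent_from` empty or filling it in.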