2002, Volume 42, Issue 5, pp. 224-229
Given the huge amount of genome sequence data now available, we aim to analyze these data and provide biological scientists with reliable information extracted from them. Focusing on protein structure prediction, we have so far analyzed 70 organisms whose complete genomes have been sequenced. The main tool employed was PSI-BLAST, a homology search method much more powerful than conventional ones such as FASTA and BLAST. The fraction of ORFs in a genome whose 3D structure could be predicted turned out to be as high as 40-50%. All the analyzed data were compiled in a database called GTOP (http://spock.genes.nig.ac.jp/~genome/gtop.html). As an application of GTOP, a way to identify a significant number of pseudogenes in E. coli is also described.