Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
A Novel Bioinformatic Strategy for Unveiling Hidden Genome Signatures of Eukaryotes: Self-Organizing Map of Oligonucleotide Frequency
Takashi Abe, Yuta Ichiba, Shigehiko Kanaya, Tokio Kozuki, Makoto Kinouchi, Toshimichi Ikemura

2002 Volume 13 Pages 12-20


With the increasing amount of available genome sequences, novel tools are needed for comprehensive analysis of species-specific sequence characteristics across a wide variety of genomes. We used an unsupervised neural network algorithm, Kohonen's self-organizing map (SOM), to analyze di- and trinucleotide frequencies in 9 eukaryotic genomes of known sequence (a total of 1.2 Gb): S. cerevisiae, S. pombe, C. elegans, A. thaliana, D. melanogaster, Fugu, and rice, as well as P. falciparum chromosomes 2 and 3 and human chromosomes 14, 20, 21, and 22, all of which have been almost completely sequenced. Each genomic sequence, at each window size, was encoded as a 16- or 64-dimensional vector giving the relative frequencies of di- or trinucleotides, respectively. From analysis of a total of 120,000 nonoverlapping 10-kb sequences, and of overlapping 100-kb sequences with a moving step size of 10 kb, all derived from the 1.2 Gb of genomic sequence, the SOMs yielded clear species-specific separation of most sequences. The unsupervised algorithm could recognize, in most of the 120,000 10-kb sequences, the species-specific characteristics (key combinations of oligonucleotide frequencies) that are signature representations of each genome. Because the classification power is very high, the SOMs can provide fundamental bioinformatic strategies for extracting a wide range of genomic information that could not otherwise be obtained.
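The encoding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `oligo_frequencies` and `windows` are hypothetical, and the counting conventions (overlapping k-mers, one strand only, k-mers containing ambiguous bases such as N skipped) are assumptions that the abstract does not specify. The resulting 16- or 64-dimensional vectors are the kind of input a SOM would be trained on.

```python
from itertools import product

BASES = "ACGT"


def oligo_frequencies(seq, k):
    """Relative frequencies of all 4**k k-mers in seq, in lexicographic order.

    Assumption: overlapping k-mers on the given strand only; k-mers that
    contain anything other than A, C, G, T are ignored.
    """
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers with ambiguous bases
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def windows(seq, size, step):
    """Yield (start, subsequence) for every full-length window.

    step == size gives nonoverlapping windows; step < size gives
    overlapping windows moved along the sequence by `step`.
    """
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


# Toy usage, scaled down from the 10-kb windows in the abstract.
toy_genome = "ACGTACGTTTGACCA" * 4
vectors = [oligo_frequencies(w, 2) for _, w in windows(toy_genome, 20, 20)]
```

With real data, `size=10_000, step=10_000` would correspond to the nonoverlapping 10-kb windows and `size=100_000, step=10_000` to the overlapping 100-kb windows; `k=2` gives the 16-dimensional and `k=3` the 64-dimensional vectors.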

© Japanese Society for Bioinformatics