Abstract
We have developed a coding region prediction system. It combines several measures that indicate the exonness (exon-likeness) of a region in a DNA sequence. One of these is a new statistical measure that we developed, called the secondary hexamer measure. This measure and several others are combined by two-dimensional linear discriminant analysis (2D-LDA). The system then outputs the best gene model, that is, the model with the highest score accumulated by phase-specific dynamic programming. When tested on 568 complete vertebrate gene sequences, the program achieved 61% accuracy at the exon level (exact matches) and 95% accuracy at the nucleotide level. The average correlation coefficient (CC) between the predicted and actual gene structures was 0.80.
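The abstract does not state how the correlation coefficient is computed. The sketch below assumes the nucleotide-level CC commonly used in gene-prediction evaluation. That CC is the Matthews correlation coefficient over per-base coding/non-coding labels. The sketch also assumes that the reported average is a simple mean of per-gene CC values. The function names `nucleotide_cc` and `average_cc` and the boolean-mask input format are hypothetical and chosen for illustration. They are not part of the described system.

```python
import math


def nucleotide_cc(predicted, actual):
    """Nucleotide-level correlation coefficient between two annotations.

    Assumption (hypothetical input format): both arguments are equal-length
    sequences of booleans, one per nucleotide, where True means "coding".
    """
    if len(predicted) != len(actual):
        raise ValueError("annotations must cover the same sequence length")
    tp = tn = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1          # coding base predicted as coding
        elif not p and not a:
            tn += 1          # non-coding base predicted as non-coding
        elif p:
            fp += 1          # non-coding base predicted as coding
        else:
            fn += 1          # coding base missed by the prediction
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # CC is undefined when any marginal count is zero; returning 0.0
        # here is a convention chosen for this sketch only.
        return 0.0
    return (tp * tn - fn * fp) / denom


def average_cc(pairs):
    """Mean per-gene CC over (predicted, actual) pairs (assumed averaging)."""
    scores = [nucleotide_cc(p, a) for p, a in pairs]
    return sum(scores) / len(scores)


# Toy 12-nt example: the true exon covers bases 3..8, the prediction 4..9.
actual = [False] * 3 + [True] * 6 + [False] * 3
predicted = [False] * 4 + [True] * 6 + [False] * 2
print(round(nucleotide_cc(predicted, actual), 3))  # 0.667
```

In the toy example the prediction is shifted by one base. This gives 5 true positives, 5 true negatives, 1 false positive, and 1 false negative, so CC = (25 - 1) / 36 = 2/3. A CC of 1.0 means a perfect per-base match. Unlike raw nucleotide accuracy, CC accounts for both over-prediction and under-prediction of coding bases.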