2022 Volume 21 Issue 1 Pages 20-32
Eukaryotic genome sequences contain exons, which are translated into proteins, and introns, which are not. Estimating the functional sites in a genome sequence is therefore an important task. It is also well known that the amino acid sequence of a protein is closely related to its function. This is especially true for particular structural features called motifs, which are considered to be well-conserved sites in the genome sequence. In this work, we developed a dynamic programming (DP)-based functional site estimation system that uses the codon reduced representation and its approximation. We also propose a motif codon reduced representation based on a codon weight matrix, which represents the frequency of each nucleotide corresponding to the amino acid sequence of a PROSITE motif. PROSITE is a public database that serves as a motif dictionary. Our system successfully estimated the coding sequence (CDS) region in the human TNNC1 genome sequence. Experiments were also carried out using the EF-hand motif in the TNNC1 and HPCA genome sequences of several model species, including human. These results show the potential applicability of our approach to estimating functional sites in genome sequences.
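As a rough illustration of the codon weight matrix idea, the following Python sketch builds, for a short amino acid string, the frequency of each nucleotide at every codon position and then scans a DNA sequence for the best-scoring window. It is not the system described in this paper: the function names `codon_weight_matrix` and `best_hit` are hypothetical, synonymous codons are assumed to be equally likely under the standard genetic code, and a plain sliding-window scan stands in for the DP-based alignment.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_weight_matrix(peptide):
    """Return one {nucleotide: frequency} column per nucleotide position.

    Assumption: every synonymous codon of an amino acid is equally likely.
    """
    matrix = []
    for aa in peptide:
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        for pos in range(3):
            column = {b: 0.0 for b in BASES}
            for codon in codons:
                column[codon[pos]] += 1.0 / len(codons)
            matrix.append(column)
    return matrix


def best_hit(dna, peptide):
    """Slide the matrix along dna; return (offset, score) of the best window."""
    matrix = codon_weight_matrix(peptide)
    n_windows = len(dna) - len(matrix) + 1
    if n_windows < 1:
        raise ValueError("dna is shorter than the encoded peptide")
    scores = [
        sum(col.get(dna[start + i], 0.0) for i, col in enumerate(matrix))
        for start in range(n_windows)
    ]
    best = max(range(n_windows), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Met (ATG) and Trp (TGG) each have a single codon, so the hit is exact.
    print(best_hit("CCATGTGGCC", "MW"))
```

A gapless scan like this cannot bridge introns or variable-length motif positions, which is where a DP formulation would be needed.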