Abstract
It is well known that the amino acid sequence of a protein is closely related to its structure and function. In the present work, we propose a representation of protein sequence features based on the combination of a target residue and its symmetric neighboring residues, and we construct a comprehensive palindrome fragment database from NCBI RefSeq (HUMAN), which contained 35,665 entries (19,864,982 residues in total). All palindrome fragments of each size were stored hierarchically together with the target residue, the total number of occurrences, the number of entries in which the fragment appears, and the number of distinct amino acid types it contains. For size-5 fragments such as “PPGPP”, 6,213 of the 8,000 possible patterns (157,481 fragments in total) were found in the database. Sequence feature analysis was performed using the PROSITE motif patterns and the frequency of each fragment in the database. As a result, we identified characteristic palindrome fragments that are closely related to common protein families.
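The fragment representation described above can be illustrated with a minimal sketch. It assumes that a palindrome fragment is an odd-length window that reads the same in both directions, with the target residue at its center, and that overlapping windows are counted separately; the abstract does not state either point. The function names, the record fields, and the two toy sequences are hypothetical and are not taken from the database. Under this reading, a size-5 fragment is fixed by three residues, which gives 20^3 = 8,000 possible patterns.

```python
def palindrome_fragments(sequence, size=5):
    """Yield every palindromic window of odd length `size` in `sequence`."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number")
    for start in range(len(sequence) - size + 1):
        fragment = sequence[start:start + size]
        # A window is kept only if it equals its own reverse.
        if fragment == fragment[::-1]:
            yield fragment


def build_index(entries, size=5):
    """Collect per-fragment statistics from a dict of entry_id -> sequence.

    Each record holds the four items listed in the abstract: the target
    (central) residue, the total number of occurrences, the set of entries
    containing the fragment, and the number of distinct amino acid types.
    """
    index = {}
    for entry_id, sequence in entries.items():
        for fragment in palindrome_fragments(sequence, size):
            record = index.setdefault(fragment, {
                "target": fragment[size // 2],
                "occurrences": 0,
                "entries": set(),
                "n_types": len(set(fragment)),
            })
            record["occurrences"] += 1
            record["entries"].add(entry_id)
    return index


# Toy input (hypothetical sequences, not RefSeq entries).
toy_entries = {"entry1": "APPGPPA", "entry2": "MPPGPPK"}
toy_index = build_index(toy_entries, size=5)
```

Here `toy_index` contains a single key, “PPGPP”, with target residue G, two occurrences spread over two entries, and two distinct amino acid types. Calling `build_index` once per odd size would give one index per fragment size, which is one simple way to approximate the hierarchical storage by size.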