Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Using Feature Generation and Feature Selection for Accurate Prediction of Translation Initiation Sites
Fanfan Zeng, Roland H.C. Yap, Limsoon Wong

2002 Volume 13 Pages 192-200


Correct prediction of the translation initiation site (TIS) is an important issue in genomic research. We show that feature generation together with correlation-based feature selection can be used with a variety of machine learning algorithms to give highly accurate translation initiation site prediction. Only very few features are needed, and the results achieve accuracy comparable to the best existing approaches. Our approach has the advantage that it does not require one to devise a special prediction method; rather, standard machine learning classifiers are shown to give very good performance on the selected features. The raw and generated features which we have found to be important are the following: positions -3 and -1 in the sequence; upstream k-grams for k = 3, 4, and 5; stop-codon frequency; downstream in-frame 3-gram; and the distance of ATG to the beginning of the sequence. The best result, with an overall accuracy of 90%, is obtained by selecting only seven features from this set. The same features, retrained with the use of a scanning model, achieve an overall accuracy of 94% on this dataset.
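
The feature families listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmer_counts`, `generate_features`), the feature-name strings, the use of raw counts, and the choice to count stop codons in-frame downstream of the candidate ATG are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter

# The three standard DNA stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_counts(region, k, step=1):
    """Count k-grams in `region`, advancing `step` bases at a time."""
    return Counter(region[i:i + k] for i in range(0, len(region) - k + 1, step))


def generate_features(seq, atg_index):
    """Build a feature dict for the candidate ATG starting at `atg_index`.

    Hypothetical helper: names and encodings are illustrative assumptions.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG")

    upstream = seq[:atg_index]
    downstream = seq[atg_index + 3:]

    features = {
        # Nucleotides at positions -3 and -1 relative to the A of ATG.
        "pos_-3": seq[atg_index - 3] if atg_index >= 3 else None,
        "pos_-1": seq[atg_index - 1] if atg_index >= 1 else None,
        # Distance of the ATG from the beginning of the sequence.
        "dist_to_start": atg_index,
    }

    # Upstream k-grams for k = 3, 4, 5 (all overlapping windows).
    for k in (3, 4, 5):
        for gram, n in kmer_counts(upstream, k).items():
            features[f"up_{k}gram_{gram}"] = n

    # Downstream 3-grams read in the frame set by the candidate ATG.
    in_frame = kmer_counts(downstream, 3, step=3)
    for codon, n in in_frame.items():
        features[f"down_inframe_{codon}"] = n

    # Stop-codon count (assumed here: in-frame, downstream of the ATG).
    features["down_stop_count"] = sum(in_frame[c] for c in STOP_CODONS)
    return features
```

Calling `generate_features(seq, i)` for every ATG position `i` in a sequence yields one feature dictionary per candidate site; these could then be vectorized and passed to a feature selector and a standard classifier, as the abstract describes.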

© Japanese Society for Bioinformatics