An outline of an informatical method for identifying the complete set of genes using the DNA sequence of a whole genome

Masashi SUZUKI

doi:10.2183/pjab.75.81

Abstract

The identification of open reading frames (ORFs) from the DNA sequence of a whole genome involves a statistical process that separates candidates, i.e. sections that start with a formal start codon and end with a formal termination codon, into two groups: authentic ORFs and artifacts. A small number of genes known prior to the study can be used to analyze general informatical characteristics that are expected to be shared by all the ORFs present in the genome. The results can be summarized in the form of scoring systems that measure the relatedness of each candidate to the model ORF. In order to identify the complete set of ORFs, the rate of false negative identification needs to be minimized, so that no important ORF is missed. A number of non-ORF sections can be analyzed by the same systems in order to estimate the rate of false positive identification. This rate can be systematically reduced by combining multiple scoring systems that evaluate different ORF-specific characteristics.
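The workflow summarized above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the author's method. Every function name is hypothetical, and candidates are taken from the forward strand only, using the first ATG after the previous in-frame stop. The two scoring systems are swapped-in examples of ORF-specific characteristics: a codon-usage log-odds score trained on known genes against background sequence, and candidate length. The threshold for each system is chosen so that only a small fraction of the known genes would be missed (the false negative rate). Scoring known non-ORF sections with the same thresholds then gives an estimate of the false positive rate. Requiring a candidate to pass every system can only keep that rate the same or lower it.

```python
import math
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
START_CODON = "ATG"
ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def find_candidates(seq, min_codons=30):
    """Candidate sections from a formal start codon to the next in-frame stop.

    Assumptions: forward strand only; the first ATG after the previous stop is
    used; sections shorter than `min_codons` codons (start included, stop
    excluded) are skipped. Returns (start, end) pairs, `end` exclusive and
    including the stop codon.
    """
    seq = seq.upper()
    candidates = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    candidates.append((start, i + 3))
                start = None
    return candidates


def codon_counts(seqs):
    """Count frame-0 codons over a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    return counts


def train_codon_log_odds(known_orfs, background, pseudo=1.0):
    """Hypothetical scoring system 1: per-codon log P(codon|ORF) - log P(codon|background)."""
    fg, bg = codon_counts(known_orfs), codon_counts(background)
    fg_total = sum(fg[c] for c in ALL_CODONS) + pseudo * len(ALL_CODONS)
    bg_total = sum(bg[c] for c in ALL_CODONS) + pseudo * len(ALL_CODONS)
    return {c: math.log((fg[c] + pseudo) / fg_total)
               - math.log((bg[c] + pseudo) / bg_total) for c in ALL_CODONS}


def codon_score(seq, table):
    """Mean per-codon log-odds; higher means closer to the model ORF."""
    seq = seq.upper()
    vals = [table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)
            if seq[i:i + 3] in table]
    return sum(vals) / len(vals) if vals else float("-inf")


def length_score(seq):
    """Hypothetical scoring system 2: length in codons."""
    return len(seq) // 3


def pick_threshold(known_scores, max_fn_rate=0.01):
    """Threshold such that at most `max_fn_rate` of known genes score below it."""
    s = sorted(known_scores)
    k = min(int(max_fn_rate * len(s)), len(s) - 1)
    return s[k]  # at most k scores lie strictly below s[k]


def false_positive_rate(non_orf_scores, threshold):
    """Fraction of known non-ORF sections that would still be accepted."""
    return sum(x >= threshold for x in non_orf_scores) / len(non_orf_scores)


def passes_all(seq, systems):
    """Accept a candidate only if every (score_fn, threshold) pair accepts it."""
    return all(fn(seq) >= t for fn, t in systems)


def combined_false_positive_rate(non_orf_seqs, systems):
    """False positive estimate for the combined systems (never above any single one)."""
    return sum(passes_all(s, systems) for s in non_orf_seqs) / len(non_orf_seqs)
```

In use, one would score the known genes with each system, call `pick_threshold` per system, and then compare `false_positive_rate` for each system alone against `combined_false_positive_rate` on the same non-ORF sections. The reverse strand and alternative start codons are left out here for brevity.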
