Abstract
With advances in whole-genome shotgun sequencing, many organisms have been sequenced and many more will be. However, genome annotation, that is, determining which genes lie where on the genome, remains a tough process.
The draft genome of the moss Physcomitrella patens has been published with nearly 36 000 gene models (Rensing et al. 2009). However, only one third of the gene models had experimental support. To improve the annotation, we collected various transcript sequence data, including full-length cDNA sequences, 3' ESTs sequenced with the 454 system, and 5' SAGE data obtained with the oligo-capping method. Furthermore, we collected over 360 million 25-nt reads of mRNA 5'-enriched tags and over 140 million 50-nt reads of equalized cDNA using the SOLiD system (Life Technologies).
The 50-nt reads were aligned to the reference genome, and we found that introns can be recognized from the alignments. We built a prototype system that constructs gene models from these data. We are extending this system to incorporate the other data and to process the whole genome.
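As an illustration of how introns might be recognized from aligned reads, the following minimal sketch collects candidate introns from spliced alignments and counts how many reads support each one. It assumes the aligner reports alignments with SAM-style CIGAR strings in which `N` marks a skipped reference region; the abstract does not name the aligner or its output format, and the function names and input tuples here are hypothetical, not the authors' prototype.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM convention).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_alignment(start, cigar):
    """Return candidate introns as 0-based, half-open (start, end) intervals.

    `start` is the 0-based reference position of the first aligned base.
    An `N` operation is treated as an intron (assumption: spliced aligner).
    """
    pos = start
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((pos, pos + length))
            pos += length
        elif op in "MD=X":
            # These operations consume reference bases.
            pos += length
        # I, S, H and P do not advance the reference position.
    return introns


def count_intron_support(alignments):
    """Count reads supporting each (chrom, start, end) candidate intron.

    `alignments` is an iterable of (chrom, start, cigar) tuples,
    a hypothetical simplified input format.
    """
    counts = Counter()
    for chrom, start, cigar in alignments:
        for intron_start, intron_end in introns_from_alignment(start, cigar):
            counts[(chrom, intron_start, intron_end)] += 1
    return counts
```

Candidate introns supported by several independent reads could then be passed to a gene-model construction step; any support threshold would have to be chosen empirically.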