The genome of Eucalyptus camaldulensis was characterized by sequencing genomic DNA and cDNA with a combination of the conventional Sanger method and next-generation sequencing methods, followed by intensive bioinformatics analyses. The non-redundant genomic sequences obtained totaled 654,922,307 bp and consisted of 81,246 scaffolds and 121,194 singlets. These sequences accounted for approximately 92% of the gene-containing regions and had an average G+C content of 33.6%. A total of 77,121 complete and partial structures of protein-encoding genes were deduced. Comparison of the genes mapped onto KEGG pathways or assigned to KOG categories with those of other plant species revealed characteristic features of the E. camaldulensis genome; in particular, 23 pathways contained enzymes present only in the E. camaldulensis genome. Microsatellite markers developed from the genomic sequence data were used for polymorphism analysis of six Eucalyptus species collected from various parts of the world to estimate their genetic diversity, and this analysis demonstrated the usefulness of the markers. The genomic sequence and accompanying information presented here are expected to serve as valuable resources for accelerating fundamental and applied research on Eucalyptus, especially in the fields of paper production and industrial materials. Further information on the genomic and cDNA sequences and microsatellite markers is available at http://www.kazusa.or.jp/eucaly/.
© 2011 by Japanese Society for Plant Cell and Molecular Biology