Abstract
We present a pipeline for the analysis of short reads generated by the Illumina Genome Analyzer. The pipeline supports multiple mapping programs and outputs mapping results in multiple formats. In addition, we developed a short-read viewer for browsing large numbers of DNA fragments mapped to a reference genome. The viewer displays SNPs, indels, and possible sequencing errors on the exons, introns, and coding frames of known gene structures. In parallel with the mapping results, the pipeline reports novel gene structures, which are predicted by the Bowtie, TopHat, and Cufflinks programs on the basis of the short-read profile. The short-read mapping results, together with the known and predicted gene structures, were used to estimate the expression level of each gene structure as a normalized value (RPKM: number of Reads Per Kilobase of exon model per Million mapped reads). We further confirmed that Illumina mRNA-seq data show high technical reproducibility, and investigated how many short reads are required to estimate expression levels accurately. Detection of differentially expressed rice genes under abiotic stress is also discussed.
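As an illustration of the RPKM normalization named above, the following minimal sketch computes the value for a single gene model using the standard definition (reads, divided by exon length in kilobases, divided by mapped reads in millions). The function name `rpkm`, its parameter names, and the example numbers are assumptions made for illustration only; they are not taken from the pipeline or from its data.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads.

    read_count         -- reads mapped to the exons of one gene model
    exon_length_bp     -- summed exon length of that gene model, in bases
    total_mapped_reads -- all mapped reads in the sample
    (All names here are hypothetical, chosen for this sketch.)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2 kb exon model, 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)  # 25.0
```

Because the value is divided by both exon length and sequencing depth, gene models of different lengths and samples of different sizes can be compared on the same scale.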