IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
A Combined Approach for de novo DNA Sequence Assembly of Very Short Reads
Wisnu Ananta Kusuma, Takashi Ishida, Yutaka Akiyama
Journal, free access

2011, Volume 4, pp. 21-33

Abstract

De novo DNA sequence assembly is an essential step in genome sequence analysis. In this paper, we investigated two of the major approaches for de novo DNA sequence assembly of very short reads: overlap-layout-consensus (OLC) and Eulerian path. Based on that investigation, we developed a new assembly technique that combines the OLC and Eulerian path methods in a hierarchical process: the contigs yielded by the two approaches are treated as reads and assembled again to yield longer contigs. We tested our approach on three real very-short-read datasets generated by an Illumina Genome Analyzer and on four simulated very-short-read datasets that contained sequencing errors. The sequencing errors were modeled on Illumina's sequencing technology. Our combined approach yielded longer contigs than Edena (OLC) and Velvet (Eulerian path) at various coverage depths and was comparable to SOAPdenovo in terms of N50 size and maximum contig length. The assembly results were also validated by comparing the contigs produced by each assembler with their reference sequence from an NCBI database. This validation shows that our approach produces more accurate results than Velvet, Edena, or SOAPdenovo alone. The comparison indicates that our approach is a viable way to assemble very short reads from next-generation sequencers.
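
The abstract compares assemblers by N50 size and maximum contig length. As a minimal sketch of how these two metrics are commonly computed from a list of contig lengths, consider the Python below. It uses the standard definition of N50 (the largest length L such that contigs of length at least L together cover at least half of the total assembled length). The function names `n50` and `max_contig_length` and the example lengths are hypothetical illustrations, not taken from the paper or from any assembler's output.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    # Sort from longest to shortest so we accumulate the largest contigs first.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total <= 0:
        raise ValueError("need at least one contig of positive length")
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if 2 * running >= total:
            return length


def max_contig_length(contig_lengths):
    """Return the length of the longest contig."""
    return max(contig_lengths)


# Hypothetical contig lengths, for illustration only (not data from the paper).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))                # 8: 10 + 9 + 8 = 27 is half of 54
print(max_contig_length(example_lengths))  # 10
```

A higher N50 means that more of the assembled sequence lies in long contigs, which is why the abstract reports it alongside the maximum contig length when comparing assemblers.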

© 2011 by the Information Processing Society of Japan