IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
A Combined Approach for de novo DNA Sequence Assembly of Very Short Reads
Wisnu Ananta KusumaTakashi IshidaYutaka Akiyama
Author information
JOURNAL FREE ACCESS

2011 Volume 4 Pages 21-33

Details
Abstract

De novo DNA sequence assembly is very important in genome sequence analysis. In this paper, we investigated two of the major approaches for de novo DNA sequence assembly of very short reads: overlap-layout-consensus (OLC) and Eulerian path. From that investigation, we developed a new assembly technique by combining the OLC and the Eulerian path methods in a hierarchical process. The contigs yielded by these two approaches were treated as reads and were assembled again to yield longer contigs. We tested our approach using three real very-short-read datasets generated by an Illumina Genome Analyzer and four simulated very-short-read datasets that contained sequencing errors. The sequencing errors were modeled based on Illumina's sequencing technology. As a result, our combined approach yielded longer contigs than those of Edena (OLC) and Velvet (Eulerian path) in various coverage depths and was comparable to SOAPdenovo, in terms of N50 size and maximum contig lengths. The assembly results were also validated by comparing contigs that were produced by assemblers with their reference sequence from an NCBI database. The results show that our approach produces more accurate results than Velvet, Edena, and SOAPdenovo alone. This comparison indicates that our approach is a viable way to assemble very short reads from next generation sequencers.

Content from these authors
© 2011 by the Information Processing Society of Japan
Previous article Next article
feedback
Top