Proceedings of the Symposium of the Japanese Society of Computational Statistics
Online ISSN : 2189-583X
Print ISSN : 2189-5813
ISSN-L : 2189-5813
Conference information
N-gram spectral analysis on genomes and biological language models (Session 7C (IASC-ARS))
Yong Kheng GOH, Foo Weng LIM, Yean Ling LEO
Proceedings / abstract collection, free access

p. 199-202

Abstract
It has been suggested previously that genomic sequences show characteristics typical of natural-language texts, such as characteristic "signature" word usage. Algorithms from natural-language processing may therefore be applied to genomic sequences as a means of clustering and of building phylogenetic trees. Following this approach, statistical N-gram analysis has been applied to the comparative analysis of whole-genome sequences of several organisms. It could be shown that a few particular DNA N-grams are abundant in one organism but occur very rarely in other organisms, thereby serving as genome signatures. By counting the occurrences of different N-grams, one can define signature vectors of a genetic text, such as contrast value and usage departure. From the contrast value vectors, phylogenetic trees could be built using linguistic similarity measures such as correlation.
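The counting and comparison steps described in the abstract can be illustrated with a minimal sketch. The abstract does not define contrast value or usage departure, so this example assumes plain relative N-gram frequencies as a stand-in signature vector and Pearson correlation as the similarity measure. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def ngram_frequencies(seq, n):
    """Relative frequency of every possible DNA N-gram, in a fixed order.

    Assumption: this plain frequency vector stands in for the paper's
    signature vectors (contrast value, usage departure), whose exact
    definitions are not given in the abstract.
    """
    seq = seq.upper()
    # Slide a window of width n across the sequence and count each N-gram.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Enumerate all 4**n N-grams so every genome yields a vector of equal length.
    keys = ["".join(p) for p in product(ALPHABET, repeat=n)]
    # N-grams containing other symbols (for example N) are left out of the total.
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Hypothetical toy sequences, not real genomes.
genomes = {
    "org_a": "ACGTACGTACGTTTGACA",
    "org_b": "ACGTACGAACGTTTGACC",
    "org_c": "GGGGCCCCGGGCCCGGCC",
}

vectors = {name: ngram_frequencies(s, 2) for name, s in genomes.items()}

# Pairwise distance as 1 - correlation; such a matrix could feed a
# tree-building method of the reader's choice.
names = sorted(vectors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(1 - pearson(vectors[a], vectors[b]), 3))
```

Enumerating all 4^N possible N-grams, rather than only the observed ones, keeps the vectors aligned across organisms, which is what makes a component-wise measure such as correlation meaningful.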
© 2012 Japanese Society of Computational Statistics