Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full papers
Association pattern mining of intron retention events in human based on hybrid learning machine
Hae-Jin Hu, Sung-Ho Goh, Yeon-Su Lee

2010, Volume 85, Issue 6, pp. 383-394

Abstract

Alternative splicing is a major contributor to protein diversity, and aberrant splicing is known to be one of the main causes of genetic disorders such as cancer. Many statistical and computational approaches have identified several major factors that determine the splicing event, such as exon/intron length, splice site strength, and the density of splicing enhancers or silencers. These factors may be correlated with one another and thus result in a specific type of splicing, but there has been no systematic approach to extracting comprehensible association patterns among them. Here, we attempted to understand the decision-making process of a learning machine on intron retention events. We adopted a hybrid learning machine approach, combining a random forest with an association rule mining algorithm, to determine the governing factors of intron retention events and their combined effect on the decision-making process. By converting all candidate features into five category values, we made the generated rules easier to understand. Notably, the random forest algorithm identified only adenine- and thymine-based triplets such as ATA, TTA, and ATT as significant features; the known intronic splicing enhancer triplet GGG was not among them. The rules generated by the association rule mining algorithm also show that constitutive introns are generally characterized by high adenine- and thymine-based triplet frequency (level 3 and above), 3' and 5' splice site scores, exonic splicing silencer scores, and intron length, whereas retained introns are characterized by low levels of the same features.
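
As an illustration of the rule-mining half of such a hybrid approach, the following minimal Python sketch discretizes numeric features into five levels and then enumerates simple "feature levels -> class" rules by support and confidence. It is not the authors' implementation. The abstract does not say how the five category values were defined, so the equal-frequency binning here is an assumption, as are the function names `to_levels` and `mine_rules`, the thresholds, and the toy values. The random forest feature-selection step is omitted entirely.

```python
from itertools import combinations


def to_levels(values, n_levels=5):
    """Map each value to a level 1..n_levels by equal-frequency (rank) binning.

    Assumption: the paper's actual binning scheme is not given in the abstract.
    Ties are broken by input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, idx in enumerate(order):
        levels[idx] = rank * n_levels // len(values) + 1
    return levels


def mine_rules(rows, labels, min_support=0.2, min_confidence=0.8, max_items=2):
    """Enumerate rules 'antecedent -> label' that pass both thresholds.

    rows   : list of dicts mapping feature name -> level
    labels : list of class names, one per row
    Returns tuples (antecedent, label, support, confidence), best first.
    """
    n = len(rows)
    ante_count = {}   # antecedent -> rows containing it
    joint_count = {}  # (antecedent, label) -> rows containing both
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for k in range(1, max_items + 1):
            for ante in combinations(items, k):
                ante_count[ante] = ante_count.get(ante, 0) + 1
                key = (ante, label)
                joint_count[key] = joint_count.get(key, 0) + 1

    rules = []
    for (ante, label), c in joint_count.items():
        support = c / n                   # fraction of all rows matching the rule
        confidence = c / ante_count[ante]  # P(label | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((ante, label, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not data from the paper).
    intron_length = [120, 95, 3000, 2500, 150, 4100, 80, 3600, 200, 2900]
    ata_freq = [0.010, 0.020, 0.060, 0.050, 0.015,
                0.070, 0.012, 0.065, 0.018, 0.055]
    labels = ["retained", "retained", "constitutive", "constitutive", "retained",
              "constitutive", "retained", "constitutive", "retained", "constitutive"]

    rows = [{"intron_length": a, "ATA_freq": b}
            for a, b in zip(to_levels(intron_length), to_levels(ata_freq))]
    for ante, label, support, confidence in mine_rules(rows, labels):
        print(ante, "->", label, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Restricting every feature to the same five ordinal levels is what keeps the resulting rules readable: each antecedent is a short list of (feature, level) pairs rather than a set of feature-specific numeric thresholds.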

© 2010 by The Genetics Society of Japan