IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
Protein Fold Recognition with Representation Learning and Long Short-Term Memory
Masashi Tsubaki, Masashi Shimbo, Yuji Matsumoto
Journal, free access

2017, Volume 10, pp. 2-8

Abstract

Predicting the 3D structure of a protein from its amino acid sequence is an important challenge in bioinformatics. Because direct prediction of the 3D structure is difficult, a protein is commonly first classified into one of the “folds”, the pre-defined structural labels used in protein databases such as SCOP and CATH, as an intermediate step toward determining its 3D structure. This classification task is called protein fold recognition (PFR), and much research has addressed either (i) feature extraction from amino acid sequences or (ii) methods for classifying protein folds. In this paper, we propose a new approach to PFR that (i) learns feature representations from a large protein database with unsupervised methods, instead of relying on manual feature selection and external tools, and (ii) trains deep neural architectures, namely recurrent neural networks (RNNs) with long short-term memory (LSTM) units, while re-training the representations instead of fixing the extracted features. On a benchmark dataset, our approach outperforms existing methods that use various physicochemical features.
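
To make the architecture described above concrete, the following is a minimal illustrative sketch of the forward pass of an LSTM-based fold classifier, written in plain Python. It is not the authors' implementation. The per-residue embedding table, the dimensions, the number of folds, the use of the final hidden state for classification, and the names `LSTMFoldClassifier`, `forward`, and `AMINO_ACIDS` are all assumptions made for illustration. The embeddings here are randomly initialized, whereas the abstract describes representations learned by unsupervised methods and then re-trained; training is omitted entirely.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed vocabulary)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LSTMFoldClassifier:
    """Hypothetical sketch: embedding lookup -> LSTM -> softmax over folds."""

    def __init__(self, embed_dim=8, hidden_dim=16, num_folds=5, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Assumption: one vector per residue. Random here; in the setting the
        # abstract describes, these would be pre-trained and then re-trained.
        self.embed = {
            aa: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for aa in AMINO_ACIDS
        }
        # One weight matrix per gate, applied to the concatenation [x; h].
        # Keys: i = input, f = forget, o = output, g = candidate cell state.
        self.weights = {
            k: rand_matrix(rng, hidden_dim, embed_dim + hidden_dim) for k in "ifog"
        }
        self.bias = {k: [0.0] * hidden_dim for k in "ifog"}
        self.out = rand_matrix(rng, num_folds, hidden_dim)

    def forward(self, sequence):
        """Return a probability distribution over folds for one sequence."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for aa in sequence:
            xh = self.embed[aa] + h  # list concatenation: [x; h]
            pre = {
                k: [a + b for a, b in zip(matvec(self.weights[k], xh), self.bias[k])]
                for k in "ifog"
            }
            i_gate = [sigmoid(v) for v in pre["i"]]
            f_gate = [sigmoid(v) for v in pre["f"]]
            o_gate = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            # Standard LSTM update: c_t = f * c_{t-1} + i * cand; h_t = o * tanh(c_t)
            c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
        # Assumption: classify from the final hidden state with a softmax layer.
        logits = matvec(self.out, h)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


model = LSTMFoldClassifier()
probs = model.forward("MKTAYIAKQR")  # hypothetical toy sequence
print(max(range(len(probs)), key=probs.__getitem__))  # index of the predicted fold
```

Because the weights are untrained, the predicted fold index is meaningless; the sketch only shows how a variable-length residue sequence is reduced to a fixed-size vector and mapped to fold probabilities. In practice one would use a deep learning framework with automatic differentiation so that the embeddings and LSTM weights can be updated jointly.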

© 2017 by the Information Processing Society of Japan