Journal of Computer Chemistry, Japan
Online ISSN : 1347-3824
Print ISSN : 1347-1767
ISSN-L : 1347-1767
General Paper
The Multiple Representation of Protein Sequence Motifs Using Sequence Binary Decision Diagrams
Kohei YAMATO, Hiroaki KATO, Tetsuo KATSURAGI, Yoshimasa TAKAHASHI

2020 Volume 19 Issue 1 Pages 8-17

Abstract

In this paper, we describe a method for the multiple representation of protein sequence motifs using sequence binary decision diagrams (SeqBDDs) and its application to motif search. A SeqBDD is a compressed representation of a set of sequences, such as a set of strings. In this study, we developed two algorithms for SeqBDDs. The first constructs a SeqBDD that expresses the amino acid sequences of the corresponding motifs, and the second builds an automaton equivalent to a deterministic finite automaton from a SeqBDD by adding state transitions to it. To evaluate their performance, we used our method to search for three highly conserved domains of the matrix metalloproteinase (MMP) family in all 555,594 amino acid sequences obtained from UniProtKB/Swiss-Prot (Release 2017_09) and compared the results with those of corresponding searches using PROSITE patterns. Our method achieved better precision, recall, and F-measure than the searches using PROSITE patterns.
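To make the idea of a SeqBDD as a compressed set of strings concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes the usual SeqBDD conventions: each node carries a letter, its 1-edge leads to the suffixes of sequences that start with that letter, its 0-edge leads to the remaining sequences, and a node whose 1-edge reaches the empty set is suppressed. The function names `make_node`, `build`, `contains`, and `size`, as well as the toy five-residue motif variants, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple, Union

# Terminals: ZERO is the empty set, ONE is the set holding only the empty string.
ZERO, ONE = 0, 1
Node = Union[int, Tuple]  # an internal node is the tuple (label, zero_child, one_child)

_unique: Dict[Tuple, Tuple] = {}  # unique table so that equal subgraphs are shared


def make_node(label: str, zero: Node, one: Node) -> Node:
    """Create (or reuse) a node, applying the zero-suppression rule."""
    if one == ZERO:  # no sequence starts with `label`, so the node is redundant
        return zero
    key = (label, zero, one)
    return _unique.setdefault(key, key)


def build(seqs: Iterable[str]) -> Node:
    """Build a SeqBDD for a finite set of strings (simple recursive sketch)."""
    seqs = set(seqs)
    node: Node = ONE if "" in seqs else ZERO
    groups: Dict[str, set] = {}
    for s in seqs:
        if s:
            groups.setdefault(s[0], set()).add(s[1:])
    # Chain nodes along 0-edges so that labels increase from the root downward.
    for label in sorted(groups, reverse=True):
        node = make_node(label, node, build(groups[label]))
    return node


def contains(node: Node, s: str) -> bool:
    """Return True if string `s` is a member of the set represented by `node`."""
    i = 0
    while node not in (ZERO, ONE):
        label, zero, one = node
        if i < len(s) and s[i] == label:
            node, i = one, i + 1  # consume one letter along the 1-edge
        else:
            node = zero           # try the next candidate letter along the 0-edge
    return node == ONE and i == len(s)


def size(node: Node) -> int:
    """Count the distinct internal nodes reachable from `node`."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n in (ZERO, ONE) or n in seen:
            continue
        seen.add(n)
        stack.extend((n[1], n[2]))
    return len(seen)


# Toy motif variants (hypothetical, not taken from the paper or from PROSITE).
motif = build(["HEAGH", "HELGH", "HEFGH"])
print(contains(motif, "HELGH"), contains(motif, "HEGGH"), size(motif))
```

Because the three variants share the suffix `GH`, the unique table stores that suffix only once, which is where the compression relative to a plain trie comes from. Searching a long protein sequence for any member of the set would additionally require the automaton construction described in the abstract; this sketch covers only exact membership.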

© 2020 Society of Computer Chemistry, Japan