2020 Volume 19 Issue 1 Pages 8-17
In this paper, we describe a method for representing multiple protein sequence motifs using sequence binary decision diagrams (SeqBDDs) and its application to motif search. A SeqBDD is a compressed representation of a set of sequences, such as a collection of strings. In this study, we developed two algorithms for SeqBDDs. The first constructs a SeqBDD that expresses the amino acid sequences of the corresponding motifs, and the second builds an automaton equivalent to a deterministic finite automaton from a SeqBDD by adding state transitions to it. To evaluate their performance, we used our method to search for three highly conserved domains of the matrix metalloproteinase (MMP) family in all 555,594 amino acid sequences obtained from UniProtKB/Swiss-Prot (Release 2017_09) and compared the results with those of equivalent searches using PROSITE patterns. Our method achieved better precision, recall, and F-measure than the searches using PROSITE patterns.
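To make the notion of a SeqBDD as a compressed set of sequences concrete, the following is a minimal Python sketch of the general data structure. It is not the construction algorithm developed in this paper. The sketch assumes the usual SeqBDD conventions: each internal node carries a symbol, a 1-edge (consume the symbol) and a 0-edge (try a larger symbol), nodes whose 1-edge leads to the empty set are suppressed, and structurally identical nodes are shared. The class name `SeqBDD`, its methods, and the toy five-residue strings are hypothetical and chosen only for illustration.

```python
class SeqBDD:
    """Toy SeqBDD: a shared, ordered decision diagram for a finite set of strings."""

    ZERO, ONE = 0, 1  # terminals: the empty set, and the set containing only ""

    def __init__(self):
        # Node ids index into self.nodes; entries are (label, zero_child, one_child).
        self.nodes = [None, None]
        self.unique = {}  # (label, zero_child, one_child) -> node id

    def make(self, label, zero, one):
        # Zero-suppression: a node whose 1-edge leads to the empty set is dropped.
        if one == self.ZERO:
            return zero
        key = (label, zero, one)
        # Sharing: reuse an existing node with the same label and children.
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, seqs):
        """Return the root node id for the given collection of strings."""
        seqs = frozenset(seqs)
        if not seqs:
            return self.ZERO
        if seqs == {""}:
            return self.ONE
        # Branch on the smallest leading symbol so that labels increase along 0-edges.
        head = min(s[0] for s in seqs if s)
        with_head = {s[1:] for s in seqs if s and s[0] == head}
        without_head = {s for s in seqs if not s or s[0] != head}
        return self.make(head, self.build(without_head), self.build(with_head))

    def contains(self, root, seq):
        """Exact membership test: is seq one of the stored strings?"""
        node, i = root, 0
        while node > self.ONE:
            label, zero, one = self.nodes[node]
            if i < len(seq) and seq[i] == label:
                node, i = one, i + 1  # consume the symbol along the 1-edge
            else:
                node = zero           # try the next alternative symbol
        return node == self.ONE and i == len(seq)


# Toy example (made-up strings, not motifs from this study).
bdd = SeqBDD()
root = bdd.build(["HEAGH", "HELGH", "HEIGH"])
print(bdd.contains(root, "HELGH"))  # membership of a stored string
print(bdd.contains(root, "HEAGG"))  # membership of a string that was not stored
print(len(bdd.unique))              # number of internal nodes after sharing
```

In this toy example the three strings share the prefix `HE` and the suffix `GH`, so the diagram stores the suffix once instead of three times, as a plain trie would. The sketch only tests whole-string membership; scanning long protein sequences for motif occurrences needs the additional state transitions that the second algorithm adds.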