INTRODUCTION

DEFINITIONS AND NOTATIONS

GENETIC ALGORITHM

Population pool

Fitness function

Function

Function

Function

Crossover and mutation operators

Termination of the GA-DPAF algorithm

Time and space analysis

Implementation

EXPERIMENTAL RESULTS

Monad motifs in DNA sequences

Dyad motifs in DNA sequences

CONCLUSION