Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Approximate Multiple String Searching by Clustering
Fei Shi, Peter Widmayer

1996 Volume 7 Pages 33-40

Abstract

We are given a finite set S of text strings and a pattern P over some fixed alphabet Σ. The topic of this paper is the design of a data structure D(S) that supports approximate multiple string searching queries efficiently. Here, for a given upper bound k ∈ Z+ on the allowable distance, P = p_1...p_m is said to appear approximately in a text T = t_1...t_n, m, n ∈ Z+, if there exist positions u, v in T such that the edit distance between P and t_u...t_v is at most k. Let N denote the sum of the lengths of all strings in S. We present an algorithm that constructs the data structure D(S) in O(N) time and space. Afterwards, an approximate multiple string search query can be answered in O(N) expected time if the allowable distance k is bounded above by O(m/log m). The method can be used to search large nucleotide and amino acid sequence databases for similar sequences.
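To make the definition of an approximate occurrence concrete, here is a minimal sketch of a naive baseline, assuming unit-cost edit distance (insertions, deletions, substitutions). It uses the classic Sellers dynamic-programming algorithm, not the clustering-based data structure D(S) of the paper (the abstract does not describe that structure). The baseline simply scans every string in S, so a query costs O(m·N) time. The function names `approx_occurs` and `multiple_search` are hypothetical and chosen only for illustration.

```python
def approx_occurs(pattern: str, text: str, k: int) -> bool:
    """Return True if some substring t_u...t_v of text is within edit distance k of pattern.

    Sellers' algorithm: after processing t_j, cur[i] is the minimum edit distance
    between p_1...p_i and any substring of text ending at t_j. Row 0 is always 0,
    so an occurrence may start at any position of the text.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix: D(i, 0) = i
    for tc in text:
        cur = [0] * (m + 1)  # cur[0] = 0: free start position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra text character
                         cur[i - 1] + 1)      # pattern character left unmatched
        if cur[m] <= k:  # whole pattern aligned to a substring ending here
            return True
        prev = cur
    return False


def multiple_search(strings: list[str], pattern: str, k: int) -> list[int]:
    """Naive multiple string search: indices of the strings in S containing P approximately."""
    return [idx for idx, s in enumerate(strings) if approx_occurs(pattern, s, k)]


if __name__ == "__main__":
    print(multiple_search(["GGGG", "TTACGTTT", "ACCT"], "ACGT", 1))  # → [1, 2]
```

Here "TTACGTTT" contains ACGT exactly and "ACCT" is one substitution away, while every substring of "GGGG" needs at least three edits. An index such as D(S) aims to avoid touching every string, which this linear scan over all N characters cannot do.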

© Japanese Society for Bioinformatics