Abstract
This paper introduces a method for detecting tree patterns (tree motifs) in a database of rooted, unordered, labeled trees. The method can be viewed as an extension of the Gibbs sampling approach to sequence motif detection. We enumerate tree topologies and, for each topology, search the database for tree motifs with that topology. A tree motif is detected by matching the tree topology against the trees in the database and then applying Gibbs sampling to the resulting set of matches. Once the process completes for a given topology, it is restarted for the next enumerated topology. For each topology, the method outputs the best tree motif found. We applied the method to an artificially created database of trees and to a database of carbohydrate (glycan) structures.
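
The topology enumeration step can be illustrated with a minimal sketch. The abstract does not say how topologies are enumerated, so the scheme below is an assumption: each rooted unordered topology is represented as a nested tuple of its child subtrees, sorted so that two trees differing only in child order collapse to the same canonical form. The names `partitions` and `topologies` are hypothetical and are not taken from the paper.

```python
from functools import lru_cache
from itertools import product


def partitions(n, max_part):
    """Yield the integer partitions of n as non-increasing tuples with parts <= max_part."""
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest


@lru_cache(maxsize=None)
def topologies(n):
    """Return all rooted unordered tree topologies with n nodes.

    Assumed representation: a tree is the sorted tuple of its child subtrees,
    so a single node is () and a root with two leaf children is ((), ()).
    """
    if n < 1:
        raise ValueError("a tree needs at least one node")
    shapes = set()
    # Distribute the n - 1 non-root nodes over the child subtrees.
    for sizes in partitions(n - 1, n - 1):
        for children in product(*(topologies(s) for s in sizes)):
            # Sorting the children removes the ordering, so duplicates merge in the set.
            shapes.add(tuple(sorted(children)))
    return tuple(sorted(shapes))


if __name__ == "__main__":
    for size in range(1, 6):
        print(size, len(topologies(size)))
```

In a pipeline of the kind described above, each enumerated topology would then be matched against the database and the matches passed to the Gibbs sampler; those steps are omitted here because the abstract does not specify them.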