In this paper, we propose a symbolic representation for multi-dimensional trajectory data. The central step of the proposal is discretizing the raw time series of a two-dimensional trajectory into a corresponding sequence of symbols. Experimental results on both synthetic and real time-series data sets demonstrate the time efficiency of our proposal and show that our representation is often less sensitive to noise.
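The abstract does not state which discretization scheme is used. As a rough illustration of turning a two-dimensional trajectory into a symbol sequence, the sketch below assumes one simple possibility: each step between consecutive points is mapped to one of eight direction sectors. The function name `discretize_trajectory`, the eight-letter alphabet, and the sector scheme are all hypothetical choices made for this example, not the representation proposed in the paper.

```python
import math


def discretize_trajectory(points, alphabet="abcdefgh"):
    """Map each step of a 2-D trajectory to a direction symbol.

    Assumption for illustration: the plane is split into len(alphabet)
    equal angular sectors, with the first sector centred on the +x axis.
    """
    k = len(alphabet)
    sector = 2 * math.pi / k
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # no movement between samples, so no direction symbol
        # Heading in [0, 2*pi), measured counter-clockwise from the +x axis.
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Shift by half a sector so each sector is centred on its direction.
        index = int((angle + sector / 2) // sector) % k
        symbols.append(alphabet[index])
    return "".join(symbols)


# A unit square traversed counter-clockwise: east, north, west, south.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(discretize_trajectory(square))
```

A direction-based alphabet discards step length and absolute position, which is one plausible reason a symbolic form could tolerate small perturbations; whether the paper's representation behaves this way is not stated in the abstract.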
Medical time-series data contains many null values and is collected over long periods of time. We apply biclustering to extract new, interesting patterns from this data. The focus is on extracting longer increasing or decreasing patterns (biclusters) that may be of interest to medical experts in analysing drug responses and therapies, as well as in predicting the occurrence of certain diseases. Given the data for each patient, we discretize it using statistical methods to obtain a symbolic representation. We then efficiently construct a compact generalized suffix tree over the entire dataset. The algorithm presented in this work extends the common motif search problem, as applied in microarray experiments, to extract approximate biclusters from the suffix tree, using a form of string edit distance restricted to substitution and deletion together with the concept of valid models.
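Two of the steps named above can be sketched in a few lines. The abstract does not specify the statistical discretization or the exact form of the restricted edit distance, so the following is only a sketch under stated assumptions: readings are mapped to `I` (increase), `D` (decrease), or `S` (stable) by comparing each change against a fraction of the series' standard deviation, null readings become `N`, and the distance counts the substitutions and deletions needed to turn one symbol string into another. The names `discretize_series` and `sub_del_distance`, the symbols, and the `threshold` parameter are hypothetical.

```python
from statistics import pstdev


def discretize_series(values, threshold=0.5, missing="N"):
    """Turn one patient's readings into trend symbols (illustrative scheme).

    Assumption: a change larger than threshold * standard deviation counts
    as an increase ('I') or decrease ('D'); anything smaller is stable ('S').
    """
    observed = [v for v in values if v is not None]
    spread = pstdev(observed) if len(observed) > 1 else 0.0
    symbols = []
    previous = None
    for value in values:
        if value is None:
            symbols.append(missing)  # keep null readings visible
            continue
        if previous is not None:
            delta = value - previous
            if delta > threshold * spread:
                symbols.append("I")
            elif delta < -threshold * spread:
                symbols.append("D")
            else:
                symbols.append("S")
        previous = value  # compare the next reading to the last observed one
    return "".join(symbols)


def sub_del_distance(source, target):
    """Fewest substitutions and deletions turning source into target.

    Insertions are not allowed, so the result is infinite when target is
    longer than source.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # dist[i][j] is the cost of turning source[:i] into target[:j].
    dist = [[inf] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete every remaining source character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = dist[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            delete = dist[i - 1][j] + 1
            dist[i][j] = min(substitute, delete)
    return dist[n][m]


readings = [1.0, 2.0, None, 4.0, 4.0, 1.0]
print(discretize_series(readings))
print(sub_del_distance("IIDS", "IDS"))
```

In this sketch the direction of deletion (characters are removed from `source` only) is itself an assumption; the paper may define the restriction relative to the model or to the matched substring.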
Traditional relation extraction requires pre-defined relations and large amounts of human-annotated training data. Open relation extraction, meanwhile, demands a set of heuristic rules to extract all potential relations from text. These requirements reduce the practicality and robustness of information extraction systems. In this paper, we propose a bootstrapping framework that uses a few seed sentences, each marked up with two entities, to expand a ranked list of sentences containing the target relations. During the expansion process, a label propagation algorithm is used to select the most confident entity pairs and context patterns. To rank the extracted sentences according to their relevance to the given seeds, we propose a Multi-View Ranking algorithm, a semi-supervised multi-view learning algorithm that combines information from both the entity-pair view and the context-pattern view.
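To make the label propagation step more concrete, the sketch below shows a generic iterative propagation scheme on a small weighted graph. It is not the paper's algorithm: the graph construction, the damping factor `alpha`, the row-normalized update rule, and the function name `propagate_labels` are assumptions chosen for illustration. Nodes could stand for entity pairs or context patterns, with edge weights reflecting their similarity.

```python
def propagate_labels(weights, seeds, alpha=0.8, iterations=50):
    """Spread confidence from seed nodes across a weighted graph.

    weights: dict mapping each node to a dict of {neighbour: edge weight}.
    seeds:   set of nodes known to express the target relation.
    Each round, a node's score becomes a blend of the weighted average of
    its neighbours' scores and its own initial (seed) label.
    """
    initial = {node: 1.0 if node in seeds else 0.0 for node in weights}
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for node, neighbours in weights.items():
            total = sum(neighbours.values())
            if total:
                average = sum(w * scores[m] for m, w in neighbours.items()) / total
            else:
                average = 0.0  # isolated node receives nothing from neighbours
            updated[node] = alpha * average + (1 - alpha) * initial[node]
        scores = updated
    return scores


# Hypothetical toy graph: "near" is strongly tied to the seed, "far" only weakly.
graph = {
    "seed": {"near": 1.0},
    "near": {"seed": 1.0, "far": 0.1},
    "far": {"near": 0.1},
}
scores = propagate_labels(graph, seeds={"seed"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Sorting by the propagated score gives a confidence ordering from which the top candidates could be selected in each bootstrapping round; how the two views are then combined for ranking is specific to the paper and is not reproduced here.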