IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
Volume 2
  • Tatsuya Akutsu
    Article type: Editorial
    Subject area: Editorial
    2009 Volume 2 Pages 1
    Published: 2009
    Released on J-STAGE: March 24, 2009
    JOURNAL FREE ACCESS
  • Kazuhiro Maeda, Hiroyuki Kurata
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 2-14
    Published: 2009
    Released on J-STAGE: March 24, 2009
    JOURNAL FREE ACCESS
    Dynamic simulations are essential for understanding how biochemical networks generate properties that are robust to environmental stresses or genetic changes. However, typical dynamic modeling and analysis yield only local properties for a particular choice of plausible kinetic parameter values, because it is hard to measure the exact values in vivo. Global and firm analyses are needed that consider how changes in parameter values affect the results. A typical solution is to systematically analyze the dynamic behaviors in a large parameter space by searching all plausible parameter values without bias. However, a random search needs an enormous number of trials to obtain such parameter values. Ordinary evolutionary searches obtain plausible parameters swiftly, but the searches are biased. To overcome these problems, we propose a two-phase search method that consists of a random search and an evolutionary search to effectively explore all possible solution vectors of kinetic parameters satisfying the target dynamics. We demonstrate that the proposed method enables an unbiased and high-speed parameter search for dynamic models of biochemical networks through its application to several benchmark functions and to the E. coli heat shock response model.
  • Yukako Tohsato, Yu Nishimura
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 15-24
    Published: 2009
    Released on J-STAGE: March 24, 2009
    JOURNAL FREE ACCESS
    Comparative analyses of enzymatic reactions provide important information on both evolution and potential pharmacological targets. Previously, we focused on the structural formulae of compounds, and proposed a method to calculate enzymatic similarities based on these formulae. However, with the proposed method it is difficult to measure the reaction similarity when the formulae of the compounds constituting each reaction are completely different. The present study was performed to extract substructures that change within chemical compounds using the RPAIR data in KEGG. Two approaches were applied to measure the similarity between the extracted substructures: a fingerprint-based approach using the MACCS key and the Tanimoto/Jaccard coefficients; and the Topological Fragment Spectra-based approach that does not require any predefined list of substructures. Whether the similarity measures can detect similarity between enzymatic reactions was evaluated. Using one of the similarity measures, metabolic pathways in Escherichia coli were aligned to confirm the effectiveness of the method.
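The Tanimoto/Jaccard coefficient named in the abstract above can be sketched as follows. This is only an illustration of the coefficient itself, not the authors' implementation: the fingerprints are represented as sets of on-bit indices, the bit indices are made up rather than real MACCS key positions, and the function name `tanimoto` is hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of on-bit indices."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(a & b)
    # |A ∩ B| / |A ∪ B|, with the union size computed from the set sizes.
    return shared / (len(a) + len(b) - shared)


# Made-up fingerprints (not real MACCS key bits).
fp_a = {1, 4, 7, 9}
fp_b = {4, 7, 9, 12, 15}
print(tanimoto(fp_a, fp_b))
```

The coefficient ranges from 0 (no shared bits) to 1 (identical bit sets), which makes it easy to compare substructure pairs of different sizes.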
  • Kenta Sasaki, Nobuyoshi Nagamine, Yasubumi Sakakibara
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 25-35
    Published: 2009
    Released on J-STAGE: March 24, 2009
    JOURNAL FREE ACCESS
    Background: Glycans, or sugar chains, are one of the three types of chain (DNA, protein and glycan) that constitute living organisms; they are often called “the third chain of the living organism”. About half of all proteins are estimated to be glycosylated based on the SWISS-PROT database. Glycosylation is one of the most important post-translational modifications, affecting many critical functions of proteins, including cellular communication, and their tertiary structure. In order to computationally predict N-glycosylation and O-glycosylation sites, we developed three kinds of support vector machine (SVM) model, which utilize local information, general protein information and/or subcellular localization in consideration of the binding specificity of glycosyltransferases and the characteristic subcellular localization of glycoproteins. Results: In our computational experiment, the model integrating three kinds of information achieved about 90% accuracy in predictions of both N-glycosylation and O-glycosylation sites. Moreover, our model was applied to a protein whose glycosylation sites had not been previously identified and we succeeded in showing that the glycosylation sites predicted by our model were structurally reasonable. Conclusions: In the present study, we developed a comprehensive and effective computational method that detects glycosylation sites and is applicable at a genome-wide level.
  • Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 36-46
    Published: 2009
    Released on J-STAGE: March 24, 2009
    JOURNAL FREE ACCESS
    Recently, a large number of candidate non-coding RNAs (ncRNAs) have been predicted by experimental or computational approaches. Moreover, in genomic sequences, there are still many interesting regions whose functions are unknown (e.g., indel conserved regions, human accelerated regions, ultraconserved elements and transposon-free regions) and some of those regions may be ncRNAs. On the other hand, it is known that many ncRNAs have characteristic secondary structures which are strongly related to their functions. Therefore, detecting clusters which have mutually similar secondary structures is important for revealing new ncRNA families. In this paper, we describe a novel method, called RNAclique, which is able to search for clusters containing mutually similar and locally stable secondary structures among a large number of unaligned RNA sequences. Our problem is formulated as a constraint quasi-clique search problem, and we use an approximate combinatorial optimization method, called GRASP, for solving the problem. Several computational experiments show that our method is useful and scalable for detecting ncRNA families from large sequences. We also present two examples of large-scale sequence analysis using RNAclique.
  • Naoto Yukinawa, Taku Yoshioka, Kazuo Kobayashi, Naotake Ogasawara, Shi ...
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 47-62
    Published: 2009
    Released on J-STAGE: May 25, 2009
    JOURNAL FREE ACCESS
    Clustering is a practical data analysis step in gene expression-based studies. Model-based clustering methods, which are based on probabilistic generative models, have two advantages: the number of clusters can be determined based on statistical criteria, and the clusters are robust against observation noise in the data. Many existing approaches assume multivariate Gaussian mixtures as generative models, which is analogous to using a Euclidean or Mahalanobis type distance as the similarity measure. However, these types of similarity measures often fail to detect co-expressed gene groups. We propose a novel probabilistic model for cluster analyses based on the correlation between gene expression patterns. We also propose a “meta” cluster analysis method to eliminate the dependence of the clustering result on initial values of the clustering algorithm. In empirical studies with a time course gene expression dataset of Bacillus subtilis during sporulation, our method acquires more stable and informative results than the ordinary Gaussian mixture model-based clustering, k-means clustering and hierarchical clustering algorithms, which are widely used in this field. In addition, with the meta-cluster analysis, biologically meaningful expression patterns are extracted from a set of clustering results. The constraints in our model worked more efficiently than those in the previous studies. In our experiment, such constraints contributed to the stability of the clustering results. Moreover, the clustering based on Bayesian inference was found to be more stable than clustering based on conventional maximum likelihood estimation.
  • Ai Mikami, Jianming Shi
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 63-73
    Published: 2009
    Released on J-STAGE: May 25, 2009
    JOURNAL FREE ACCESS
    In this study, we use the Ant Colony System (ACS) to develop a heuristic algorithm for sequence alignment. This algorithm improves on ACS-MultiAlignment, which was proposed in 2005 for predicting major histocompatibility complex (MHC) class II binders. Numerical experiments indicate that this algorithm is as much as 2,900 times faster than the original ACS-MultiAlignment algorithm. We also compare this algorithm with other approaches, such as the Gibbs sampling algorithm, in numerical experiments. The results show that our algorithm finds the best value more quickly than the Gibbs approach.
  • Daigo Wakatsu, Takeo Okazaki
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 74-82
    Published: 2009
    Released on J-STAGE: June 22, 2009
    JOURNAL FREE ACCESS
    Iterative refinement is a useful way to improve alignment results. In this paper, we statistically evaluated different iterative refinement algorithms. We considered four iterative refinement algorithms: remove first (RF), best-first (BF), random (RD), and tree-based (Tb). We considered two scoring functions for the iteration judgment step: log expectation (LE) and weighted sum-of-pairs (SP) scores. We also considered two sequence clustering methods: the neighbor-joining (NJ) method and the unweighted pair-group method with arithmetic mean (UPGMA). We performed comprehensive analyses of these alignment strategies and compared them using the BAliBASE SP (BSP) score. We observed the behavior of the scores from the viewpoint of cumulative frequency (CF) and other basic statistical parameters. Finally, we tested the statistical significance of all alignment results using the Friedman nonparametric analysis of variance (ANOVA) test for ranks and the Scheffé multiple comparison test.
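As a rough illustration of the weighted sum-of-pairs idea mentioned in the abstract above, the sketch below scores a small multiple alignment. It is not the paper's scoring function: the match, mismatch and gap values, the sequence weights, and the function names `pair_score` and `weighted_sp` are all assumptions chosen for the example.

```python
from itertools import combinations


def pair_score(x: str, y: str, match: float = 1.0,
               mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Score two aligned rows column by column (toy scoring values)."""
    total = 0.0
    for a, b in zip(x, y):
        if a == "-" and b == "-":
            continue  # a gap aligned to a gap contributes nothing
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


def weighted_sp(rows: list[str], weights: list[float]) -> float:
    """Weighted sum-of-pairs: sum over i < j of w_i * w_j * pair_score."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(
        weights[i] * weights[j] * pair_score(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    )


# Made-up alignment and weights, for illustration only.
alignment = ["AC-T", "ACGT", "A-GT"]
print(weighted_sp(alignment, [1.0, 1.0, 1.0]))
print(weighted_sp(alignment, [0.5, 1.0, 1.0]))
```

In an iterative refinement loop, a score of this kind would be recomputed after each realignment step and the new alignment kept only if the score improves.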
  • Yuta Ashida, Tomonobu Ozaki, Takenao Ohkawa
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 83-92
    Published: 2009
    Released on J-STAGE: September 24, 2009
    JOURNAL FREE ACCESS
    A comparative analysis of organisms with metabolic pathways provides important information about functions within organisms. In this paper, we discuss the problem of comparing organisms using partial metabolic structures that contain many biological characteristics and propose a pathway comparison method based on an elementary flux mode (EFM), the minimal metabolic pathway that satisfies a steady state. By extracting the elementary flux modes, we obtain biologically significant metabolic substructures. To compare metabolic pathways based on EFMs, we propose a new pseudo alignment method with a penalty based on the importance of enzymes. The distance among organisms can be calculated based on the pseudo alignment of EFMs. To confirm its effectiveness, we apply the proposed method to the pathway datasets from 38 organisms. We successfully reconstructed a “three domain theory” from the aspect of biological function. Moreover, we evaluated the results in terms of the accuracy of organism classification by biological function and confirmed that the obtained classification was closely related to habitat, such as aerobic or anaerobic.
  • Kazunori Miyanishi, Tomonobu Ozaki, Takenao Ohkawa
    Article type: Original Papers
    Subject area: Original Paper
    2009 Volume 2 Pages 93-100
    Published: 2009
    Released on J-STAGE: September 24, 2009
    JOURNAL FREE ACCESS
    As the number of documents about protein structural analysis increases, a method of automatically identifying protein names in them is required. However, the accuracy of identification is not high if the training data set is not large enough. We consider a method to extend a training data set based on machine learning using an available corpus. Such a corpus usually consists of documents about a certain organism species, and documents about different species tend to have different vocabularies. Therefore, depending on the target document or corpus, simply using a corpus as a training data set is not effective for accurate identification. In order to improve the accuracy, we propose a method to select sentences that have a positive effect on identification and to extend the training data set with the selected sentences. In the proposed method, a portion of a set of tagged sentences is used as a validation set. The sentence selection process is iterated using the result of identifying protein names in the validation set as feedback. In the experiment, the proposed method achieved higher accuracy than every baseline method: training without a corpus, with the whole corpus, or with a part of the corpus chosen at random. Thus, it was confirmed that the proposed method selected effective sentences.