Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Volume 15, Issue 1
Displaying 1-24 of 24 articles from this issue
  • Sabine Becker-Weimann, Jana Wolf, Achim Kramer, Hanspeter Herzel
    2004 Volume 15 Issue 1 Pages 3-12
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Many cellular and physiological processes have been shown to display a rhythm of about 24 hours. This so-called circadian rhythm is based on a system of interlocked negative and positive molecular feedback loops. Here we extend a previous model of the circadian oscillator by including REV-ERBα as an additional component. This new model will allow us to investigate the function of an additional negative feedback loop via REV-ERBα. We obtain circadian oscillations with the correct period and phase relations between clock components. Parameter variations that correspond to clock-gene mutations reproduce experimental results: with parameter variations mimicking the Bmal1-/- and the Per2Brdm1 mutation the oscillations cease to exist. In contrast, the system shows sustained oscillations if we use a parameter set that reflects the Rev-erbα mutation. The model also accounts for the differential effect of the Cry1-/- and Cry2-/- mutations on the circadian period. The simulations of the extended model indicate that the original model is robust with respect to the incorporation of the additional component. Depending on the kinetics of the Per2/Cry transcriptional activation by BMAL1, an increasing BMAL1 expression leads to either an increase or decrease of the clock period. This indicates that overexpression experiments could help to characterize the impact of BMAL1 on Per2/Cry transcription.
    Download PDF (1097K)
  • Bernd Binder, Reinhart Heinrich
    2004 Volume 15 Issue 1 Pages 13-23
    Published: 2004
    Released on J-STAGE: November 16, 2011
    JOURNAL FREE ACCESS
    We present a theoretical approach for understanding the interrelations between dynamics and structure of signal transduction pathways. We consider large sets of networks with a specific number of kinases and phosphatases. Our methods are based on nonlinear differential equations, and pathway dynamics is characterised in terms of signal amplification and signal duration. We show that networks with a high number of kinases, high connectivities and low phosphatase activities tend to be unstable and therefore run the risk of displaying autoactivation. Analysis of signal transduction pathways retrieved from databases reveals that several structural characteristics required for pathway stability are fulfilled for networks of very large size.
    Download PDF (1308K)
  • Sascha Bulik, Björn Peters, Christian Ebeling, Hergo Holzhüt ...
    2004 Volume 15 Issue 1 Pages 24-34
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The vertebrate immune system is able to detect abnormal body cells by the specific repertoire of 8-12 residues long peptides (= epitopes or peptide antigens) presented at the cell surface by the MHC-1 molecule complex. The generation of an epitope starts with the degradation of endogenous proteins into primary oligomeric fragments by cytosolic proteases, predominantly the proteasome. These primary fragments may be further attacked by various amino peptidases resident in the cytosol or, alternatively, may escape from this attack by entering the endoplasmic reticulum (ER) by the transporter associated with antigen presentation (TAP). To study the possible consequences of this scenario for the efficiency of antigen presentation we have applied kinetic modelling. The mathematical model comprises the generation of primary oligomeric fragments containing the definitive epitope, the successive N-terminal shortening of these primary fragments by cytosolic amino peptidases and the TAP-mediated transport of cytosolic peptides into the ER. Because the number of peptide molecules may become very small we have performed deterministic and stochastic simulations of the kinetic model. Our simulations show that cytosolic N-terminal trimming of primary fragments may drastically increase loading epitope precursors into the ER. In particular, a primary fragment generated with a low rate of TAP transport into the ER may nevertheless become a potent epitope precursor if at least one of its N-terminal trimming products will be efficiently transported.
    Download PDF (1286K)
  • Oliver Ebenhöh, Thomas Handorf, Reinhart Heinrich
    2004 Volume 15 Issue 1 Pages 35-45
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Methods are developed for structural analysis of metabolic networks expanding in size. Expansion proceeds in consecutive generations in which new reactions are attached to the network produced in the previous stage. Different rules are applied resulting in various modes of expansion. Expansion is performed on the set of glycolytic reactions as well as on a very large set of reactions taken from the KEGG database. It is shown that reactions and compounds strongly differ in the generation in which they are attached to the network, allowing conclusions about the temporal order of their acquisition during network evolution. The expansion provides efficient tools for detecting new structural characteristics such as substrate-product relationships over long distances.
    Download PDF (1203K)
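The generation-wise expansion described in the abstract above can be illustrated with a minimal Python sketch. It assumes one particular attachment rule (a reaction is attached as soon as all of its substrates are available); the abstract mentions several rules without specifying them, so this is only one plausible reading. The function name `expand` and the toy reaction set are hypothetical and are not taken from the paper or from KEGG.

```python
def expand(seed, reactions):
    """Expand a network generation by generation.

    seed      -- iterable of initially available compounds
    reactions -- dict: reaction name -> (set of substrates, set of products)
    Returns a dict mapping each attached reaction to its generation number.
    """
    available = set(seed)
    generation_of = {}
    generation = 0
    while True:
        generation += 1
        # Reactions not yet attached whose substrates are all available.
        new = [name for name, (substrates, _) in reactions.items()
               if name not in generation_of and substrates <= available]
        if not new:
            return generation_of
        for name in new:
            generation_of[name] = generation
        # Products become available only after the whole generation is attached.
        for name in new:
            available |= reactions[name][1]


# Hypothetical toy reaction set (assumption, for illustration only).
toy_reactions = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"C"}),
    "r3": ({"A", "C"}, {"D"}),
    "r4": ({"X"}, {"Y"}),  # never attached: X cannot be reached from the seed
}
generations = expand({"A"}, toy_reactions)
```

In this toy case the generation number of a reaction plays the role of the temporal order of acquisition that the abstract refers to.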
  • Erwin Frey, Andrea Parmeggiani, Thomas Franosch
    2004 Volume 15 Issue 1 Pages 46-55
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Intracellular transport and cytoskeletal organization are the result of an interaction between elastic filaments and force generation by motor proteins. The observed phenomena are still too complex for a complete theoretical description. Studies on simple model systems reveal interesting collective phenomena which can be understood on the basis of driven stochastic processes far from equilibrium.
    Download PDF (1259K)
  • Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu
    2004 Volume 15 Issue 1 Pages 56-68
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Various computational methods have been proposed for inference of protein-protein interactions, since protein-protein interactions play an essential role in many cellular processes. One well-studied approach is to infer protein-protein interactions based on domain-domain interactions. To extend this approach, we proposed a method called LPNM to infer ratios of interactions, which outperformed other existing methods in terms of error of predicted ratios. However, since the LPNM method is based on the linear programming approach, it may require a large amount of time to infer interactions for a large data set.
    In this paper, we propose a simple method to infer the ratios of protein-protein interactions based on the association method by Sprinzak et al. In an experiment with a data set of protein-protein interactions in yeast, it runs more than 150 times as fast as the LPNM method, and achieves almost the same accuracy.
    On implementing algorithms for the inference problem, it is essential to understand how difficult the problem is. Even though various methods for the problem have been already proposed, it has not been analyzed rigorously from a computational point of view. We hence define a problem to maximize correctly classified examples, and prove the problem is MAX SNP-hard, which also means the problem is NP-hard.
    Download PDF (1232K)
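As a rough illustration of the domain-based idea in the abstract above, the following Python sketch scores each domain pair by the fraction of protein pairs containing it that interact, which is the form in which the association method is commonly described. The step that combines domain-pair scores into a ratio for a protein pair (a plain average) is an assumption made here for illustration and is not confirmed by the abstract. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def domain_pairs(p, q, protein_domains):
    """Unordered domain pairs formed by one domain from p and one from q."""
    return {tuple(sorted(dp))
            for dp in product(protein_domains[p], protein_domains[q])}


def association_scores(protein_domains, pairs, interacting):
    """Fraction of protein pairs containing a domain pair that interact."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for p, q in pairs:
        for dp in domain_pairs(p, q, protein_domains):
            total[dp] += 1
            if frozenset((p, q)) in interacting:
                hits[dp] += 1
    return {dp: hits[dp] / total[dp] for dp in total}


def predict_ratio(p, q, protein_domains, scores):
    """Average score over the known domain pairs (assumed combination rule)."""
    known = [scores[dp] for dp in domain_pairs(p, q, protein_domains)
             if dp in scores]
    return sum(known) / len(known) if known else 0.0


# Hypothetical toy data (assumption, for illustration only).
protein_domains = {"P1": {"a"}, "P2": {"b"}, "P3": {"a", "c"}, "P4": {"b"}}
pairs = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]
interacting = {frozenset(("P1", "P2")), frozenset(("P3", "P4"))}

scores = association_scores(protein_domains, pairs, interacting)
ratio = predict_ratio("P3", "P4", protein_domains, scores)
```

Because the scores are simple counts, a single pass over the training pairs suffices, which is consistent with the speed advantage over a linear programming formulation that the abstract reports.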
  • Yoshiyuki Hizukuri, Yoshihiro Yamanishi, Kosuke Hashimoto, Minoru Kane ...
    2004 Volume 15 Issue 1 Pages 69-81
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Glycans, which are carbohydrate sugar chains attached to some lipids or proteins, have a huge variety of structures and play a key role in cell communication, protein interaction and immunity. The availability of a number of glycan structures stored in the KEGG/GLYCAN database makes it possible for us to conduct large-scale comparative research on glycans. In this paper, we present a novel approach to compare glycan structures and extract characteristic glycan substructures of certain organisms. In the algorithm we developed a new similarity measure of glycan structures taking into account several biological aspects of glycan synthesis and glycosyltransferases, and we confirmed the validity of our similarity measure by conducting experiments on its ability to classify glycans between organisms in the framework of a support vector machine. Finally, our method successfully extracted a set of candidate substructures which are characteristic of human, rat, mouse, bovine, pig, chicken, yeast, wheat and sycamore, respectively. We confirmed that the characteristic substructures extracted by our method correspond to the substructures which are known as the species-specific sugar chains of γ-glutamyltranspeptidases in the kidney.
    Download PDF (1076K)
  • Daisuke Hoshiyama, Kei-ichi Kuma, Takashi Miyata
    2004 Volume 15 Issue 1 Pages 82-92
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    To reveal the relationship between organismal evolution and the molecular evolutionary rate, the temporal pattern of evolutionary rates was investigated for various genes during the course of deuterostome evolution. The deuterostome lineage leading to extant mammals was tentatively divided into two periods (the First and the Latter periods) by the time of divergence of bony fishes and mammals. For each of the First and the Latter periods, evolutionary rates of 207 gene sets were calculated. In the Latter period, the evolutionary rate was significantly reduced in such informational genes as transcription factors and cytoplasmic ribosomal RNAs and proteins. In contrast, a variety of enzymes and mitochondrial ribosomal proteins evolve at a nearly constant rate throughout the First and the Latter periods. The present result suggests that the increase of gene number by extensive gene duplications in the early evolution of vertebrates is responsible for the decrease of evolutionary rate.
    Download PDF (1332K)
  • Masumi Itoh, Tatsuya Akutsu, Minoru Kanehisa
    2004 Volume 15 Issue 1 Pages 93-104
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Homology data are among the most important information used to predict the functions of unknown proteins and thus fast and accurate methods are needed. In this paper, we propose a new approach for fast and accurate homology search using pre-computed all-against-all similarity scores in a target database. We previously developed a method for derivation of an upper bound of the Smith-Waterman score (SW-score) between a query and a homolog candidate sequence using the SW-score between the candidate and a sequence similar to the query. In this paper, by using this upper bound, we first cluster the sequences in the target database so that upper bounds of SW-scores for all the members in the clusters are less than a given value and select representative sequences for respective clusters. Then, the query sequence is searched against the representative sequences and the upper bounds of SW-scores for respective clusters are estimated. Only if the upper bound is higher than a given threshold, SW-alignments are computed for all the sequences in the cluster. We performed computational experiments to test efficiency of the proposed method for the KEGG/GENES database using the KEGG/SSDB. The results suggest that our method is efficient for redundant databases that include multiple closely related species.
    Download PDF (1286K)
  • Euna Jeong, I-Fang Chung, Satoru Miyano
    2004 Volume 15 Issue 1 Pages 105-116
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Identification of the most likely RNA-interacting residues in a protein is an important and challenging problem in the field of molecular recognition. Structural analysis of protein-RNA complexes reveals a strong correlation between interaction residues and their structure. Building on this viewpoint, we have developed a neural network predictor to identify residues involved in protein-RNA interactions from protein sequence and its structural information. The system has been exhaustively cross-validated with various strategies differing in input encoding, amount of input information, and network architectures. In addition, we have evaluated performance among functional subsets of complexes. Finally, to reflect the properties of protein-RNA complexes in our dataset, two kinds of post-processing method are adopted. The experimental result shows that our system yields nontrivial performance although the residues in interaction sites are very scarce.
    Download PDF (1338K)
  • Szymon M. Kielbasa, Nils Blüthgen, Christine Sers, Reinhold Sch&a ...
    2004 Volume 15 Issue 1 Pages 117-124
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We present a computational pipeline to predict cis-regulatory elements by combining results from different algorithms: Clover, Cluster-Buster, our own implementation of human/rat/mouse sequence identity and our ITB algorithm. The procedure uses information from the human genome sequence, NCBI gene annotations, verified eukaryotic promoters (EPD), experimentally proven binding sites (Transfac) and homologies to mouse and rat (HomGL/HomoloGene).
    We test the approach on 18 upstream regions of experimentally verified AP-1 target genes. About half of the known sites belong to high-scoring candidates. Three top-scoring elements are confirmed by Cluster-Buster and high homologies.
    We applied the same analysis to genes found to be up- or downregulated due to mutated RAS. We performed a detailed literature and computational search for promoter regions. Indications of overrepresented Elk-1 and AP-1 motifs are found via a comparison with shuffled sequences. In some promoters consistent predictions of clustered binding sites were obtained.
    Download PDF (914K)
  • Edda Klipp, Wolfram Liebermeister, Christoph Wierling
    2004 Volume 15 Issue 1 Pages 125-137
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Functional properties of biochemical networks depend on both the network structure and the kinetic parameters. Extensive data on metabolic network topologies have been collected in databases, but much less information is available about the kinetic constants or metabolite concentrations. Depending on the values of these parameters, metabolic fluxes and control coefficients may vary within a wide range. Nevertheless, some of the parameters may have little influence on the observables of interest. We address the question whether, despite uncertainty about kinetic parameters, probabilistic statements can be made about dynamic network features. To this end, we perform a variability analysis of the parameters: assuming that the parameters follow statistical distributions, we compute the resulting distributions of the network properties like metabolic fluxes, concentrations, or control coefficients by Monte Carlo simulation. In this manner, we study systematically the possible distributions arising from typical topologies of biochemical networks such as linear chains, branched networks, and signaling and gene expression cascades. This analysis reveals how much information about dynamic behavior can be drawn from structural knowledge.
    Download PDF (1391K)
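The Monte Carlo variability analysis outlined in the abstract above can be sketched in Python for the simplest case, a two-step linear chain X0 <-> S <-> X1 with reversible mass-action kinetics and fixed external concentrations. Setting dS/dt = 0 gives the steady-state flux J = (k1 k2 X0 - k1r k2r X1) / (k1r + k2). The log-normal parameter distribution, its width `sigma`, the concentrations and all function names are assumptions chosen for illustration, not values from the paper.

```python
import math
import random


def steady_state_flux(k1, k1r, k2, k2r, x0=1.0, x1=0.1):
    """Steady-state flux of X0 <-> S <-> X1 (mass action, X0 and X1 fixed)."""
    return (k1 * k2 * x0 - k1r * k2r * x1) / (k1r + k2)


def sample_fluxes(n, sigma=0.5, seed=0):
    """Draw n parameter sets and return the resulting steady-state fluxes."""
    rng = random.Random(seed)
    fluxes = []
    for _ in range(n):
        # Each of the four rate constants is log-normal with median 1 (assumed).
        k = [rng.lognormvariate(0.0, sigma) for _ in range(4)]
        fluxes.append(steady_state_flux(*k))
    return fluxes


fluxes = sample_fluxes(1000)
mean_flux = sum(fluxes) / len(fluxes)
std_flux = math.sqrt(sum((j - mean_flux) ** 2 for j in fluxes)
                     / (len(fluxes) - 1))
```

The spread of `fluxes` relative to the spread of the sampled rate constants gives a first impression of how strongly parameter uncertainty propagates to a network property for this topology.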
  • Roland Krüger, Reinhart Heinrich
    2004 Volume 15 Issue 1 Pages 138-148
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We present a framework for model reduction of signal transduction networks. The methods are explained by considering a recent model for Wnt/β-catenin signalling, which plays an important regulatory role in cell development and oncogenesis. The procedure results in a reduction of system variables and parameters while maintaining the ability of the model to describe experimental data and to predict the in-vivo behaviour of the pathway. Using metabolic control analysis we quantified the response of the pathway towards random fluctuations of model parameters. This allows us to characterise the robustness of the pathway against perturbations in stimulated and unstimulated states. We show that robustness depends on structural as well as kinetic properties of the pathway.
    Download PDF (1015K)
  • Joseph C. Mellor, Jie Wu, Charles De Lisi
    2004 Volume 15 Issue 1 Pages 149-159
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Problems of inference in systems biology are ideally reduced to formulations which can efficiently represent the features of interest. In the case of predicting gene regulation and pathway networks, an important feature which describes connected genes and proteins is the relationship between active and inactive forms, i.e. between the “on” and “off” states of the components. While not optimal at the limits of resolution, these logical relationships between discrete states can often yield good approximations of the behavior in larger complex systems, where exact representation of measurement relationships may be intractable. We explore techniques for extracting binary state variables from measurement of gene expression, and go on to describe robust measures for statistical significance and information that can be applied to many such types of data. We show how statistical strength and information are equivalent criteria in limiting cases, and demonstrate the application of these measures to simple systems of gene regulation.
    Download PDF (1137K)
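The idea of extracting binary state variables from expression measurements and quantifying the information shared between two genes, as described in the abstract above, can be sketched as follows. The median threshold used for binarization is an assumption made here; the abstract does not say which discretization the authors use. Function names and the toy expression values are hypothetical.

```python
import math
from collections import Counter


def binarize(values):
    """Map expression values to 0/1 using the median as threshold (assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return [1 if v > median else 0 for v in values]


def mutual_information(x, y):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


# Hypothetical expression values for two genes across six conditions.
gene_a = [0.1, 0.2, 2.5, 3.0, 0.3, 2.8]
gene_b = [1.0, 1.1, 5.0, 6.0, 0.9, 5.5]
state_a = binarize(gene_a)
state_b = binarize(gene_b)
shared_bits = mutual_information(state_a, state_b)
```

Two genes whose on and off states always coincide share one full bit in this balanced toy example, whereas statistically independent states share none.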
  • Julian Mintseris, Zhiping Weng
    2004 Volume 15 Issue 1 Pages 160-169
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The problem of describing a protein representation by breaking up the amino acids atoms into functionally similar atom groups has been addressed by many researchers in the past 25 years. They have used a variety of physical, chemical and biological criteria of varying degrees of rigor to essentially impose our understanding of protein structures onto various atom-typing schemes used in studies of protein folding, protein-protein and protein-ligand interactions, and others. Here, instead, we have chosen to rely primarily on the data and use information-theoretic techniques to dissect it. We show that we can obtain an optimized protein representation for a given alphabet size from protein monomers or protein interface datasets that are in agreement with general concepts of protein energetics. Closer inspection of the atom partitions led to interesting observations pointing to the greater importance of the hydrophobic interactions in protein monomers compared to interfaces and, conversely, greater importance of polar/charged interaction in protein interfaces. Comparing the atom partitions from the two datasets we show that the two are strikingly similar at alphabet size of five, proving that despite some differences, the general energetic concepts are very similar for folding and binding. Implications for further structural studies are discussed.
    Download PDF (1013K)
  • Hiroo Murakami, Nobuyoshi Sugaya, Makihiko Sato, Akira Imaizumi, Sachi ...
    2004 Volume 15 Issue 1 Pages 170-179
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Various types of periodic patterns in nucleotide sequences are known to be very abundant in genomic DNA sequences, and to play important biological roles in processes such as gene expression, genome structural stabilization, and recombination. We present a new method, named “STEPSTONE”, to find a specific periodic pattern of repeat sequence, the inter-spread repeat, in which tandem repeats of conserved and non-conserved regions appear periodically. In our method, the periods of short repeat sequences found in a target sequence are first stored in a hash table and then selected by applying an auto-correlation test from time series analysis. Among the statistically selected sequences, the inter-spread repeats are obtained by the usual alignment procedures in two steps. To test the performance of our method, we examined the inter-spread repeats in Mycobacterium tuberculosis and Zamia paucijuga genomic sequences. Our method exactly detected the repeats in the two sequences, and is therefore useful for systematically identifying inter-spread repeats in DNA sequences.
    Download PDF (1044K)
  • Masao Nagasaki, Atsushi Doi, Hiroshi Matsuno, Satoru Miyano
    2004 Volume 15 Issue 1 Pages 180-197
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Research on modeling and simulation of complex biological systems is becoming more important in Systems Biology. In this respect, we have developed the hybrid functional Petri net (HFPN), which was derived from existing Petri nets because of their intuitive graphical representation and their capabilities for mathematical analysis. However, in the process of modeling metabolic, gene regulatory or signal transduction pathways with this architecture, we have realized that three extensions of HFPN are necessary for modeling biological systems with a Petri net based architecture: (i) an entity should be extended to contain more than one value, (ii) an entity should be extended to handle other primitive types, e.g. boolean and string, and (iii) an entity should be extended to handle a more advanced type called object that consists of variables and methods. To deal with this, we define a new enhanced Petri net called hybrid functional Petri net with extension (HFPNe). To demonstrate the effectiveness of the enhancements, we model and simulate with HFPNe four biological processes that are difficult to represent with the previous architecture HFPN.
    Download PDF (2372K)
  • Henning Riedesel, Björn Kolbeck, Oliver Schmetzer, Ernst-Walter K ...
    2004 Volume 15 Issue 1 Pages 198-212
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We explore two different methods to predict the binding ability of nonapeptides at the class I major histocompatibility complex using a general linear scoring function that defines a separating hyperplane in the feature space of sequences. In the absence of suitable data on non-binding nonapeptides we generated sequences randomly from a selected set of proteins from the protein data bank. The parameters of the scoring function were determined by a generalized least-squares optimization (LSM) and alternatively by the support vector machine (SVM). With the generalized LSM, impaired data for learning with a small set of binding peptides and a large set of non-binding peptides can be treated in a balanced way, rendering LSM more successful than SVM, while for symmetric data sets SVM has a slight advantage compared to LSM.
    Download PDF (1822K)
  • Boris E. Shakhnovich, John Max Harvey, Charles De Lisi
    2004 Volume 15 Issue 1 Pages 213-220
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    ELISA (http://romi.bu.edu/elisa/) is a database that was designed for flexibility in defining interesting queries about protein domain evolution. We have defined and included both the inherent characteristics of the domains such as structure and function and comparisons of these characteristics between domains. Thus, the database is useful in defining structural and functional links between related protein domains and by extension sequences that encode them. In this database we introduce and employ a novel method of functional annotation and comparison. For each protein domain we create a probabilistic functional annotation tree using GO. We have designed an algorithm that accurately compares these trees and thus provides a measure of “functional distance” between two protein domains. Along with functional annotation, we have also included structural comparison between protein domains and best sequence comparisons to all known genomes. The latter enables researchers to dynamically do searches for domains sharing similar phylogenetic profiles. This combination of data and tools enables the researcher to design complex queries to carry out research in the areas of protein domain evolution, structure prediction and functional annotation of novel sequences.
    Download PDF (1256K)
  • Identification of Co-Expressed Genes through Module Gene Flow
    Boris E Shakhnovich, Timothy E Reddy, Kevin Galinsky, Joseph Mellor, C ...
    2004 Volume 15 Issue 1 Pages 221-228
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    A question of fundamental importance is the definition and identification of modules from microarray experiments. A wide variety of techniques have been used to gain insight into the elucidation of such modules. One problem, however, is the inability to directly compare results between the different data sets produced due to the inherent parameterizations of their approaches. We first aim to provide a mechanism by which different approaches to module finding can be directly compared. Moreover, the same approach can be used to internally compare the modules predicted by the same technique, but at different parameterizations. We apply this approach to analyze the flow of genes through modules at different module thresholds of the Barkai Signature method, thereby further resolving the modules into sets of co-expressed genes.
    Download PDF (859K)
  • Nobuyoshi Sugaya, Makihiko Sato, Hiroo Murakami, Akira Imaizumi, Sachi ...
    2004 Volume 15 Issue 1 Pages 229-238
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Three possible causes responsible for the large genome size of a cyanobacterium Anabaena sp. PCC7120 are investigated: 1) sequential tandem duplications of gene segments, genes or genomic segments, 2) horizontal gene transfers from other organisms, and 3) whole-genome duplication. We evaluated the frequency distribution of angles between paralog locations for the possibility 1), the fraction of genes deviated in GC content, GC skew, AT skew and codon adaptation index for the 2) and the gene-configuration comparison of paralogs for the 3). As a result, the possibility 3), the whole-genome duplication, was more reasonable as a molecular cause than the other causes for the large genome size in Anabaena sp. PCC7120. In addition, the whole-genome duplication was supported by the analysis of distribution pattern of protein genes with respect to functional categories.
    Download PDF (1000K)
  • Chang-Jiun Wu, Yutao Fu, T. M. Murali, Simon Kasif
    2004 Volume 15 Issue 1 Pages 239-248
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Recent advances in high throughput profiling of gene expression have catalyzed an explosive growth in functional genomics aimed at the elucidation of genes that are differentially expressed in various tissue or cell types across a range of experimental conditions. These studies can lead to the identification of diagnostic genes, classification of genes into functional categories, association of genes with regulatory pathways, and clustering of genes into modules that are potentially coregulated by a group of transcription factors. Traditional clustering methods such as hierarchical clustering or principal component analysis are difficult to deploy effectively for several of these tasks since genes rarely exhibit similar expression patterns across a wide range of conditions. Bi-clustering of gene expression data is a promising methodology for identification of gene groups that show a coherent expression profile across a subset of conditions. This methodology can be a first step towards the discovery of co-regulated and co-expressed genes or modules. Although bi-clustering (also called block clustering) was introduced in statistics in 1974, few robust and efficient solutions exist for extracting gene expression modules in microarray data. In this paper, we propose a simple but promising new approach for bi-clustering based on a Gibbs sampling paradigm. Our algorithm is implemented in the program GEMS (Gene Expression Module Sampler). GEMS has been tested on synthetic data generated to evaluate the effect of noise on the performance of the algorithm as well as on published leukemia datasets. In our preliminary studies comparing GEMS with other bi-clustering software we show that GEMS is a reliable, flexible and computationally efficient approach for bi-clustering gene expression data.
    Download PDF (1305K)
  • Takuji Yamada, Susumu Goto, Minoru Kanehisa
    2004 Volume 15 Issue 1 Pages 249-258
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In the post-genomic era, it is important to analyze interaction networks that include genes, proteins, enzymes and compounds, such as metabolic pathways. Every organism has its own such networks. However, several parts of them are conserved in different organisms. The purpose of this analysis is to extract sub-networks composed of these common elements through phylogenetic analysis. We extracted network modules from metabolic pathways using phylogenetic profiles and cluster analysis. The enzymes of these modules are related by evolutionary and functional correlation. Our results give valuable insight into the evolution of metabolic pathways.
    Download PDF (1084K)
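The combination of phylogenetic profiles and cluster analysis mentioned in the abstract above can be illustrated with a small Python sketch. A phylogenetic profile is represented here as the set of organisms in which an enzyme is present. The Jaccard similarity, the single-linkage grouping and the threshold are assumptions chosen for brevity; the abstract does not state which similarity measure or clustering procedure the authors used. Enzyme names, organism labels and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of organisms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_profile(profiles, threshold=0.75):
    """Single-linkage grouping: enzymes linked by similarity >= threshold."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if jaccard(profiles[u], profiles[v]) >= threshold:
                parent[find(u)] = find(v)

    modules = {}
    for name in names:
        modules.setdefault(find(name), []).append(name)
    return sorted(modules.values())


# Hypothetical presence/absence profiles (assumption, for illustration only).
profiles = {
    "E1": {"org1", "org2", "org3", "org4"},
    "E2": {"org1", "org2", "org3", "org4"},
    "E3": {"org1", "org2", "org3"},
    "E4": {"org4", "org5"},
}
modules = group_by_profile(profiles)
```

Enzymes that end up in the same group share a similar pattern of presence across organisms, which is the kind of evolutionary correlation the abstract associates with conserved sub-networks.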
  • Hiroshi Mamitsuka, Temple F. Smith
    2004 Volume 15 Issue 1 Pages v
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Download PDF (108K)