Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Volume 16, Issue 1
  • Suyog Rao, Alfredo Rodriguez, Gary Benson
    2005 Volume 16 Issue 1 Pages 3-12
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Tandem repeats are an important class of DNA repeats and much research has focused on their efficient identification [2, 4, 5, 11, 12], their use in DNA typing and fingerprinting [6, 16, 18], and their causative role in trinucleotide repeat diseases such as Huntington Disease, myotonic dystrophy, and Fragile-X mental retardation. We are interested in clustering tandem repeats into groups or families based on sequence similarity so that their biological importance may be further explored. To cluster tandem repeats we need a notion of pairwise distance, which we obtain by alignment. In this paper we evaluate five distance functions used to produce those alignments: Consensus, Euclidean, Jensen-Shannon Divergence, Entropy-Surface, and Entropy-weighted. It is important to analyze and compare these functions because the choice of distance metric forms the core of any clustering algorithm. We employ a novel method to compare alignments and thereby compare the distance functions themselves. We rank the distance functions based on two cluster validation techniques: Average Cluster Density and Average Silhouette Width. Finally, we propose a multi-phase clustering method which produces good-quality clusters. In this study, we analyze clusters of tandem repeats from five sequences: Human Chromosomes 3, 5, 10 and X and C. elegans Chromosome III.
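As a rough illustration of one of the five distance functions named in this abstract, the Jensen-Shannon divergence between two nucleotide distributions (for example, two alignment columns) can be sketched as follows. The abstract does not say how the divergence enters the alignment scoring, so the function name, the base-2 logarithm, and the list-based input format are assumptions made only for this example.

```python
import math

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions.

    p and q are equal-length sequences of probabilities, e.g. the
    frequencies of A, C, G, T in two alignment columns (assumed input format).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with disjoint support), which makes it convenient as a bounded, symmetric column distance.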
  • Hiroo Murakami, Sachiyo Aburatani, Katsuhisa Horimoto
    2005 Volume 16 Issue 1 Pages 13-21
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Various types of repeat sequences are abundant in genomic sequences, and they are associated with the biological phenomena at distinct levels. In particular, comparative analyses of whole-genome-sized sequence data have revealed that repeat sequences cause segmental duplications, which are a type of chromosomal structural arrangement. In this study, we analyzed the relationships between segmental duplications and repeat sequences in human chromosome 7. For this purpose, three methods for detecting repeat sequences were applied to the genomic sequences of human chromosome 7: RepeatMasker for the dispersed repeats, TRF for the tandem repeats, and STEPSTONE for the inter-spread repeats. By plotting the detected repeat sequences against the locations on the chromosome, all three types of repeats were found to be concentrated around the regions of segmental duplications, as a macroscopic feature of their distributions. Furthermore, the latter two repeat sequences were classified in terms of their periods, and the distribution bias of the detected repeat sequences was statistically tested between the segmental duplication regions and the other regions. As a result, the periods of two repeats were biased, with less than a 5% level of significance probability by the χ² test, and the repeats with long periods, about 130 bp and more than 400 bp, were attributed to a bias with a 5% level of significance probability by the normalized residual test. The mechanism of segmental duplications is discussed based on the present results.
  • Kazutaka Katoh, Kei-ichi Kuma, Takashi Miyata, Hiroyuki Toh
    2005 Volume 16 Issue 1 Pages 22-33
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In 2002, we developed and released a rapid multiple sequence alignment program, MAFFT, that was designed to handle huge (up to ~5,000 sequences) and long (~2,000 aa or ~5,000 nt) data in a reasonable time on a standard desktop PC. As for the accuracy, however, the previous versions (v.4 and lower) of MAFFT were outperformed by ProbCons and TCoffee v.2, both of which were released in 2004, in several benchmark tests. Here we report a recent extension of MAFFT that aims to improve the accuracy with as little cost of calculation time as possible. The extended version of MAFFT (v.5) has new iterative refinement options, G-INS-i and L-INS-i (collectively denoted as [GL]-INS-i in this report). These options use a new objective function combining the weighted sum-of-pairs (WSP) score and a score similar to COFFEE derived from all pairwise alignments. We discuss the improvement in accuracy brought by this extension, mainly using two benchmark tests released very recently, BAliBASE v.3 (for protein alignments) and BRAliBase (for RNA alignments). According to BAliBASE v.3, the overall average accuracy of L-INS-i was higher than those of other methods successively released in 2004, although the difference among the most accurate methods (ProbCons, TCoffee v.2 and new options of MAFFT) was small. The advantage in accuracy of [GL]-INS-i became greater for the alignments consisting of ~50-100 sequences. By utilizing this feature of MAFFT, we also examined another possible approach to improve the accuracy by incorporating homolog information collected from databases. The [GL]-INS-i options are applicable to aligning up to ~200 sequences, although not applicable to thousands of sequences because of time and space complexities.
  • Lidio Marx Carvalho Meireles, Tatsuya Akutsu
    2005 Volume 16 Issue 1 Pages 34-43
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    This paper introduces a method to detect tree patterns (tree motifs) in a database of rooted unordered labeled trees. The method can be viewed as an extension of the Gibbs sampling approach to detect sequence motifs. Basically, we enumerate tree topologies and, for each topology, search the database for tree motifs with the given topology. A tree motif can be detected by matching the tree topology against the database of trees and then applying Gibbs sampling on the matching set. After completion of the process for a given tree topology, the process is restarted for the next enumerated tree topology. The method outputs, for each topology, the best tree motif found. We applied our method to an artificially created database of trees as well as to a database of carbohydrate (glycan) structures.
  • Uwe Vinkemeier, Thomas Meyer
    2005 Volume 16 Issue 1 Pages 44-48
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Interferon stimulation of cells can activate several hundred target genes, many of which are required for antiviral protection. Promoter binding of tyrosine-phosphorylated (activated) Stat1 dimers is essential for gene induction, a process that often entails the oligomerization of Stat1 dimers via interactions of their amino-terminal domains. The mutation of a single residue (F77) in the N-domain of Stat1 was recently demonstrated to preclude both the dephosphorylation and the oligomerization of Stat1 dimers. Here, we investigated the influence of defective oligomerization on a complex phenotype such as the induction of an antiviral state. It was found that the antiviral protection conferred by interferon-α was strongly reduced, whereas the interferon-γ response was not measurably affected. These results indicate that Stat1 oligomerization is required for the antiviral activity of interferons. Moreover, the concentration of activated Stat1 in the nucleus may generally play a critical role for interferon-induced target gene activation.
  • Heather E. Burden, Zhiping Weng
    2005 Volume 16 Issue 1 Pages 49-58
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Many locations within transcription factor binding sites are not sequentially conserved and appear to be degenerate. We hypothesize that some of these positions contain essential structural codes that are recognized by the transcription factors that bind to them. The structural codes can be defined by base-pair step parameters that describe the relative displacement and orientation of two adjacent base pairs in a nucleic acid structure. We have developed a method, Identification of Conserved Structural Features (ICSF), which uses base-pair step parameters obtained from a collection of high-resolution DNA crystal structures to discover structural conservation that exists in the sequentially degenerate areas within a binding site and produce profiles of the structural features along the entire site. We have focused our study on the transcription factor binding sites in the JASPAR database and have found that one-third (P-value>0.05) of the binding sites contain sequentially degenerate locations with highly conserved structural features as described by the base-pair step parameters. These results will help us to gain a better understanding of the process by which transcription factors recognize their binding sites and possibly lead to an improvement in our ability to find these sites in genomic sequences.
    Availability: ICSF is freely available to academic users at http://zlab.bu.edu/ICSF
    Contact: zhiping@bu.edu
    Supplementary information: http://zlab.bu.edu/ICSF
  • Timothy E. Reddy, Charles De Lisi, Boris E. Shakhnovich
    2005 Volume 16 Issue 1 Pages 59-67
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Genome scale identification of transcription factor binding sites (TFBS) is fundamental to understanding the complexities of mRNA expression at both the cell and organismal levels. While high-throughput experimental methods provide associations between transcription factors and the genes they regulate under a specified experimental condition, computational methods are still required to pinpoint the exact location of binding. Moreover, since the binding site is an intrinsic property of the promoter region, computational methods are in principle more general than condition dependent experimental methods.
    Computational identification of TFBSs is complicated in at least two different ways. First, transcription factors bind a heterogeneous distribution of sites and therefore have a distribution of affinities. Second, the set of sequences for which a common site is to be determined do not all have a site for the TF of interest. In this paper, we evaluate the robustness of TFBS identification with respect to both effects. We show that the addition of upstream regions that do not have the TFBS destroys the specificity of the predicted binding site. We also propose a method to calculate the distance between position weight matrices that can be used to measure “drift” from the canonical binding site. The results presented here could be useful in developing future transcription factor binding site identification algorithms.
  • Yutao Fu, Zhiping Weng
    2005 Volume 16 Issue 1 Pages 68-72
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    This paper describes a novel approach to constructing Position-Specific Weight Matrices (PWMs) based on the transcription factor binding site (TFBS) data provided by the TRANSFAC database, and a comparison of the newly generated PWMs with the original TRANSFAC matrices. Multiple local sequence alignment was performed on the TFBSs of each transcription factor. Several different alignment programs were tested and their matrices were compared to the original TRANSFAC matrices. One of the alignment programs, GLAM, produced comparable matrices in terms of the average ranking of true positive sites across the whole test set of sequences.
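To make the idea of deriving a weight matrix from aligned binding sites concrete, here is a minimal sketch. It assumes the sites have already been aligned to equal length (the alignment step performed by programs such as GLAM is not reproduced), and the pseudocount, the uniform background of 0.25, and the function names `build_pwm` and `score_site` are illustrative assumptions rather than details taken from the paper.

```python
import math

def build_pwm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from equal-length aligned DNA sites.

    Returns one {base: score} dict per alignment column.
    The pseudocount and uniform background are assumptions of this sketch.
    """
    if not aligned_sites:
        raise ValueError("need at least one site")
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    n = len(aligned_sites)
    pwm = []
    for col in range(width):
        column = [site[col].upper() for site in aligned_sites]
        scores = {}
        for base in "ACGT":
            # Smoothed frequency of this base in the column
            freq = (column.count(base) + pseudocount) / (n + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm

def score_site(pwm, site):
    """Sum the per-column scores of a candidate site of the same width."""
    return sum(col[base] for col, base in zip(pwm, site.upper()))
```

Scoring every window of a promoter with `score_site` and ranking the windows is one simple way to compare how highly different matrices rank known true sites.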
  • Thomas Höfer, Malte J. Rasch
    2005 Volume 16 Issue 1 Pages 73-82
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We analyse a stochastic model of transcription that describes transcription initiation by promoter activation and subsequent polymerase recruitment. Explicit expressions are derived for the control of an activator on the mean mRNA number and for the mRNA noise. Both properties are strongly influenced by the kinetics of promoter activation, mRNA synthesis and degradation. Low transcriptional noise is obtained either when the transcription initiation complex has a long life-time or when its components associate and dissociate rapidly. However, the ability of an activator to regulate the mRNA level is low in the first and high in the second case. Large noise is generated when the initial activation step of the promoter is slow. In this case, transcription can be burst-like; the mRNA distribution becomes bimodal while regulability of the mean copy number is maintained.
  • Dustin T. Holloway, Mark Kon, Charles De Lisi
    2005 Volume 16 Issue 1 Pages 83-94
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Transcription factor binding sites (TFBS) in gene promoter regions are often predicted by using position specific scoring matrices (PSSMs), which summarize sequence patterns of experimentally determined TF binding sites. Although PSSMs are more reliable than simple consensus string matching in predicting a true binding site, they generally result in high numbers of false positive hits. This study attempts to reduce the number of false positive matches and generate new predictions by integrating various types of genomic data by two methods: a Bayesian allocation procedure, and support vector machine classification.
    Several methods will be explored to strengthen the prediction of a true TFBS in the Saccharomyces cerevisiae genome: binding site degeneracy, binding site conservation, phylogenetic profiling, TF binding site clustering, gene expression profiles, GO functional annotation, and k-mer counts in promoter regions. Binding site degeneracy (or redundancy) refers to the number of times a particular transcription factor's binding motif is discovered in the upstream region of a gene. Phylogenetic conservation takes into account the number of orthologous upstream regions in other genomes that contain a particular binding site. Phylogenetic profiling refers to the presence or absence of a gene across a large set of genomes. Binding site clusters are statistically significant clusters of TF binding sites detected by the algorithm ClusterBuster. Gene expression takes into account the idea that when the gene expression profiles of a transcription factor and a potential target gene are correlated, then it is more likely that the gene is a genuine target. Also, genes with highly correlated expression profiles are often regulated by the same TF(s). The GO annotation data take advantage of the idea that common transcription targets often have related functions. Finally, the distribution of the counts of all k-mers of length 4, 5, and 6 in a gene's promoter region was examined as a means to predict TF binding. In each case the data are compared to known true positives taken from ChIP-chip data [11, 14], Transfac, and the Saccharomyces Genome Database.
    First, degeneracy, conservation, expression, and binding site clusters were examined independently and in combination via Bayesian allocation. Then, binding sites were predicted with a support vector machine (SVM) using all methods alone and in combination. The SVM works best when all genomic data are combined, but can also identify which methods contribute the most to accurate classification. On average, a support vector machine can classify binding sites with high sensitivity and an accuracy of almost 80%.
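One of the feature types listed in this abstract, the counts of all k-mers of length 4, 5, and 6 in a promoter region, is simple enough to sketch directly. The function name `kmer_counts`, the choice to skip windows containing non-ACGT characters, and the use of a single strand are assumptions of this illustration; the abstract does not describe how the authors handled these details or how the counts were encoded for the classifier.

```python
from collections import Counter

def kmer_counts(sequence, ks=(4, 5, 6)):
    """Count overlapping k-mers of the given lengths in a DNA sequence.

    Windows containing characters other than A, C, G, T (e.g. N) are
    skipped; this handling is an assumption of the sketch.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for k in ks:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts
```

For example, `kmer_counts("ACGTACGT", ks=(4,))` counts `ACGT` twice and each of `CGTA`, `GTAC`, and `TACG` once. Turning such counts into fixed-length vectors, one entry per possible k-mer, would give inputs of the kind a support vector machine expects.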
  • Sachiyo Aburatani, Katsuhisa Horimoto
    2005 Volume 16 Issue 1 Pages 95-105
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Monitoring the expression of many genes under different conditions is a common approach for investigating gene relationships. In particular, the monitoring sheds light on the biological phenomena in which many genes are coordinately expressed. In this study, we analyzed the expression profiles of LexA-regulated genes after UV irradiation, to elucidate the genes related to the SOS response, which involves coordinately regulated gene expression. By the two-gene relationship analysis, the LexA-regulated genes were highly correlated with the genes involved in the DNA repair functions. The LexA-regulated genes with highly significant probability were divided into two groups: the LexA-regulated genes that were mutually related within them were related to the genes with DNA repair functions, while the LexA-regulated genes that were less related within them showed lower relation to the genes with DNA repair functions. By a multiple gene relationship analysis, the two types of LexA-regulated genes were clearly clustered, and the inferred network between the clusters indicated their sequential relationship of clusters in the two groups of LexA-regulated genes in the SOS response; the former type of genes emerged in the early stage of the SOS response upon the signal transduction by membrane proteins, cessation of cell division and recognition of DNA damage, and the latter type emerged in a later stage, and functioned in the repair mechanism and the resumption of DNA replication.
  • Nils Blüthgen, Karsten Brand, Branka Cajavec, Maciej Swat, Hanspe ...
    2005 Volume 16 Issue 1 Pages 106-115
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Increasingly used high-throughput experimental techniques, like DNA or protein microarrays, give as a result groups of interesting (e.g., differentially regulated) genes, which require further biological interpretation. With the systematic functional annotation provided by the Gene Ontology, the information required to automate the interpretation task is now accessible. However, the determination of statistical significance of a biological process within these groups is still an open question. In answering this question, multiple testing issues must be taken into account to avoid misleading results. Here we present a statistical framework that tests whether functions, processes or locations described in the Gene Ontology are significantly enriched within a group of interesting genes when compared to a reference group. First we define an exact analytical expression for the expected number of false positives that allows us to calculate adjusted p-values to control the false discovery rate. Next, we demonstrate and discuss the capabilities of our approach using publicly available microarray data on cell-cycle regulated genes. Further, we analyze the robustness of our framework with respect to the exact gene group composition and compare the performance with earlier approaches. The software package GOSSIP implements our method and is made freely available at http://gossip.gene-groups.net/.
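The enrichment question posed in this abstract can be illustrated with a one-sided hypergeometric tail probability for a single term, which is a common choice for this kind of test. This is only a sketch of the per-term test under that assumption: the abstract does not state which exact test GOSSIP uses, and the analytical false discovery rate adjustment it describes is not reproduced here. The function and parameter names are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, group_size, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population : number of genes in the reference group plus test group
    annotated  : number of those genes carrying the term of interest
    group_size : number of genes in the group of interesting genes
    hits       : number of interesting genes carrying the term
    """
    upper = min(annotated, group_size)
    total = comb(population, group_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, group_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total
```

Because one such p-value is computed for every term in the ontology, the raw values need a multiple-testing correction before interpretation, which is the issue the abstract emphasizes.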
  • Bacillus subtilis and Escherichia coli
    Shujiro Okuda, Shuichi Kawashima, Susumu Goto, Minoru Kanehisa
    2005 Volume 16 Issue 1 Pages 116-124
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We measured conservation of gene co-regulation between two distantly related prokaryotes, B. subtilis and E. coli. The co-regulation between genes was extracted from knowledge of regulation of genes stored in databases. For B. subtilis operons, we obtained the data set from ODB, which we have developed, and, for the regulons, we used DBTBS. For the E. coli data set, we used known regulons derived from RegulonDB. We obtained a reliable data set of co-regulated genes in B. subtilis and E. coli. About 60-80% of gene pairs conserved co-regulation relationships, so co-regulation between genes is highly conserved even between distantly related species. To measure the functional relationship between these conserved genes, we used KEGG PATHWAY and COG. When two co-regulated genes are in the same biological pathway in KEGG or share the same functional category in COG, we assume that they have the same function. As a result, we also found that many conserved co-regulated gene pairs share the same functions. These observations would help to predict gene co-regulation and protein functions.
  • Jerzy Dyczkowski, Martin Vingron
    2005 Volume 16 Issue 1 Pages 125-131
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We compared microarray experiments on the cell cycle of three model eukaryotes: budding and fission yeast and human cells. Only 112 orthologous groups were cyclic in the three model organisms. The common set of cyclic orthologs includes many taking part in the cell cycle progression, like cyclin B homologs, CDC5, SCH9, DSK2, ZPR1. Proteins involved in DNA replication included histones, some checkpoint kinases and some proteins regulating DNA damage and repair. Conserved cyclic proteins involved in cytokinesis included myosins and kinesins. Many groups of genes related to translation and other metabolic processes were also cyclic in all three organisms. This reflects rebuilding of cellular components after the replication and changes of metabolism during the cell cycle. Many genes important in cell cycle control are not cyclic or not conserved. This includes transcription factors implicated in the regulation of the budding yeast cell cycle. The partially overlapping roles of regulatory proteins might allow the evolutionary substitution of components of the cell cycle.
  • Harry Amri Moesa, Bahadur K.C. Dukka, Tatsuya Akutsu
    2005 Volume 16 Issue 1 Pages 132-141
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The existing methods for clustering of gene expression profile data either require manual inspection and other biological knowledge or require some cut-off value which cannot be directly calculated from the given data set. Thus, the problem of systematic and efficient determination of cluster boundaries in gene expression profile data still remains demanding.
    In this context, we have developed a procedure for automatic and systematic determination of the boundaries of clusters in the hierarchical clustering of gene expression data based on the ratio of within-class variance and between-class variance, which can be fully calculated from the given expression data. After the determination of the dendrogram based on agglomerative hierarchical clustering, this ratio is used to determine the cluster boundary. Apart from this ratio, which can be completely calculated from the given expression profile data, our approach, unlike other existing approaches, does not require any manual inspection or biological knowledge. Our results compare favorably with, and in some cases are better than, those of existing methods that do not utilize prior information or manual inspection. Moreover, gene expression profile data are often contaminated with various types of noise, and in order to reduce this noise content, we have also applied an image-enhancing technique called the discrete wavelet transform. We tested a number of mother wavelet functions to smooth the noise in the gene expression data set and obtained some improvements in the quality of the results.
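The quantity at the center of this abstract, a ratio of within-class to between-class variance computed from the expression data alone, can be sketched for one candidate partition as follows. The exact definition used by the authors, and the rule by which the ratio selects a cut in the dendrogram, are not given in the abstract; the sum-of-squares form below and the function names are assumptions chosen for illustration.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def variance_ratio(clusters):
    """Within-class over between-class sum of squares for a partition.

    clusters: list of clusters, each a list of expression vectors.
    Smaller values indicate tighter, better separated clusters.
    """
    all_points = [p for cluster in clusters for p in cluster]
    grand = centroid(all_points)
    within = 0.0
    between = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        within += sum(squared_distance(p, mu) for p in cluster)
        between += len(cluster) * squared_distance(mu, grand)
    if between == 0:
        raise ValueError("between-class variance is zero")
    return within / between
```

Evaluating this ratio for the partitions obtained by cutting a dendrogram at successive levels gives a data-derived score for each candidate cut, which is the kind of criterion the abstract describes.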
  • Jie Wu, Joseph C. Mellor, Charles De Lisi
    2005 Volume 16 Issue 1 Pages 142-149
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Phylogenetic profiling is now an effective computational method to detect functional associations between proteins. The method links two proteins in accordance with the similarity of their phyletic distributions across a set of genomes. While pair-wise linkage is useful, it misses correlations in higher order groups: triplets, quadruplets, and so on. Here we assess the probability of observing co-occurrence patterns of three binary profiles by chance and show that this probability is asymptotically the same as the mutual information in three profiles. We demonstrate the utility of the probability and the mutual information metrics in detecting overly represented triplets of orthologous proteins which could not be detected using pairwise profiles. These triplets serve as small building blocks, i.e., motifs in protein networks; they allow us to infer the function of uncharacterized members, and facilitate analysis of the local structure and global organization of the protein network. Our method is extendable to N-component clusters, and therefore serves as a general tool for high order protein function annotation.
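A three-profile dependence measure of the kind discussed in this abstract can be sketched with empirical entropies. Several inequivalent generalizations of mutual information to three variables exist, and the abstract does not say which one the authors use; the sketch below assumes the total correlation, H(X) + H(Y) + H(Z) - H(X, Y, Z), purely for illustration. Each profile is assumed to be a list of 0/1 values, one per genome.

```python
import math
from collections import Counter

def entropy(symbols):
    """Empirical Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def total_correlation(x, y, z):
    """Total correlation of three equal-length profiles, in bits.

    Zero when the profiles are empirically independent; larger values
    indicate stronger joint dependence. Which three-way measure the
    paper uses is not stated in the abstract; this choice is an assumption.
    """
    if not (len(x) == len(y) == len(z)):
        raise ValueError("profiles must have the same length")
    joint = list(zip(x, y, z))
    return entropy(x) + entropy(y) + entropy(z) - entropy(joint)
```

Three identical balanced profiles give 2 bits, while three profiles that jointly cover all eight presence/absence patterns equally often give 0 bits.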
  • Takuji Yamada, Shuichi Kawashima, Hiroshi Mamitsuka, Susumu Goto, Mino ...
    2005 Volume 16 Issue 1 Pages 150-158
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The lethality of a gene is a fundamental and representative measure for understanding the function of a gene and its associated biosystems. Recently, many research groups have started focusing on the concept of synthetic lethality. The synthetic lethality between genes is defined by the combination of mutations in two genes causing cell death. Here, we confirm that synthetic lethality and cellular location have close relationships among the Saccharomyces cerevisiae genes. Furthermore, we attempt the prediction of candidate gene pairs with synthetic lethality. The prediction is based on the hierarchical aspect model (HAM) which learns from a data set of cellular location to estimate a likelihood value indicating the synthetic lethality between genes.
  • Thomas Manke, Lloyd Demetrius, Martin Vingron
    2005 Volume 16 Issue 1 Pages 159-163
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We characterize protein interaction networks in terms of network entropy. This approach suggests a ranking principle, which strongly correlates with elements of functional importance, such as lethal proteins. Our combined analysis of protein interaction networks and functional profiles in single-celled yeast and multicellular worm shows that proteins with large contribution to network entropy are preferentially lethal. While entropy is inherently a dynamical concept, the present analysis incorporates only structural information. Our result therefore highlights the importance of topological features, which appear as correlates of an underlying dynamical property, and which in turn determine functional traits. We argue that network entropy is a natural extension of previously studied observables, such as pathway multiplicity and centrality. It is also applicable to networks in which the processes can be quantified and therefore serves as a link to study questions of structural and dynamical robustness in a unified way.
  • Bernd Binder, Reinhart Heinrich
    2005 Volume 16 Issue 1 Pages 164-173
    Published: 2005
    Released on J-STAGE: November 16, 2011
    JOURNAL FREE ACCESS
    We analyze the structural design and the dynamical properties of a protein kinase network derived from the Transpath database [14]. We consider structural properties, such as feedback cycles, pathway lengths, fraction of shortest pathways and crosstalk. Dynamic characteristics of the network are analyzed by using nonlinear differential equations with a special focus on kinase amplitudes and signal propagation times. Comparison with random networks shows that the cellular kinase network exhibits special features which might be a result of natural selection. In particular, the Transpath network contains no cycles, and input kinases and output kinases are generally connected by shortest signalling routes. Moreover, it displays a characteristic spectrum of cross-talk between different pathways.
  • Masashi Fujita, Minoru Kanehisa
    2005 Volume 16 Issue 1 Pages 174-181
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Thermophilic bacteria are one of the most attractive forms of life, and their adaptation mechanisms to elevated temperatures have been extensively studied over the years. Thermal adaptations of cell components such as proteins and RNA are well studied, but adaptations of interactions between these components must also be vital for the thermophiles. Protein-DNA interactions play crucial roles in the cell, but little is known about their thermal adaptations. In this study, we analyzed DNA-binding proteins from thermophilic bacteria. Comparison of amino acid compositions of the DNA-binding interfaces between thermophiles and their mesophilic close relatives revealed several commonalities between phylogenetically unrelated organisms. Advantages and limitations of our methods will also be discussed.
  • Yoshinori Tamada, Seiya Imoto, Kousuke Tashiro, Satoru Kuhara, Satoru ...
    2005 Volume 16 Issue 1 Pages 182-191
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We present a computational method for identifying genes and their regulatory pathways influenced by a drug, using microarray gene expression data collected by single gene disruptions and drug responses. The automatic identification of such genes and pathways in organisms' cells is an important problem for pharmacogenomics and tailor-made medication. Our method estimates regulatory relationships between genes as a gene network from microarray data of gene disruptions with a Bayesian network model, then identifies the drug-affected genes and their regulatory pathways on the estimated network with time-course drug response microarray data. Compared to the existing method, our proposed method can identify not only the drug-affected genes and the druggable genes, but also the drug responses of the pathways. For evaluating the proposed method, we conducted simulated examples based on artificial networks and expression data. Our method succeeded in identifying the pseudo drug-affected genes and pathways with high coverage, greater than 80%. We also applied our method to Saccharomyces cerevisiae drug response microarray data. In this real example, we identified the genes and the pathways that are potentially influenced by a drug. These computational experiments indicate that our method successfully identifies the drug-activated genes and pathways, and is capable of predicting undesirable side effects of the drug, identifying novel drug target genes, and understanding the unknown mechanisms of the drug.
  • Hironori Kitakaze, Hiroshi Matsuno, Nobuhiko Ikeda, Satoru Miyano
    2005 Volume 16 Issue 1 Pages 192-202
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Living organisms have ingenious control mechanisms in which many molecular interactions work to keep their normal activities against disturbances inside and outside of them. However, at the same time, the control mechanism has debacle points at which the stability can be broken easily. This paper proposes a new method which uses a recurrent neural network for predicting debacle points in a hybrid functional Petri net model of a biological pathway. Evaluation on an apoptosis signaling pathway indicates that 96.5% of debacle points and 65.5% of non-debacle points can be predicted by the proposed method.
  • Oliver Ebenhöh, Thomas Handorf, Reinhart Heinrich
    2005 Volume 16 Issue 1 Pages 203-213
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We compare a large number of organisms with respect to their metabolic network functions. We measure such functions in terms of the synthesizing capacity of a network when it is provided with a few small chemical substances as external resources. We call this measure the scope and show that it is generally robust against structural alterations of the reaction network. Organisms can be separated into two groups, one with a small and one with a large scope. Networks with a high synthesizing capacity also show a high degree of robustness against structural changes, indicating that this network function has been a target in the evolutionary past of the corresponding organisms. A comparison between structural and functional similarities reveals that organisms with a similar structure do not necessarily show similar biological functions. The presented concepts allow for systematic investigation of structure-function relationships of metabolic networks and may put forth valuable hints on the evolution of metabolic pathways.
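The synthesizing capacity described in this abstract can be illustrated with a simple iterative expansion: starting from the externally supplied compounds, repeatedly fire every reaction whose substrates are all available and add its products, until nothing new can be made. This is a minimal sketch under assumptions not stated in the abstract: reactions are treated as irreversible (substrates, products) pairs, cofactors get no special handling, and the function name `scope` and the toy compounds are hypothetical.

```python
def scope(seed, reactions):
    """Compounds producible from the seed set by iterative expansion.

    seed      : iterable of initially available compound names
    reactions : iterable of (substrates, products) pairs, each an
                iterable of compound names; treated as irreversible here
    Returns the set of all reachable compounds, seed included.
    """
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can fire once all of its substrates are available.
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available
```

For a toy network with reactions A + B -> C, C -> D, and E -> F, the seed {A, B} yields {A, B, C, D}; F is never produced because E is not available. The size of the returned set is one way to quantify the capacity of a network, and rerunning the function after deleting reactions gives a simple robustness probe.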
    Download PDF (1304K)
  • A Tool for the Analysis of Glycan
    Kosuke Hashimoto, Shin Kawano, Susumu Goto, Kiyoko F. Aoki-Kinoshita, ...
    2005 Volume 16 Issue 1 Pages 214-222
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Glycan resources have been developed of late, such as carbohydrate databases, analysis tools, and algorithms for analysis of carbohydrate features. With this background, bioinformatics approaches to carbohydrate research have recently begun using a large amount of protein and carbohydrate data. This paper introduces one of these projects that elucidates the range of carbohydrate structures.
    In this study, the variety of carbohydrate structures has been enumerated in a global tree structure called variation trees, using the KEGG GLYCAN database, which is a public-domain glycan resource for bioinformatics analysis. Additionally, a glycosyltransferase mapping list of glycosyltransferases and their catalyzing glycosidic linkages was constructed. From this, we present the composite structure map (CSM), which is a structural variation map integrating its variation trees and glycosyltransferase map list. CSM is able to display, for example, expression data of glycosyltransferases in a compact manner, illustrating its versatility as a new bioinformatics resource and tool capable of analyzing carbohydrate structures on a global scale. These resources are available at http://www.genome.jp/kegg/glycan/.
    Download PDF (880K)
  • Christoph Gille, Sabrina Hoffmann, Hermann-Georg Holzhütter
    2005 Volume 16 Issue 1 Pages 223-232
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The architecture of the cellular metabolic network is almost completely available from several databases. This has paved the way for computational analyses. Whereas kinetic modelling is still restricted to small metabolic sub-systems for which enzyme-kinetic details are known, so-called structural modelling techniques can be applied to complete metabolic networks even if the kinetics and regulation of the underlying enzymes are still unknown. Structural modelling requires detailed information on the presence of metabolic enzymes in a specific cell type of interest and the thermodynamics of the reactions, determining their direction under cellular conditions. If compartments are distinguished, the sub-cellular compartmentation of reactions and enzymes and the membrane transporters exchanging metabolites between cellular compartments must be included. All this information cannot be taken from a single database but has to be compiled from various Bioinformatics resources. Here we present an approach towards the organization of Bioinformatics data that enables the flux-balance analysis of comprehensive compartmentalized metabolic networks of eukaryotic cells with special focus on human hepatocytes.
    Download PDF (8738K)
  • Gyan Bhanot, Gabriela Alexe, Arnold J. Levine, Gustavo Stolovitzky
    2005 Volume 16 Issue 1 Pages 233-244
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    A major challenge in cancer diagnosis from microarray data is the need for robust, accurate classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose such a classification scheme originally developed for phenotype identification from mass spectrometry data. The method uses a robust multivariate gene selection procedure and combines the results of several machine learning tools trained on raw and pattern data to produce an accurate meta-classifier. We illustrate and validate our method by applying it to gene expression datasets: the oligonucleotide HuGeneFL microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (DallaFavera's laboratory, Columbia University). Our pattern-based meta-classification technique achieves higher predictive accuracies than each of the individual classifiers, is robust against data perturbations and provides subsets of related predictive genes. Our techniques predict that combinations of some genes in the p53 pathway are highly predictive of phenotype. In particular, we find that in 80% of DLBCL cases the mRNA level of at least one of the three genes p53, PLK1 and CDK2 is elevated, while in 80% of FL cases, the mRNA level of at most one of them is elevated.
    Download PDF (1396K)
  • Gul S. Daiglu, Charles De Lisi
    2005 Volume 16 Issue 1 Pages 245-253
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    High-throughput gene expression profiling can identify sets of genes that are differentially expressed between different phenotypes. Discovering marker genes is particularly important in diagnosis of a cancer phenotype. However, gene sets produced to date are too large to be economically viable diagnostics. We use a hybrid decision tree-discriminant analysis to identify small sets of genes, i.e., single genes and gene pairs, which separate normal samples from different stages of tumor samples. Half the samples are selected for training to form the probability distribution of expression values of each gene. The distributions for the tumor and normal phenotypes are then used to classify the test samples. The algorithm also identifies gene pairs by combining the probability distributions to construct a decision tree which is used to determine the class of test samples. After a series of training and testing sessions, genes and gene pairs that classify all samples correctly are recorded. The method was applied to a breast cancer data set, and classifier genes that distinguish normal breast from different stages of breast tumor were identified. The genes were ranked according to the minimum Euclidean distance between their expression values in tumor and normal samples. The algorithm was able to pick known cancer-related genes and also to find genes that were not identified as differentially expressed by a t-test with a 2-fold cut-off. Overall, the method generates possible diagnostic genes and gene pairs for a specific disease phenotype to pursue further biological interpretations in cancer biology.
    Download PDF (900K)
  • Yaoyu E Wang, Chao Zhang, Jay Berzofsky, Charles De Lisi
    2005 Volume 16 Issue 1 Pages 254-261
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The common consideration in approaching protection against HIV, whether by a vaccine or therapeutic, is identification of suitable targets. Among the central criteria for suitability is target stability, i.e., resistance to mutation. In this paper we address the problem of stability, and develop methods for identifying stable targets. The targets that we focus on are structures formed by viral peptides and products of the class I major histocompatibility complex, the target of the immune system. The method mines the large databases of fully sequenced HIV genomes and MHC binding peptides, and takes account of human polymorphism to construct hundreds of subpopulation-specific stable targets, each consisting of combinations of 3-5 complexes.
    Download PDF (2384K)
  • Insights through Modelling
    Branka Cajavec, Samuel Bernard
    2005 Volume 16 Issue 1 Pages 262-271
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Huntington's Disease (HD) is a late-onset, progressively degenerative brain disorder characterized by cell loss in the striatum and cortex. HD is caused by a polyglutamine (polyQ) expansion in the protein huntingtin (Htt). The mutant Htt is a substrate for caspases -2, -3 and -6. The cleavage of mutant Htt by caspase-2 has been suggested to underlie the selective neuronal death in HD. Once the mutant Htt is cleaved, a sticky and toxic fragment with the potential to form aggregates is released. The role of aggregation in the progression of HD has been extensively studied, yielding a plethora of ambivalent results. It has been shown that the diffuse, monomeric and oligomeric forms of the mutant Htt fragment, rather than the aggregates, are the major source of toxicity to the cells. We present here a mathematical model for aggregation in HD and discuss how it can relate to the selective neuronal death and the dependence of the disease onset on polyQ length. We describe the dynamical behavior of caspase-2, the release of monomeric forms of the mutant Htt fragment and the aggregation of these fragments through intermediate steps. Our model predicts that the concentration of toxic, intermediate oligomeric structures does not increase with increased caspase activity. We therefore suggest that the intermediate oligomeric forms of toxic Htt fragment do not account for selective and polyQ-dependent neuronal death.
    Download PDF (2450K)
  • Wataru Honda, Shuichi Kawashima, Minoru Kanehisa
    2005 Volume 16 Issue 1 Pages 272-280
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The immune system plays an essential role in the defense of the host against invaders. Enormous numbers of lymphocytes are recruited and immense numbers of antibodies or cytokines are secreted in various kinds of immune response. But the system can also be the cause of tissue injury or of certain diseases. For example, when its functions target the host, an autoimmune disease occurs. Although the pathogenesis of various autoimmune diseases has been scrutinized intensively, there is little evidence as yet. It has been reported, however, that in most of the disease subjects a broad spectrum of antibodies is observed that recognize components of self tissue or circulating self antigens that normally should be ignored. In this study, we come to the conclusion that proteins targeted by these autoreactive antibodies share the same peptides with some proteins of viruses known to infect humans. This result supports the hypothesis that viral infection is a possible cause of the disease in some subjects.
    Download PDF (868K)
  • Reinhart Heinrich, Charles De Lisi, Minoru Kanehisa
    2005 Volume 16 Issue 1 Pages v
    Published: 2005
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Download PDF (117K)