Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Volume 15, Issue 2
Displaying 1-33 of 33 articles from this issue
  • Yoshiyuki Kido, Susumu Date, Shingo Takeda, Shoji Hatano, Juncai Ma, S ...
    2004 Volume 15 Issue 2 Pages 3-12
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Recent advances in information technologies have brought about borderlessness in every field of both science and business. This borderlessness has made activities in interdisciplinary fields increasingly important. The current situation produces a strong demand to establish virtual groups, organizations and societies for business and scientific purposes, irrespective of the actual structure formed by organizations. Notably, the biosciences require a research platform that satisfies this demand for further development. In this paper, we present a research platform for bioinformatics in detail. The prominent feature of the research platform is the use of the Grid and its location transparency, which means that bioscientists and researchers are able to utilize a large amount of computational power for their analysis and to access data of interest without being aware of where the data and computational resources are located. The usefulness and feasibility of the architecture of the research platform are shown, as well as future issues to address toward the final goal of our research.
  • Åke Västermark, Yasumasa Shigemoto, Takashi Abe, Hideaki Su ...
    2004 Volume 15 Issue 2 Pages 13-20
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In one scenario of gene evolution, exon shuffling plays a fundamental role in increasing gene diversity. This paper is an appraisal of the biological relevance of categorising proteins by their splicing profiles (exon-intron structures). The central question is whether protein function is more correlated with splicing profiles than sequence similarity, or not. To approach this question, a splicing profile similarity (SPS) index, which measures relative exon length discrepancy, was devised. Arbitrary human proteins were compared, in terms of SPS and amino acid sequence similarity, to their 1) mouse orthologues and 2) human paralogues, which epitomise functional equivalence and non-equivalence, respectively, to methodically elucidate the global relationship between a) biological function, b) splicing profile similarity, and c) sequence similarity. Protein function is more correlated with splicing profile similarity than sequence similarity as demonstrated by the fact that human-mouse orthologues (HMOs) display significantly higher splicing profile similarity than do human-human paralogues (HHPs), despite the mutual sequence similarity between these two categories. This finding indicates that splicing profile-based protein categorisation is biologically meaningful.
  • Nikola Stojanovic
    2004 Volume 15 Issue 2 Pages 21-30
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Multiple sequence alignments are a powerful tool for identifying the regions of DNA which have been constrained in evolutionary divergence, presumably due to their functional role. However, such constraints rarely manifest themselves as perfect conservation of a site clearly standing out in its broader environment, as they reflect the species-specific differences in proteins, as well as the ability of some proteins to interact with multiple variants of their binding sequence. In this paper we explore the use of alignment column uncertainty as an aid in locating differential phylogenetic footprints, which refer to the sites in DNA where groups of related species exhibit sequence conservation, but where the pattern may vary between the groups. We use efficient, linear-time algorithms to locate such sites. We have performed a study of the mammalian CAV2-CAV1 gene region using our software, and we conclude with several observations concerning the differential conservation and the use of computational methods for its detection. The software developed for this project is available, free of charge, by contacting the author.
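As a rough illustration of the idea of column uncertainty, the sketch below scores an alignment column by Shannon entropy and compares each species group with the pooled column. Shannon entropy is an assumption made here for illustration; the abstract does not say which uncertainty measure the author's software uses. The function name `column_entropy` and the toy column are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    if not column:
        raise ValueError("column must not be empty")
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Toy column: three species in group 1 carry 'A', three in group 2 carry 'C'.
group1 = "AAA"
group2 = "CCC"
pooled = group1 + group2

# Each group is perfectly conserved (entropy 0), yet the pooled column is not
# (entropy 1 bit). That contrast is the kind of signal a differential
# footprint search would look for.
print(column_entropy(group1), column_entropy(group2), column_entropy(pooled))
```

A site where every group has low entropy but the pooled column has high entropy is conserved within groups while differing between them, which matches the description of a differential phylogenetic footprint given in the abstract.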
  • Jianghong An, Maxim Totrov, Ruben Abagyan
    2004 Volume 15 Issue 2 Pages 31-41
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We have developed a new computational algorithm for de novo identification of protein-ligand binding pockets and performed a large-scale validation of the algorithm on two systematically collected datasets from all crystallographic structures in the Protein Data Bank (PDB). This algorithm, called DrugSite, takes a three-dimensional protein structure as input and returns the location, volume and shape of the putative small molecule binding sites by using a physical potential and without any knowledge about a potential ligand molecule. We validated this method using 17,126 binding sites from complexes and apo-structures from the PDB. Out of 5,616 binding sites from protein-ligand complexes, 98.8% were identified by predicted pockets. In proteins having known binding sites, 80.9% were predicted by the largest predicted pocket and 92.7% by the first two. The average ratio of predicted contact area to the total surface area of the protein was 4.7% for the predicted pockets. In only 1.2% of the cases, no “pocket density” was found at the ligand location. Further, 98.6% of 11,510 binding sites collected from apo-structures were predicted. The algorithm is accurate and fast enough to predict protein-ligand binding sites of uncharacterized protein structures, suggest new allosteric druggable pockets, evaluate druggability of protein-protein interfaces and prioritize molecular targets by druggability. Furthermore, the known and the predicted binding pockets for the proteome of a particular organism can be clustered into a “pocketome”, that can be used for rapid evaluation of possible binding partners of a given chemical compound.
  • Abhijit Chattaraj, Hugh E. Williams, Adam Cannane
    2004 Volume 15 Issue 2 Pages 42-51
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Fast and accurate techniques for searching large genomic text collections are becoming increasingly important. While Information Retrieval is well-established for general-purpose text retrieval tasks, less is known about retrieval techniques for genomic text data. In this paper, we investigate and propose general-purpose search techniques for genomic text. In particular, we show that significant improvements can result from manual term expansion, where additional words are added to queries and documents. We also show that collection partitioning, where documents are included in or excluded from the search space, is highly effective for some tasks. We experiment with our techniques on four text collections and show, for example, that the collection partitioning scheme can improve effectiveness by almost 9.5% over a standard retrieval baseline. We conclude by recommending techniques that can be considered for most genomic search tasks.
  • Xiao Yang, Jagath C. Rajapakse
    2004 Volume 15 Issue 2 Pages 52-62
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We address the weak motif recognition problem in DNA sequences, which extends general motif recognition to more difficult cases by allowing more degeneration in motif instances. Several earlier algorithms have attempted to find weak motifs in DNA sequences, but with limitations. In this paper, we propose a graph-based algorithm for weak motif detection, which uses a dynamic programming approach to find cliques indicating motif instances. Experiments on synthetic datasets show that the algorithm finds weak motif instances more accurately and efficiently than earlier approaches. Its performance on real datasets in finding transcription factor binding sites is comparable with that of existing techniques.
  • Mark P Styczynski, Isidore Rigoutsos, Kyle L Jensen, Gregory N Stephan ...
    2004 Volume 15 Issue 2 Pages 63-71
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The (l, d)-motif challenge problem, as introduced by Pevzner and Sze [12], is a mathematical abstraction of the DNA functional site discovery task. Here we expand the (l, d)-motif problem to more accurately model this task and present a novel algorithm to solve this extended problem. This algorithm is guaranteed to find all (l, d)-motifs in a set of input sequences with unbounded support and length. We demonstrate the performance of the algorithm on publicly available datasets and show that the algorithm deterministically enumerates the optimal motifs.
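To make the basic (l, d)-motif definition concrete, here is a minimal brute-force sketch: a string of length l counts as a motif if every input sequence contains some l-mer within Hamming distance d of it. This is not the algorithm presented in the paper, and it does not cover the extended problem the abstract mentions; it enumerates all 4^l candidates, so it is only usable for very small l. The function names and the toy sequences are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def naive_ld_motifs(seqs, l, d, alphabet="ACGT"):
    """All length-l strings that occur, with at most d mismatches, in every sequence."""
    candidates = ("".join(p) for p in product(alphabet, repeat=l))
    return [m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs)]


# Toy input: ACGT is planted exactly once and with one mismatch twice.
seqs = ["TTACGTAA", "GGACCTGG", "CCAGGTCC"]
motifs = naive_ld_motifs(seqs, l=4, d=1)
print("ACGT" in motifs)
```

With d = 0 the same call returns an empty list for these sequences, since no 4-mer occurs exactly in all three.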
  • Juris Viksna, David Gilbert, Gilleain Torrance
    2004 Volume 15 Issue 2 Pages 72-81
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We describe a method for automated domain discovery for topological profile searches in protein structures. The method is used in a system TOPStructure for fast prediction of CATH classification for protein structures (given as PDB files). It is important for profile searches in multi-domain proteins, for which the profile method by itself tends to perform poorly. We also present an $O(C(n)k + nk^2)$ time algorithm for this problem, compared to the $O(C(n)k + (nk)^2)$ time used by a trivial algorithm (where $n$ is the length of the structure, $k$ is the number of profiles and $C(n)$ is the time needed to check for a presence of a given motif in a structure of length $n$). This method has been developed and is currently used for TOPS representations of protein structures and prediction of CATH classification, but may be applied to other graph-based representations of protein or RNA structures and/or other prediction problems. A protein structure prediction system incorporating the domain discovery method is available at http://bioinf.mii.lu.lv/tops/.
  • Si Quang Le, Tu Bao Ho, T. T Hang Phan
    2004 Volume 15 Issue 2 Pages 82-91
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a graph-based method to measure the similarity between chemical compounds described in 2D form. Our main idea is to measure the similarity between two compounds based on the edges, nodes, and connectivity of their common subgraphs. We applied the proposed similarity measure, in combination with a clustering method, to more than eleven thousand compounds in the chemical compound database KEGG/LIGAND and discovered clusters of structurally similar compounds that share common names, take part in the same pathways, and require the same enzymes in reactions. Furthermore, we discovered a surprising agreement between the pathway modules identified by clusters of structurally similar compounds and those identified by genomic contexts, namely, operon structures of enzyme genes.
  • Dmitri D. Pervouchine
    2004 Volume 15 Issue 2 Pages 92-101
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Here we present IRIS, a method for prediction of RNA-RNA interactions that is based on dynamic programming and extends current RNA secondary structure prediction approaches. Using this method we have found a number of interesting refinements to the structures of RNA-RNA complexes that have been studied previously and predicted novel targets for several known regulatory RNAs in E. coli. The computational time and memory usage of IRIS are $O(n^3m^3)$ and $O(n^2m^2)$, respectively, where $n$ and $m$ are the lengths of the input sequences. IRIS can be used for analysis of antisense regulatory systems in sequenced organisms and for the design of artificial riboregulators such as antisense drugs.
  • Qian Yang, Mathieu Blanchette
    2004 Volume 15 Issue 2 Pages 102-111
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Functional RNA molecules typically have structural patterns that are highly conserved in evolution. Here we present an algorithmic method for multiple alignment of RNAs, taking into consideration both structural similarity and sequence identity. Furthermore, our window-sized comparative analysis corrects the misaligned structure within a distance threshold and identifies the conserved substructures. Based on this new algorithm, StructMiner outperforms existing approaches, which ignore structure information for the alignment and lack the effective means to adjust the misalignments in the analysis phase. In addition, StructMiner is efficient in terms of CPU time and memory usage, making it suitable for structural analysis of very long sequences.
  • (1) Assessing RNA 3D Structure Similarity from 2D Structure Similarity
    Jaime E. Barreda DC, Yoshimitsu Shigenobu, Eiichiro Ichiishi, Carlos A ...
    2004 Volume 15 Issue 2 Pages 112-120
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Computational techniques for 3D structure prediction of proteins, the holy grail of bioinformatics, have undergone major developments in recent years, driven by international cooperation and competition in CASP-like (Critical Assessment of Structure Prediction Techniques) contests to improve and refine them. Although straightforward extrapolation of these methodologies to the prediction of the 3D structures of other similarly relevant bio-macromolecules may not be too compelling, due mostly to the intrinsic differences in constitution, nature, and function between them, the conceptual framework underlying most of those techniques, applied to the development of similar computational techniques in structural biology, can lead to efficient systems for prediction of the 3D structure of other bio-macromolecules. One such application is the development of rational methodologies to model RNA 3D structures from the sequence of nucleotides composing them. In this paper we establish the fundamentals of a methodology to thread a sequence of nucleotides into a set of 3D fragments extracted from a database expressly developed for this purpose. The technique is based on a newly implemented algorithm for extraction of 3D fragments by comparison of secondary structures of RNA. The result is a highly efficient system to produce a set of fragments from which the entire RNA structure for the given nucleotide sequence can be built.
  • Naoya Sugimoto, Hitoshi Iba
    2004 Volume 15 Issue 2 Pages 121-130
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We propose a dynamic differential Bayesian network (DDBN) and nonparametric regression model. This model extends traditional dynamic Bayesian networks (DBNs), which can incorporate temporal information in a natural way and directly handle real-valued data obtained from microarrays without any transformation. In addition, it can cope with differential information between gene expression levels without losing the traditional advantage, i.e., the capability of estimating non-linear relationships between genes. We apply DDBNs to analyze simulated data and real data, i.e., Saccharomyces cerevisiae cell cycle gene expression data. We have confirmed the effectiveness of our approach in the sense that some edges have been successfully detected only by DDBNs, not by DBNs.
  • Fan Li, Yiming Yang
    2004 Volume 15 Issue 2 Pages 131-140
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Learning large networks (with hundreds of variables) is gaining the interest of many researchers with the emergence of high-throughput biological data sources such as micro-array data. In this paper, we investigate two popular large-scale network structure learning algorithms, sparse candidate hill climbing (SCHC) and the Grow-Shrinkage (GS) algorithm. The experiments show that both of them have serious effectiveness problems when the number of variables (genes) is large compared to the number of instances (experimental conditions), which is a common case in micro-array data. We further propose a new large-scale structure learning algorithm based on Lasso regression. Theoretical analysis in [10] suggests that the L1-norm in Lasso regression could make our algorithm especially suitable in cases where the numbers of variables and instances are unbalanced. Our algorithm achieves much better results than SCHC and GS on synthetic data. We also show the effectiveness of our algorithm by learning genetic regulatory network modules from real micro-array data (with more than 6000 genes), combined with genome-wide location analysis data. The learned results are consistent with biological knowledge.
  • Annika Hansen, Sascha Ott, Georgy Koentges
    2004 Volume 15 Issue 2 Pages 141-150
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Disentangling networks of regulation of gene expression is a major challenge in the field of computational biology. Harvesting the information contained in microarray data sets is a promising approach towards this challenge. We propose an algorithm for the optimal estimation of Bayesian networks from microarray data, which reduces the CPU time and memory consumption of previous algorithms. We prove that the space complexity can be reduced from $O(n^2 \cdot 2^n)$ to $O(2^n)$, and that the expected calculation time can be reduced from $O(n^2 \cdot 2^n)$ to $O(n \cdot 2^n)$, where $n$ is the number of genes. We make intrinsic use of a limitation of the maximal number of regulators of each gene, which has biological as well as statistical justifications. The improvements are significant for some applications in research.
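To give a feel for what these bounds mean in practice, the following sketch evaluates the quoted expressions for a few values of n. It only plugs numbers into the asymptotic formulas from the abstract; constant factors are ignored, so the ratios indicate growth rather than measured run time or memory. The function names are hypothetical.

```python
def bound_before(n: int) -> int:
    """n^2 * 2^n: the bound quoted for previous algorithms (time and space)."""
    return n * n * 2 ** n


def space_bound_after(n: int) -> int:
    """2^n: the improved space bound quoted in the abstract."""
    return 2 ** n


def time_bound_after(n: int) -> int:
    """n * 2^n: the improved expected-time bound quoted in the abstract."""
    return n * 2 ** n


for n in (10, 20, 30):
    space_ratio = bound_before(n) // space_bound_after(n)  # equals n^2
    time_ratio = bound_before(n) // time_bound_after(n)    # equals n
    print(n, space_ratio, time_ratio)
```

The ratios are exactly n^2 for space and n for time, so the relative saving grows with the number of genes even though both bounds remain exponential.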
  • Shigeto Seno, Reiji Teramoto, Yoichi Takenaka, Hideo Matsuda
    2004 Volume 15 Issue 2 Pages 151-160
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Recently, gene expression data under various conditions have largely been obtained by the utilization of the DNA microarrays and oligonucleotide arrays. There have been emerging demands to analyze the function of genes from the gene expression profiles. For clustering genes from their expression profiles, hierarchical clustering has been widely used. The clustering method represents the relationships of genes as a tree structure by connecting genes using their similarity scores based on the Pearson correlation coefficient. But the clustering method is sensitive to experimental noise.
    To cope with the problem, we propose another type of clustering method (the p-quasi complete linkage clustering). We apply this method to the gene expression data of yeast cell-cycles and human lung cancer. The effectiveness of our method is demonstrated by comparing clustering results with other methods.
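The Pearson-based similarity score used by conventional hierarchical clustering can be sketched as below. This illustrates only the baseline similarity measure and the first merge step of agglomerative clustering; it is not the p-quasi complete linkage method proposed in the paper. The gene names and expression values are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_similar_pair(profiles):
    """Pair of gene names with the highest Pearson correlation.

    This is the pair a bottom-up hierarchical clustering would merge first.
    """
    return max(combinations(sorted(profiles), 2),
               key=lambda p: pearson(profiles[p[0]], profiles[p[1]]))


# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
print(most_similar_pair(profiles))
```

Because each merge depends on a single best score, a noisy measurement in one condition can change which pair is joined first, which is one way to see the noise sensitivity the abstract refers to.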
  • Hiroyuki Kurata, Natsumi Shimizu, Kanako Misumi
    2004 Volume 15 Issue 2 Pages 161-170
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    A goal of systems biology is to build a concrete biochemical network map, which provides an important instruction to trace the pathways of interest or to understand the mechanism of a biological system. In the postgenomic era, not only the concrete biochemical maps, but also postgenomic maps (mRNA coexpression and protein-protein interaction networks) have been extensively produced. In the biochemical map, the individual reactions are reliable, but the number of the reactions is limited, because molecular biology requires extensive experiments to verify them. By contrast, postgenomic data provide much information regarding interactions, but are coarse-grained. To expand the biochemical network, an intuitional approach, which superposes postgenomic data on the map one by one, has been carried out, but it is not effective when a large amount of the coarse-grained data is handled. In order to effectively integrate such postgenomic interactions into a biochemical map, a statistical approach would be suitable rather than intuition. In this article, we proposed a novel statistical approach that integrates postgenomic interaction networks into the biochemical network, predicting novel pathways. A statistical correlation for such different types of networks identifies functional modules; subsequently the superposition of the different networks on the functional modules predicts inter-modular relations, which are the key pathways to construct a large-scale biochemical network.
  • Dong-Soo Han, Hong-Soog Kim, Woo-Hyuk Jang, Sung-Doke Lee, Jung Keun S ...
    2004 Volume 15 Issue 2 Pages 171-180
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    With the recognition of the importance of computational approaches to protein-protein interaction prediction, many techniques have been developed to computationally predict protein-protein interactions. However, few techniques have actually been implemented and released as services that general users can readily access and use. In this paper, we design and implement a protein interaction prediction service system based on the domain combination based protein-protein interaction prediction technique, which is known to show superior accuracy to other conventional computational protein-protein interaction prediction methods. In the prediction accuracy test of the method, high sensitivity (77%) and specificity (95%) are achieved for test protein pairs containing common domains with learning sets of proteins in yeast. The stability of the method is also demonstrated through testing on DIP CORE, HMS-PCI, and TAP data. The functions of the system are divided into core, subsidiary, and general service function categories. The core function category includes the functions that can be provided only by using the domain combination based protein-protein interaction prediction method. Interaction prediction for a single protein pair and visualization of interaction probability distributions are the functions in this category. The subsidiary function category includes the functions that can be derived from the core functions. Domain combination pair search with high appearance probability and construction of protein interaction networks are the functions in this category. Lastly, the general service function category includes the functions that can be implemented by collecting and organizing the protein and domain data on the Internet. Performance, openness and flexibility are the major design goals and they are achieved by adopting parallel execution techniques, Web Services standards, and a layered architecture, respectively. In this paper, several representative user interfaces of the system are also introduced with comprehensive usage guides.
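For readers unfamiliar with the two accuracy figures, the sketch below computes sensitivity and specificity from confusion-matrix counts. It assumes the standard definitions (true-positive rate and true-negative rate); the abstract does not state which definition the authors used, and some bioinformatics papers use "specificity" to mean precision instead. The counts are invented purely so that the output matches the quoted percentages and are not data from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: fraction of real interactions that were predicted."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: fraction of non-interacting pairs correctly rejected."""
    return tn / (tn + fp)


# Hypothetical counts for 100 interacting and 100 non-interacting pairs.
tp, fn = 77, 23
tn, fp = 95, 5

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```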
  • Long-Hui Wang, Juan Liu, Yan-Fu Li, Huai-Bei Zhou
    2004 Volume 15 Issue 2 Pages 181-190
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Protein structure prediction is one of the most important problems in modern computational biology. Protein secondary structure prediction is a key step in the prediction of protein tertiary structure. Many methods based on machine learning techniques, such as neural networks (NN) and support vector machines (SVM), have emerged that focus on the prediction of secondary structures. In this paper, a new method based on SVM is proposed. Unlike existing methods, this method takes into account the physical-chemical properties and structure properties of amino acids. When tested on the most popular dataset, CB513, it achieved a Q3 accuracy of 0.7844, which illustrates that it is among the top-ranking methods for protein secondary structure prediction.
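The Q3 score quoted above is the fraction of residues whose predicted three-state label matches the observed one. A minimal sketch follows, assuming the usual H (helix), E (strand), C (coil) encoding with one character per residue; the function name and example strings are hypothetical and unrelated to CB513.

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Five of the six residues agree, so Q3 is 5/6.
print(q3("HHHECC", "HHHEEC"))
```

In benchmark reports Q3 is typically computed over all residues of a test set, so a value of 0.7844 means roughly 78 in every 100 residues were assigned the correct state.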
  • Steven Busuttil, John Abela, Gordon J. Pace
    2004 Volume 15 Issue 2 Pages 191-200
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Two new techniques for remote protein homology detection, particularly suited for sparse data, are introduced. These methods are based on position-specific scoring matrices, or profiles, and use a support vector machine (SVM) for discrimination. Their performance on standard benchmarks exceeds that of previous non-discriminative techniques and is comparable to that of other SVM-based methods, while giving distinct advantages.
  • Tomohiro Ando, Seiya Imoto, Satoru Miyano
    2004 Volume 15 Issue 2 Pages 201-210
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    One important application of microarray gene expression data is to study the relationship between the clinical phenotype of cancer patients and gene expression profiles on the whole-genome scale. The clinical phenotype includes several different types of cancers, survival times, relapse times, drug responses and so on. For the situation in which the subtypes of cancer have not been previously identified or known to exist, we develop a new kernel mixture modeling method that simultaneously performs identification of the subtype of cancer, prediction of the probabilities of both cancer type and patient's survival, and detection of a set of marker genes on which to base a diagnosis. The proposed method is successfully applied in real data analysis and simulation studies.
  • Xin Chen, Zhengchang Su, Ying Xu, Tao Jiang
    2004 Volume 15 Issue 2 Pages 211-222
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We computationally predict operons in the Synechococcus sp. WH8102 genome based on three types of genomic data: intergenic distances, COG gene functions and phylogenetic profiles. In the proposed method, we first estimate a log-likelihood distribution for each type of genomic data, and then fuse this distribution information with a perceptron to discriminate pairs of genes within operons (WO pairs) from those across transcription unit borders (TUB pairs). Computational experiments demonstrated that WO pairs tend to have shorter intergenic distances, a higher probability of being in the same COG functional categories and more similar phylogenetic profiles than TUB pairs, indicating the power of these features for operon prediction. By testing the method on 236 known operons of Escherichia coli K12, an overall accuracy of 83.8% is obtained by joint learning from multiple types of genomic data, whereas the individual information sources yield accuracies of 80.4%, 74.4%, and 70.6%, respectively.
    We have applied this new approach, in conjunction with our previous comparative genome analysis-based approach, to predict 556 (putative) operons in WH8102. All predicted data are available at http://www.cs.ucr.edu/xin/operons.htm for public use.
  • Tianshou Zhou, Luonan Chen, Ruiqi Wang, Kazuyuki Aihara
    2004 Volume 15 Issue 2 Pages 223-233
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    This paper investigates a general coupled noisy system for cell-cell communication in a multi-cell system. The main conclusion is that appropriate noise intensity and coupling strength are capable of driving the coupled system to synchrony, which may be exploited by biological organisms to actively facilitate mutual communication. A multi-cell system with a synthetic gene network with both noise and delays is adopted to demonstrate the effect of noise on cellular communication.
  • Dong-Yup Lee, Ralf Zimmer, Sang-Yup Lee, Daniel Hanisch, Sunwon Park
    2004 Volume 15 Issue 2 Pages 234-243
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    A Petri-net based model for knowledge representation has been developed to describe as explicitly and formally as possible the molecular mechanisms of cell signaling and their pathological implications. A conceptual framework has been established for reconstructing and analyzing signal transduction networks on the basis of the formal representation. Such a conceptual framework renders it possible to qualitatively understand the cell signaling behavior at systems-level. The mechanisms of the complex signaling network are explored by applying the established framework to the signal transduction induced by potent proinflammatory cytokines, IL-1β and TNF-α. The corresponding expert-knowledge network is constructed to evaluate its mechanisms in detail. This strategy should be useful in drug target discovery and its validation.
  • Claudia Cho, Torsten Crass, Alexander Kel, Olga Kel-Margoulis, Mathias ...
    2004 Volume 15 Issue 2 Pages 244-254
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The data model of the signaling pathways database TRANSPATH has been re-engineered into a three-layer model comprising experimental evidence and summarized pathway information, both in a mechanistically detailed manner, and a “semantic” projection for the abstract overview. Each molecule is described in the context of a certain reaction in the multidimensional space of posttranslational modification, molecular family relationships, and the biological species of its origin. The new model makes the data better suited for reconstructing signaling pathways and networks and for mapping expression data, for instance from microarray experiments, onto regulatory networks.
  • Takako Takai(Igarashi), Riichiro Mizoguchi
    2004 Volume 15 Issue 2 Pages 255-265
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Databases have collected masses of information concerning cell signaling pathways that includes information on pathways, molecular interactions as well as molecular complexes. However we have no general data model to represent comprehensive properties of cell signaling pathways, so that this type of information has been represented by two different data models that we call ‘binary relation’ and ‘state transition’. The disagreement between the existing models derives from lack of consensus about a factor of causality in reactions in cell signaling pathways, which is often called ‘signal’. We developed an ontology named CSNO (Cell Signaling Networks Ontology) based on device ontology. As device ontology is a research product of knowledge engineering, CSNO is the first application of it to biological knowledge. CSNO defines the factor of causality called ‘signal’, offers an integrative viewpoint for the two different data models, explicates intrinsic distinctions between signaling and metabolic pathways, and eliminates ambiguity from representation of complex molecules.
  • Frédéric Nikitin, Bastien Rance, Masumi Itoh, Minoru Kan ...
    2004 Volume 15 Issue 2 Pages 266-275
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We have studied the projection of protein family data onto a single translated bacterial genome as a way to visualise relationships between families restricted to bacterial sequences. Any member of any type of family as defined in the Pfam database (domains, signatures, etc.) is considered as a protein module. Our first goal is to discover rules correlating the occurrence of modules with biochemical properties. To achieve this goal we have developed a platform to quantify information found in protein databases and to support the analysis of the nature of modules, their position and corresponding frequencies of occurrence (in isolation or in combination) in association with pathway knowledge as found in KEGG. This paper focuses on two pathways, the two-component system and aminophosphonate metabolism, that are partially but not completely documented. Proteins involved in those pathways were listed separately in each organism to analyse module composition, and rules constraining pathway interactions were identified. It is shown how these results can be used to update KEGG pathways and orthologue tables.
  • Philip Stegmaier, Alexander E. Kel, Edgar Wingender
    2004 Volume 15 Issue 2 Pages 276-286
    Published: 2004
    Released on J-STAGE: November 16, 2011
    JOURNAL FREE ACCESS
    Based on the manual annotation of transcription factors stored in the TRANSFAC database, we developed a library of hidden Markov models (HMM) to represent their DNA-binding domains and used it for a comprehensive classification. The models constructed were applied to the UniProt/Swiss-Prot database, leading to a systematic classification of further DNA-binding protein entries. The HMM library obtained can be used to classify any newly discovered transcription factor according to its DNA-binding domain and, thus, to generate hypotheses about its DNA-binding specificity.
  • Tho Hoan Pham, Kenji Satou, Tu Bao Ho
    2004 Volume 15 Issue 2 Pages 287-295
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In eukaryotes, gene expression is controlled by various transcription factors that bind to the promoter regions. Transcription factors may act positively, negatively or not at all. Different combinations of them may also activate or repress gene expression, and form transcriptional regulatory networks. Uncovering such regulatory networks is a central challenge in genomic biology.
    In this study, we first defined a new kind of motif in regulatory networks, transcriptional regulatory modules (TRMs), with the form factorset → geneset, which emphasizes the combinatorial gene control of the group of factors factorset on the group of genes geneset. Second, we developed an efficient method based on a closed itemset mining technique for finding the two most informative kinds of TRMs, closed inf-TRMs and closed sup-TRMs, from factor DNA-binding sites and gene expression profile data. The set of all closed inf-TRMs and closed sup-TRMs is often orders of magnitude smaller than the set of all TRMs but does not lose any information. When applied to yeast data, our method produced results that are more compact, concise and comprehensive than those from previous studies to identify and interpret the transcriptional role of regulator combinations on sets of genes. Availability: Supplementary files: http://www.jaist.ac.jp/h-pham/regulation/.
  • From Vision to Blueprint
    Alessandro Sette
    2004 Volume 15 Issue 2 Pages 299
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
  • Ron Shamir
    2004 Volume 15 Issue 2 Pages 300-301
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
  • Exploring Universal Statistical and Dynamical Features in Cellular Processes
    Kunihiko Kaneko
    2004 Volume 15 Issue 2 Pages 302-303
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
  • Tatsuya Akutsu, Vladimir Brusic
    2004 Volume 15 Issue 2 Pages v
    Published: 2004
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS