Genome Informatics

Simple Maximum Likelihood Methods for the Optical Mapping Problem

Vlado Dancik, Michael S. Waterman

1997 Volume 8 Pages 1-8
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.1

JOURNAL FREE ACCESS

Recently a new method for obtaining restriction maps was developed by David Schwartz at NYU. With this method, restriction maps are created from fluorescent images of individual molecules obtained using a microscope. For every observed molecule, image processing methods are used to generate a list of the approximate locations of the sites where the molecule is cut by the restriction enzyme. Our task is to find the locations of all restriction sites given the observed cut-sites. The task is complicated by the fact that the orientation of the molecules is unknown, i.e. for a cut-site $x$ we do not know whether $x$ or $1-x$ corresponds to a restriction site in a unit-length molecule.
First we consider the case in which the orientation of all molecules and the number $k$ of restriction sites are known. We suppose that for each restriction site location $\theta_j$ the corresponding measured cut-sites follow the normal distribution with density function $g(x; \theta_j, \sigma_j)$ for some $\sigma_j$. (This means the measurement is unbiased with mean $\theta_j$.) The observed cut-site locations $x_1, \ldots, x_n$ then follow the mixture distribution $f(x; p, \theta, \sigma) = \sum_{j=1}^{k} p_j \, g(x; \theta_j, \sigma_j)$, where $\sum_{j=1}^{k} p_j = 1$. Using the likelihood principle, we wish to find parameters $p$, $\theta$, $\sigma$ that maximize the likelihood function $\prod_{i=1}^{n} f(x_i; p, \theta, \sigma)$. In our case it is natural to assume that $p_1 = \cdots = p_k = 1/k$ and $\sigma_1 = \cdots = \sigma_k = \sigma$ for a constant $\sigma$.
In optical mapping, "false" cuts frequently appear, i.e. cuts that correspond to no restriction site. Our model accommodates false cuts through a uniform component in the mixture distribution. We use the EM algorithm and Bayes' theorem to compute the maximum likelihood estimate and compare the results for the different variants of our model.
We explore how changing the orientation of some molecules influences the maximum likelihood estimate and show that, in our case, the orientation question can be answered for each molecule separately. Finally, we present a few ideas for specifying the orientation of molecules without investigating the positions of restriction sites.
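
The known-orientation case described in this abstract can be illustrated with a small EM loop. This is only a sketch under stated assumptions, not the authors' implementation: molecules are taken to have unit length, the number of sites is fixed by the length of `init_sites`, all normal components share one weight and one standard deviation, and false cuts are modeled by a uniform density on [0, 1]. The function name `em_cut_sites` and its parameters are hypothetical.

```python
import math


def normal_pdf(x, mean, sigma):
    """Density of the normal distribution N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_cut_sites(cuts, init_sites, sigma=0.05, noise=0.1, iterations=200):
    """Fit site locations, a shared sigma and a false-cut fraction by EM.

    Assumed mixture (hypothetical parameterization):
        f(x) = noise * 1 + (1 - noise) / k * sum_j g(x; site_j, sigma)
    where the constant 1 is the uniform density on [0, 1].
    """
    sites = list(init_sites)
    k = len(sites)
    n = len(cuts)
    for _ in range(iterations):
        # E-step: posterior probability (Bayes' theorem) that each cut
        # belongs to each site, or to the uniform false-cut component.
        resp = []
        false_total = 0.0
        for x in cuts:
            parts = [(1.0 - noise) / k * normal_pdf(x, s, sigma) for s in sites]
            total = noise + sum(parts)
            resp.append([p / total for p in parts])
            false_total += noise / total
        # M-step: each site moves to the responsibility-weighted mean.
        for j in range(k):
            weight = sum(r[j] for r in resp)
            if weight > 0.0:
                sites[j] = sum(r[j] * x for r, x in zip(resp, cuts)) / weight
        # Shared sigma from the weighted squared deviations of true cuts.
        squared = sum(
            r[j] * (x - sites[j]) ** 2
            for r, x in zip(resp, cuts)
            for j in range(k)
        )
        true_mass = max(n - false_total, 1e-12)
        sigma = max(math.sqrt(squared / true_mass), 1e-6)
        noise = false_total / n
    return sorted(sites), sigma, noise
```

Calling `em_cut_sites(cuts, [0.2, 0.8])` on a list of observed cut positions returns the estimated site locations, the shared standard deviation, and the estimated fraction of false cuts. The unknown-orientation case from the abstract is not handled here.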

The First Laws of Genomics

Piotr P. Slonimski, M.O. Mosse, P. Golik, A. Henaut, J.L. Risler, J.P. ...

1997 Volume 8 Pages 9-10
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.9

JOURNAL FREE ACCESS

Invited Talk

Jérôme Chailloux, S. A. GENSET

1997 Volume 8 Pages 11
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.11

JOURNAL FREE ACCESS

The Map of the Cell is in the Chromosome

Antoine Danchin, Alain Hénaut

1997 Volume 8 Pages 13-14
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.13

JOURNAL FREE ACCESS

Automatic Gene Recognition without Using Training Data

Kiyoshi Asai, Yutaka Ueno, Katunobu Itou, Tetsushi Yada

1997 Volume 8 Pages 15-24
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.15

JOURNAL FREE ACCESS

In this paper, we propose a new approach to gene recognition that uses no training data for the recognizer. We start from a simple model that uses only knowledge of the start and stop codons, and then alternate between two steps: recognizing the DNA sequences with the recognizer, and retraining the recognizer's parameters on the recognition result. We applied this parse-and-train approach to the complete genome sequence of a cyanobacterium and achieved almost the same recognition rate as when the whole sequence is used as training data. These results open the possibility of using an automatic gene annotation system in the early stages of sequencing projects.

Breakpoint Phylogenies

Mathieu Blanchette, Guillaume Bourque, David Sankoff

1997 Volume 8 Pages 25-34
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.25

JOURNAL FREE ACCESS

We describe a number of heuristics for inferring the gene orders of the hypothetical ancestral genomes in a fixed phylogeny. The optimization criterion is the minimum number of breakpoints (pairs of genes adjacent in one genome but not the other) in the gene orders of two genomes connected by an edge of the tree, summed over all edges. The key to the method is an exact solution for trees with three leaves (the median problem) based on a reduction to the Traveling Salesman Problem.
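
The optimization criterion in this abstract can be made concrete with a short sketch. It assumes unsigned, linear gene orders over the same gene set and counts the adjacencies of one genome that are missing from the other; the function names and the edge-list representation of the tree are hypothetical, and the median heuristic itself is not shown.

```python
def breakpoints(order_a, order_b):
    """Count adjacent gene pairs of order_a that are not adjacent in order_b.

    Assumes unsigned genes, so the pair (x, y) equals the pair (y, x).
    """
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        1
        for pair in zip(order_a, order_a[1:])
        if frozenset(pair) not in adjacent_in_b
    )


def tree_score(edges, orders):
    """Sum the breakpoint counts over all edges of a fixed phylogeny.

    edges  -- list of (node, node) pairs
    orders -- dict mapping each node (leaf or ancestor) to its gene order
    """
    return sum(breakpoints(orders[u], orders[v]) for u, v in edges)
```

For example, `breakpoints([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])` counts the adjacencies 1-2 and 3-4 as broken, while a fully reversed order has no breakpoints under the unsigned assumption.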

Sequencing by Hybridization with Positive Faults

Jacek Blazewicz, Piotr Formanowicz, Marta Kasprzak, Wojciech T. Markie ...

1997 Volume 8 Pages 35-42
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.35

JOURNAL FREE ACCESS

The paper is concerned with the computational phase of sequencing DNA chains by hybridization. It is assumed that positive faults can occur in the hybridization experiment. An approach based on a reduction of the problem to a variant of the Selective Traveling Salesman Problem, together with an algorithm for solving the latter, is proposed. The algorithm behaves extremely well, even for a fault rate exceeding 50%.

Greedy Algorithms for Finding a Small Set of Primers Satisfying Cover and Length Resolution Conditions in PCR Experiments

Koichiro Doi, Hiroshi Imai

1997 Volume 8 Pages 43-52
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.43

JOURNAL FREE ACCESS

Selecting a good collection of primers is very important for polymerase chain reaction (PCR) experiments. Most existing algorithms for primer selection are concerned with computing a primer pair for each DNA sequence. When arbitrarily primed PCR and similar techniques are generalized to the case in which all DNA sequences of the target objects are already known, such as the roughly 6000 ORFs of yeast, we may design a small set of primers so that all the targets are PCR amplified and resolved electrophoretically in a series of experiments. This is quite useful because decreasing the number of primers greatly reduces the cost of experiments. Pearson et al. [7, 8] consider finding a minimum set of primers covering all given DNA sequences, but their method does not meet necessary biological conditions such as primer amplification and electrophoresis resolution.
In this paper, based on the modeling and computational complexity analysis by Doi [2], we propose algorithms for this primer selection problem. These algorithms do not necessarily minimize the number of primers, but this is inevitable, since basic versions of these problems are shown to be computationally intractable and, with the length resolution condition, hard even to approximate. The algorithms incorporate the amplification condition for a primer pair and the length resolution condition for electrophoresis. They are based on the theoretically well-founded greedy algorithm for set cover in computer science. Preliminary computational results are presented to show the validity of this approach. The number of computed primers is much less than half the number of targets, and hence less than one fourth of the number needed in multiplex PCR.
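
The greedy set-cover step that these algorithms build on can be sketched as follows. This is a plain set-cover illustration only: it assumes a precomputed mapping from each candidate primer to the set of targets it covers, and it leaves out the amplification and length resolution conditions that the paper incorporates. The function name and the data layout are hypothetical.

```python
def greedy_cover(coverage):
    """Pick primers greedily until every coverable target is covered.

    coverage -- dict mapping a primer name to the set of targets it covers
    Returns the chosen primers in the order they were picked.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Take the primer covering the most still-uncovered targets;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen
```

With `{"p1": {"t1", "t2", "t3"}, "p2": {"t3", "t4"}, "p3": {"t4", "t5"}, "p4": {"t1"}}`, the sketch first picks `p1` and then `p3`, covering all five targets with two primers.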

Prediction of Mitochondrial Targeting Signals Using Hidden Markov Models

Yukiko Fujiwara, Minoru Asogawa, Kenta Nakai

1997 Volume 8 Pages 53-60
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.53

JOURNAL FREE ACCESS

The mitochondrial targeting signal (MTS) is the presequence that directs nascent proteins bearing it to mitochondria. We have developed a hidden Markov model (HMM) that represents various known sequence characteristics of MTSs, such as the length variation, amino acid composition, amphiphilicity, and consensus pattern around the cleavage site. The topology and parameters of this model are automatically determined by the iterative duplication method, in which a small fully connected HMM is gradually expanded by state splitting. The model can be used to predict the existence of MTSs for given amino acid sequences. Its prediction accuracy was estimated to be 86.9% using a cross-validation test. Furthermore, for various synthetic peptides, a higher correlation was observed between the HMM score and the in vitro ATPase activity of MSF, which can be regarded as an experimental measure of signal strength, than was observed with other methods.

Prediction of Hydrophobic Cores of Proteins Using Wavelet Analysis

Hideki Hirakawa, Satoru Kuhara

1997 Volume 8 Pages 61-70
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.61

JOURNAL FREE ACCESS

Information concerning the secondary structures, flexibility, epitopes and hydrophobic regions of amino acid sequences can be extracted by assigning physicochemical indices to each amino acid residue, and information on structure can be derived using the sliding window averaging technique, which is widely used for smoothing raw functions. Wavelet analysis has shown great potential and applicability in many fields, such as astronomy, radar, earthquake prediction, and signal or image processing. This approach is efficient for removing noise from various functions. Here we employed wavelet analysis to smooth a plot of a hydrophobicity index assigned to amino acid sequences. We then used the resulting function to predict hydrophobic cores in globular proteins. We calculated the prediction accuracy for the hydrophobic cores of a representative set of 88 proteins. Wavelet analysis predicted hydrophobic cores with 6.13% greater accuracy than the sliding window averaging technique.
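
The sliding window averaging baseline mentioned in this abstract can be sketched in a few lines. The abstract does not say which hydrophobicity index was used, so the Kyte-Doolittle scale below is an assumption made for illustration, as are the function names and the choice to shrink the window at the sequence ends. The wavelet smoothing itself is not shown.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the paper's index is not
# named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_average(values, window):
    """Smooth values with a centered window of odd length.

    Near the ends the window is truncated to the available residues.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def hydropathy_profile(sequence, window=7):
    """Map a protein sequence to a smoothed hydropathy profile."""
    raw = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return sliding_window_average(raw, window)
```

Peaks in the smoothed profile mark candidate hydrophobic stretches; how such a profile is thresholded to call a core is not specified in the abstract.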

Rhythms Emerge in a Collection of ‘Blind’ Chemicals by the Use of ‘Genetic Switches’

Hiroaki Inayoshi, Hitoshi Iba

1997 Volume 8 Pages 71-79
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.71

JOURNAL FREE ACCESS

This paper presents a new computational method for modeling and simulating gene expression by introducing an artificial chemical system. The artificial chemical system is specified by four items: (1) components (five kinds of particles and DNA with Genetic Switches); (2) space (2-dimensional polar grids); (3) simple reaction rules (construction and destruction of molecules, etc.); (4) simple behavioral rules (stochastic movements and stochastic collisions, etc.). The simulation demonstrates the capability of the system to exhibit emergent behavior: global order of the system (here, regular rhythms, i.e. regular oscillations in the amounts of some gene products) emerges out of the randomness of the components' stochastic movements and collisions.

Beyond Mutation Matrices: Physical-Chemistry Based Evolutionary Models

Jeffrey M. Koshi, David P. Mindell, Richard A. Goldstein

1997 Volume 8 Pages 80-89
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.80

JOURNAL FREE ACCESS

We describe a model for characterizing site mutations in evolving proteins. By representing the fitness of each of the amino acids as a function of the physical-chemical properties of that amino acid, and constructing mutation matrices based on Boltzmann statistics and Metropolis kinetics, we are able to greatly reduce the number of adjustable parameters. This allows us to include site heterogeneity in the model, as well as to optimize the model for specific protein types. We demonstrate the applicability of the model by investigating the phylogenetic relationship between various subtypes of HIV-1.

A New Method for Database Searching and Clustering

Antje Krause, Martin Vingron

1997 Volume 8 Pages 90-99
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.90

JOURNAL FREE ACCESS

An iterative database searching method is introduced and applied to the design of a database clustering procedure. The search method virtually never produces false positive hits while determining meaningfully large sets of sequences related to the query. A novel set-theoretic database clustering algorithm exploits this feature and avoids a traditional, distance-based clustering step. This makes it fast and applicable to data sets of the size of, e.g., the Swiss-Prot database. In practice we achieve unambiguous assignment of 80% of Swiss-Prot sequences to non-overlapping sequence clusters in an entirely automatic fashion.

Applying an Association Rule Discovery Algorithm to Multipoint Linkage Analysis

Nobutaka Mitsuhashi, Haretsugu Hishigaki, Toshihisa Takagi

1997 Volume 8 Pages 100-109
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.100

JOURNAL FREE ACCESS

Knowledge discovery in large databases (KDD) is being performed in several application domains, for example, the analysis of sales data, and is expected to be applied to other domains. We propose a KDD approach to multipoint linkage analysis, which is a way of ordering loci on a chromosome. Strict multipoint linkage analysis based on maximum likelihood estimation is a computationally tough problem. So far, various kinds of approximate methods have been implemented. Our method, based on the discovery of associations between genetic recombinations, is so different from the others that it is useful for rechecking their results. In this paper, we describe how to apply the framework of association rule discovery to linkage analysis, and also discuss why filtering the input data and interpreting the discovered rules after data mining are as important in practice as the data mining process itself.

Sequence Data Analysis for Long Disordered Regions Prediction in the Calcineurin Family

Pedro Romero, Zoran Obradovic, A.Keith Dunker

1997 Volume 8 Pages 110-124
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.110

JOURNAL FREE ACCESS

Our recently reported results [14, 29, 30] provide strong support for a hypothesis that some amino acid sequences code for disordered regions rather than structured ones and that such disordered regions are commonly involved in function. General and family-specific neural network predictors developed in those previous studies suggest that different classes of disordered regions exist. Here, family-specific data preprocessing for disorder prediction in the calcineurin (CaN) family is explored. The results show that prediction of order and disorder on CaN sequence data benefits significantly from the use of family-specific preprocessing, with feature extraction through principal components analysis (PCA) outperforming feature selection techniques, although all methods discriminate CaN-specific disordered regions from CaN-specific ordered regions well. On the other hand, for the discrimination of CaN-specific disordered regions from general (unrelated to CaN) ordered regions, feature selection approaches proved to be more appropriate than PCA. The results further support a hypothesis that different kinds of disordered regions exist, as all family-specific disorder predictors developed in this study significantly outperformed a previously reported general multi-family disorder predictor.

Clustering Molecular Sequences with Their Components

Sivasundaram Suharnan, Takeshi Itoh, Hideo Matsuda, Hirotada Mori

1997 Volume 8 Pages 125-134
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.125

JOURNAL FREE ACCESS

Motivation: Several methods in genetic information have recently been developed to estimate the classification of protein sequences from their sequence similarity. These methods are essential for understanding the function of predicted open reading frames (ORFs) and their molecular evolutionary processes. However, since many protein sequences consist of a number of independently evolved structural units (we refer to these units as components), the combinatorial nature of the components makes it difficult to classify the sequences.
Results: This paper presents a new method for classifying uncharacterized protein sequences. As the measure of sequence similarity, we use a similarity score computed by a method based on the Smith-Waterman local alignment algorithm. We describe how this method copes with sequences that have a multi-component structure. The method was applied to predicted ORFs on the Escherichia coli genome, and we discuss the algorithm and experimental results.
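
The Smith-Waterman local alignment score that this abstract uses as its similarity measure can be sketched as below. The scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration; protein comparisons normally use a substitution matrix and often affine gaps, and the abstract does not state the authors' settings or how scores are turned into clusters.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the standard recurrence with a floor of zero, keeping only two
    rows of the dynamic programming table in memory.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            score = max(
                0,                     # start a new local alignment
                diagonal,              # align char_a with char_b
                previous[j] + gap,     # gap in b
                current[j - 1] + gap,  # gap in a
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best
```

Because the alignment is local, two sequences that share only one component still receive a high score for that region, which is the situation the abstract identifies as making classification difficult.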

DNAinsight: An Image Processing System for 2-D Gel Electrophoresis of Genomic DNA

Katsutoshi Takahashi, Masayuki Nakazawa, Yasuo Watanabe

1997 Volume 8 Pages 135-146
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.135

JOURNAL FREE ACCESS

We have developed a powerful image processing system DNAinsight, which performs automated detection of several thousands of spots found on autoradiogram images obtained with 2-D gel electrophoresis of genomic DNA. Algorithms and parameters for detecting spot locations and intensities are carefully chosen so as to enable reliable and rapid processing of 2-D gel electrophoretograms based on the RLGS (restriction landmark genomic scanning) method. In DNAinsight, matching of several related spot patterns, such as those from tumor-cell and normal-cell, can be accomplished rapidly with easy operations, being solved by comparing the Delaunay net and relative neighborhood graph. The automated and accurate image processing system strongly supports the rapid identification and analysis of genetic variation in the DNA of humans and other animals.

E-CELL: Software Environment for Whole Cell Simulation

Masaru Tomita, Tom Shimizu, Kanako Saito, J. Craig Venter, Kenta Hashi ...

1997 Volume 8 Pages 147-155
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.147

JOURNAL FREE ACCESS

We present E-CELL, a generic computer software environment for modeling a cell and conducting experiments in silico. The E-CELL system allows a user to define functions of proteins, protein-protein interactions, protein-DNA interactions, regulation of gene expression and other features of cellular metabolism, in terms of a set of reaction rules. The system then executes those reactions iteratively, and the user can observe, through a computer display, dynamic changes in concentrations of proteins, protein complexes and other chemical compounds in the cell.
Using this software, we constructed a model of a hypothetical cell with only 127 genes sufficient for transcription, translation, energy production and phospholipid synthesis. Most of the genes are taken from Mycoplasma genitalium, the organism having the smallest known chromosome, whose complete 580kb genome sequence was determined at TIGR in 1995.
We discuss future applications of the E-CELL system with special respect to genome engineering.

A Multi-Agent System for Exon Prediction in Human Sequences

Laurence Vignal, Frédérique Lisacek

1997 Volume 8 Pages 156-165
Published: 1997
Released on J-STAGE: November 16, 2011

DOI: https://doi.org/10.11234/gi1990.8.156

JOURNAL FREE ACCESS

Given the problem of identifying exons in new genomic DNA, the sketch of a resolution process was drawn using sequence data and models of site/signal recognition. A multi-agent architecture is used to validate these models and test hypotheses on the chronology of events involved in gene splicing. Information is channelled through a hierarchy of agents. Each type of agent is the result of a successful step in the resolution process. The system does not rely on the compositional bias of coding sequences which is a key feature of current computer methods.

Virgil: A Databank of Links between GDB and GenBank

Frédéric Achard, Emmanuel Barillot, Gis Infobiogen

1997 Volume 8 Pages 166-172
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.166

JOURNAL FREE ACCESS

This paper focuses on a specific type of information frequently used by researchers in Genetics: links between genome objects. It emphasizes the fact that, at present, links are not sufficiently characterized and describes our work to address this problem: the design of a prototype databank to store links between genome databases. Because this global repository is of concern for many people, we welcome and encourage feedback from the community.

DP Algorithms for RNA Secondary Structure Prediction with Pseudoknots

Tatsuya Akutsu

1997 Volume 8 Pages 173-179
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.173

JOURNAL FREE ACCESS

This paper describes simple DP (dynamic programming) algorithms for RNA secondary structure prediction with pseudoknots, for which no explicit DP algorithm had been known. Results of preliminary computational experiments are also described.

NP-Hardness Results for Protein Side-chain Packing

Tatsuya Akutsu

1997 Volume 8 Pages 180-186
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.180

JOURNAL FREE ACCESS

This paper shows that the problem of finding a protein side-chain packing is computationally hard (NP-hard), where the problem is defined as a combinatorial search problem using a rotamer library. Although this result does not suggest a new method, it justifies previous methods that use heuristics such as simulated annealing, neural networks, genetic algorithms, and Gibbs sampling.

A Novel Approach Towards a Comprehensive Consensus Representation of the Expressed Human Genome

Winston Hide, John Burke, Alan Christoffels, Robert Miller

1997 Volume 8 Pages 187-196
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.187

JOURNAL FREE ACCESS

In order to provide a novel maximised approach to the generation of accurate, comprehensive, consensus sequences of the expressed human genome, we have developed and produced a system for a novel-representation, broad gene coverage, consensus database of expressed human gene fragments (ESTs). To perform clustering of ESTs, we have developed and employed D2-cluster, an algorithm based on the d2-search algorithm (Hide et al. 1994) specifically for EST clustering. D2-cluster does not require alignment in order to perform clustering (Burke, Davison and Hide, in prep). We have incorporated d2-cluster into a portable and novel system to perform clustering, alignment and automated error analysis of publicly available expressed sequence tags (STACKIPACK). The system includes a statistically robust algorithm that can detect and compensate for error within an aligned cluster of ESTs. We have built a database of partial human consensus sequences from 552 013 ESTs from dbEST 040896 and TIGR. The database is termed Sequence Tag Alignment and Consensus Knowledgebase (STACK). STACK 1.0 contains 18 divisions based on tissue annotation, identifying 204 431 unique sequences and generating 76 131 consensi which represent 321 134 ESTs. The consensus sequences have an average length of 497 bases, a 39% increase over the 357-base average length of the input data set. Clone IDs are used to join 92 759 unique sequences and 48 858 consensi into 61 632 linked sequences, averaging 900 bases each. The distribution of clusters compares favourably with UniGene, reflecting the difference in clustering methodology and the higher number of sequences input into STACK. A SANIGENE high-accuracy database is also generated, consisting of sequences which agree in at least two ESTs. STACK is a distributable, core information resource upon which a comprehensive knowledgebase can be built.

GeneMark-RC, a Recursive Procedure for Gene Identification in the Genomic Sequence Data with Self-Consistency Evaluation; Its Application to the Analysis of Several Prokaryotic Genomes

Makoto Hirosawa, Katsumi Isono

1997 Volume 8 Pages 197-206
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.197

JOURNAL FREE ACCESS

Previously, we developed a GeneMark-based procedure, termed GeneMark-RC, and applied it for the identification and classification of ORFs in genomic sequence data, and identified and characterized ORFs in the 1.0 Mb data of the cyanobacterium Synechocystis sp. strain PCC 6803. In the present study, we have improved the procedure and performed analysis of the whole genomic data of Synechocystis. Consequently, we noticed the presence of three distinct classes of ORFs in this organism. The prediction of ORFs by the class-specific GeneMark-RC analysis agreed with 97.9% of those described for this bacterium. Moreover, 124 additional ORFs were identified. The procedure was similarly applied to the genomic analysis of five other prokaryotes, and 2 to 3 classes of ORFs were recognized in each case. Common features were found among the ORFs identified in the six organisms including Synechocystis. Class 1 is composed of most typical ORFs whose GC content is slightly higher than the average, while Class 2 is composed of ORFs with GC contents lower than the average. It was found that ORFs of one species can be detected with the GeneMark-RC parameters obtained from other organisms, and the prediction rate is high when the difference in their GC contents is small. It was also found that ORFs of three species with relatively low GC contents can be nicely detected with the Synechocystis matrices of Class 2 ORFs whose GC content is similar to that of the three species. Therefore, although there are two to three classes of ORFs in each species, their di-codon statistics must be rather similar to each other if their GC contents are similar. A notable exception was the case of Methanococcus jannaschii, which might reflect the fact that it is an archaebacterium.

GENOME: A Networked Database Environment for Human Genome Data

Andreas M. Kogelnik, Shamkant B. Navathe, Douglas C. Wallace

1997 Volume 8 Pages 207-214
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.207

JOURNAL FREE ACCESS

We have developed the Georgia Tech Emory Networked Object Management Environment (GENOME). GENOME is a prototype database management system (DBMS)/user interface system designed to manage complex biological data, allowing users to more fully analyze and understand relationships in human genome data. The system is designed to allow the establishment of a network of searchable data sources. The DBMS portion of the environment is a hybrid object-relational system which interprets its data structures on-the-fly, resulting in an extremely flexible DBMS. Such a DBMS provides an environment for interrelating distributed data items, allowing users to further explore computational questions in biomedical science in addition to other fields by maximizing access to data. In developing GENOME, we used MITOMAP, a human mitochondrial genome database, as a model genomic database. MITOMAP encompasses one of the most complete collections of genomic data available for a specific locus or chromosome, including functional, population variation, disease mutation, and gene-gene interaction data, as well as complete sequence data for the human mitochondrial chromosome, and thus serves as an excellent model system. An effective DBMS is required for handling the plethora of Human Genome Project data to handle the various locus-specific databases and ultimately to unify all human genetic and biomedical information through the complete human genome sequence. Developing such a DBMS is our goal. We expect that GENOME will be generally applicable to other biological and non biological paradigms as well.

Fast Discerning Repeats in DNA Sequences with a Compression Algorithm

Éric Rivals, Jean-Paul Delahaye, Max Dauchet, Olivier Delgrange

1997 Volume 8 Pages 215-226
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.215

JOURNAL FREE ACCESS

Long direct repeats in genomes arise from molecular duplication mechanisms such as retrotransposition, copying of genes, exon shuffling, etc. Their study in a given sequence reveals its internal repeat structure as well as part of its evolutionary history. Moreover, detailed knowledge about the mechanisms can be gained from a systematic investigation of repeats. The problem of finding such repeats is viewed as an NP-complete problem: the optimal compression of a sequence through the encoding of its exact repeats. The repeats chosen for compression must not overlap each other, as is the case for repeats that result from molecular duplications. We present a new heuristic algorithm, Search_Repeats, in which the selection of exact repeats is guided by two biologically sound criteria: their length and the absence of overlap between them. Search_Repeats detects approximate repeats as clusters of exact sub-repeats and points out large insertions/deletions in them. Search_Repeats takes only 3 seconds of CPU time for the genome of Haemophilus influenzae on a Sun Ultrasparc workstation.

Reading Evolutionary History of Aminoacyl-tRNA Synthetases from Genome Sequences

Kiyotaka Shiba, Hiromi Motegi, Tetsuo Noda

1997 Volume 8 Pages 227-233
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.227

JOURNAL FREE ACCESS

Aminoacyl-tRNA synthetases (ARSs) are believed to have arisen early in the evolution of life as the essential components that establish the link between triplet codons and amino acids. We have cloned and sequenced eight cDNAs for human cytoplasmic ARSs. Along with twelve sequences that have been reported from other laboratories, a set of 20 human cytoplasmic ARS genes is now available. We compared these human ARSs with ~400 ARS sequences currently available from various organisms and deduced the possible evolutionary history of these enzymes. The availability of complete sets of ARSs from thirteen organisms (H. sapiens, S. cerevisiae, E. coli, H. influenzae, H. pylori, N. gonorrhoeae, S. pyogenes, M. genitalium, M. pneumoniae, Synechocystis sp., M. jannaschii, M. thermoautotrophicum, and A. fulgidus) made systematic analyses of the evolution of this gene family possible. In this paper, we focus on two topics: (1) the acquisition of new structural domains by the core enzyme domains in higher eukaryotes and their possible role in the formation of multi-synthetase supra-molecular complexes, and (2) the existence of eukaryotic-like ARSs in some bacterial genomes, and the relationship of this occurrence to tRNA recognition.

3DinSight: An Integrated Database and Search Tool for Structure, Function and Property of Biomolecules

Jianghong An, Yasushi Kubota, Takao Nakama, Akinori Sarai

1997 Volume 8 Pages 234-235
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.234

JOURNAL FREE ACCESS

We have created an integrated database, search and visualization tool, named 3DinSight, to help researchers gain insight into the relationship between the structure, function and properties of biomolecules. Various kinds of searches can be carried out through WWW interfaces. The locations of motif sequences and mutations are automatically mapped onto the structure and visualized in 3D space by the interactive viewers VRML (Virtual Reality Modeling Language) and RasMol. In the case of VRML, the mapped 3D objects are hyper-linked to the corresponding document data. The amino-acid properties of structural, functional and mutation sites can be displayed as graph plots. 3DinSight is freely accessible through the URL http://www.rtc.riken.go.jp/3DinSight.html.

Visualization of Sequence and Biological Data in DNA Data Bank of Japan

Genome Information Broker and the Enhancement of SAKURA

Kousuke Goto, Toshitsugu Okayama, Hirotada Mori, Hikaru Yamamoto, Tomo ...

1997 Volume 8 Pages 236-237
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.236

JOURNAL FREE ACCESS

Data submitters, reviewers and users of the DNA Data Bank of Japan (DDBJ) now process sequence data longer than 1M base pairs as a result of genome projects. To achieve smooth and reliable submission, annotation and dissemination of this large-scale genetic information, DDBJ developed systems that visualize sequences and relevant biological information. A newly developed data dissemination system named Genome Information Broker and the enhancement of the Web data submission system SAKURA are introduced here from the viewpoint of visualization.

Supporting Genome Information Processing by MetaCommander

Yasuhiko Kitamura, Tetsuya Nozaki, Shoji Tatsumi, Akira Tanigami

1997 Volume 8 Pages 238-239
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.238

JOURNAL FREE ACCESS

MetaCommander is developed as a generic tool to retrieve and integrate information from WWW servers by interpreting a script. By using MetaCommander, we can support genome information processing on WWW browsers in various ways.

A Tool on Web for Gene Regulatory Networks

Toward Software Agent for Genome Information Analysis

Hiroshi Matsuno, Manabu Hori, Nobuaki Wada, Miyako Tanaka

1997 Volume 8 Pages 240-241
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.240

JOURNAL FREE ACCESS

Implementing Mobile Agents in Genome Information Processing

Hiroshi Matsuno, Misako Ichimura, Tatsumi Fukuyama, Miyako Tanaka

1997 Volume 8 Pages 242-243
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.242

JOURNAL FREE ACCESS

Mutation View: A Distributed Database for Human Disease Gene Mutations

S. Minoshima, S. Mitsuyama, S. Ohno, T. Kawamura, N. Shimizu

1997 Volume 8 Pages 244-245
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.244

JOURNAL FREE ACCESS

PDB Retriever: A Simple and Integrated Browser for Protein Data Bank

Tadashi Mizunuma, Sadahiko Misu, Motonori Ota, Ken Nishikawa

1997 Volume 8 Pages 246-247
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.246

JOURNAL FREE ACCESS

Integrated Receptor Database

Kotoko Nakata, Takako Igarashi, Tsuguchika Kaminuma

1997 Volume 8 Pages 248-249
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.248

JOURNAL FREE ACCESS

A database for receptors on the cell membrane has been developed. The system can collect data items such as attributes of proteins from distributed data sources on the Internet. Such sources include internationally standard biological databases such as the updated genetic database of PIR, Swiss Prot, PDB, GenBank, EMBL and GDB. The system provides various viewing tools that effectively display different types of receptor data: DNA sequences, amino acid sequences, DNA binding sites, ligand binding sites, gene and disease information, and protein structural information. It can also display three-dimensional images using the freeware program RASMOL. DNA binding sites, ligand binding sites and active sites are distinguished by coloring the sequences. PDB matching sites are distinguished by italicization. CSNDB (Cell Signaling Networks Database), a database for human cellular signal transduction, is also linked to the system. The database may be useful as a quick reference for ligand-membrane receptors and signal transduction in drug design.

NEXTDB: The Expression Pattern Map Database for C. elegans

Tadasu Shin-i, Yuji Kohara

1997 Volume 8 Pages 250-251
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.250

JOURNAL FREE ACCESS

We have developed a WWW-based database, named NEXTDB, to integrate all the information on ESTs (tag sequences of cDNA clones) and gene expression patterns of C. elegans that are being produced and analyzed in this laboratory. NEXTDB incorporates and processes raw tag-sequencing data and classifies the tags into unique cDNA groups by comparing the 3'-tags. The database contains information on the map positions of the cDNA groups, their correspondence to predicted CDSs and their homologies to other organisms' genes. NEXTDB incorporates image data from in situ hybridization, which show the expression patterns of individual cDNA groups, and provides a platform for annotating the images. The database also contains the cosmid contig maps obtained from AceDB. All of the information is cross-linked in NEXTDB, which can be accessed through the Internet.

An Analysis System of 2-D Gel Electrophoresis Images for Genomic Scanning

DNAinsight

Takayuki Toda, Katsutoshi Takahashi, Masayuki Nakazawa, Yasuo Watanabe

1997 Volume 8 Pages 252-253
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.252

JOURNAL FREE ACCESS

Identification of Gene Regulatory Networks by Strategic Gene Disruptions and Gene Overexpressions

Tatsuya Akutsu, Satoru Kuhara, Osamu Maruyama, Satoru Miyano

1997 Volume 8 Pages 254-255
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.254

JOURNAL FREE ACCESS

Development of a Spot Matching Module in Image Analysis System DDGEL for 2D Gel Electrophoresis

Tatsuya Akutsu, Akira Ohyama, Kyotetsu Kanaya, Asao Fujiyama

1997 Volume 8 Pages 256-257
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.256

JOURNAL FREE ACCESS

We have been developing an image analysis system named DDGEL for 2D gel electrophoresis of genomic DNA. Recently, we have developed a program module for finding a correspondence of spots between two gel electrophoresis images.

Protein Threading Using a Score Function Derived by a Linear Programming Based Method

Tatsuya Akutsu, Hiroshi Tashimo

1997 Volume 8 Pages 258-259
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.258

JOURNAL FREE ACCESS

We have been developing a novel method of deriving a score function for protein threading. In this method, the constraint that the score of the native threading is minimum over all possible threadings is expressed in a form of linear inequalities, and then parameters defining a score function are determined by solving these inequalities. The proposed method was evaluated using Lathrop and Smith's algorithm for finding optimal threadings and was shown to be effective for computing nearly correct threadings.
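The constraint described in this abstract can be written out as a short sketch. The notation here is an assumption made for illustration and is not taken from the paper: the score is taken to be linear in a parameter vector w, with φ(t) a hypothetical feature vector of a threading t, t* the native threading, and T the set of all possible threadings.

```latex
% Assumed linear score function (illustrative notation, not the authors')
S_w(t) = w \cdot \phi(t)

% "The score of the native threading is minimum over all possible threadings"
S_w(t^{*}) \le S_w(t) \quad \text{for all } t \in T

% Equivalent system of linear inequalities in the unknown parameters w
w \cdot \bigl(\phi(t) - \phi(t^{*})\bigr) \ge 0 \quad \text{for all } t \in T
```

Under this assumption each non-native threading contributes one linear inequality in w, which is why a linear programming solver can be used to search for parameters that satisfy them.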

Genome Scale Prediction of Two-Component Signal Transducers from the Knowledge of Regulatory Interactions

Hidemasa Bono, Susumu Goto, Hiroyuki Ogata, Minoru Kanehisa

1997 Volume 8 Pages 260-261
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.260

JOURNAL FREE ACCESS

Predicting gene functions from the whole genome sequence is an important problem in the post-genome era. We are developing a function prediction system that works from the whole genome sequence, using a functionally well-annotated genome as a reference organism for knowledge of biologically well-known pathways. The databases of gene catalogs and pathways are compiled under the KEGG project. In this paper we show an example of identifying the functions of genes involved in the two-component signal transduction system.

Detection of Intron, Exon, and Intergenic DNA in Human Genome on the Basis of Quantification Method II

Hiroki Fukasawa, Shigehiko Kanaya, Yoshihiro Kudo

1997 Volume 8 Pages 262-263
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.262

JOURNAL FREE ACCESS

Efficient Computation of Sequence Analysis in a Vector-Parallel Computer for the Study of Molecular Evolution

Takamasa Futatsuki, Yuichi Kawanishi, Kimitoshi Naito, Satoru Miyazaki ...

1997 Volume 8 Pages 264-265
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.264

JOURNAL FREE ACCESS

We analyzed and implemented the Smith-Waterman algorithm and the maximum likelihood method on the Fujitsu VPP500 vector-parallel computer. The programs optimized for this computer are ssearch, clustalw and fastDNAml. Our goal is to develop a complete system that covers all processes from database search to the construction of large-scale phylogenetic trees on a supercomputer.
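For readers unfamiliar with the algorithm named in this abstract, the following is a minimal serial sketch of Smith-Waterman local alignment in Python. It is not the authors' vectorized VPP500 code and not the ssearch implementation; the function name, the linear gap penalty and the scoring values are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b).

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

Each cell of the table depends only on its upper, left and upper-left neighbours, so cells along an anti-diagonal are independent of one another; this is the property that vector and parallel implementations typically exploit.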

Translated Codons (Trons) Useful for Direct Matching of a Genomic DNA Sequence and a Protein Sequence or Profile

Osamu Gotoh

1997 Volume 8 Pages 266-267
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.266

JOURNAL FREE ACCESS

A Heuristic Algorithm for Genome Rearrangements

Qian-Ping Gu, Kazuyuki Iwata, Shietung Peng, Qi-Ming Chen

1997 Volume 8 Pages 268-269
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.268

JOURNAL FREE ACCESS

Computer Simulation of Drosophila's Early Segmentation in Virtual Drosophila Project

Shugo Hamahashi, Hiroaki Kitano

1997 Volume 8 Pages 270-271
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.270

JOURNAL FREE ACCESS

Embryogenesis is one of the most important and mysterious processes of animal development. It is quite complex and hard to understand because it involves too many elements, such as cells or nuclei, which interact with each other. We replicated the system of Drosophila's early segmentation on a computer. Computer simulation enables us to understand the whole system of animal development. The work reported here, which is part of the Virtual Drosophila project, is an attempt to observe in detail the mechanism of segmentation during the early development of Drosophila by computer simulation.

An Inspection of the Multiple Alignment Method with Use of a Genetic Algorithm

Yoshitomo Harada, Masato Wayama, Toshio Shimizu

1997 Volume 8 Pages 272-273
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.272

JOURNAL FREE ACCESS

Automated Spot Matching in Autoradiogram Images of Two-Dimensional Electrophoresis of Genomic DNA

Junichi Isamikawa, Katsutoshi Takahashi, Masayuki Nakazawa, Yasuo Wata ...

1997 Volume 8 Pages 274-275
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.274

JOURNAL FREE ACCESS

The Construction of the Knowledge Base on Apoptotic Molecular Interactions

Masahiro Hattori, Minoru Kanehisa

1997 Volume 8 Pages 276-277
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.276

JOURNAL FREE ACCESS

To analyze apoptotic molecular interactions, we have developed a knowledge base on apoptosis, which consists of molecular interactions concerning apoptosis reported in experimental papers. We have collected about 80 entries; each entry corresponds to one molecule and contains its interaction information.

ALIS Sequencing Database for Large Scale Human Genome Project

Mika Hirakawa, Kensaku Imai, Hiroko Yamaguchi, Junko Shimada, Kazuo Ta ...

1997 Volume 8 Pages 278-279
Published: 1997
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.8.278

JOURNAL FREE ACCESS

The goal of the Advanced Life Science Information Systems (ALIS) project is the construction of an entire human genome database that will provide an efficient source of information for researchers after the human genome has been sequenced. We initiated this project to encourage large-scale human genome sequencing and to develop systems for genome data management and data publishing via the World Wide Web. Two years after the project began, our first attempt at human genome sequencing is going well, and more than 4M bases of well-edited human genome sequence have been acquired. The human genome project is progressing, and an international consensus on releasing data generated from the project has been defined. We have improved our sequencing database to adapt to this situation. Recently we organized a collection and publication system for the genome sequencing data.
