Genome Informatics

On Pattern Matching Methods for Three Dimensional Protein Structures

Tatsuya Akutsu

1993 Volume 4 Pages 1-9
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.1

JOURNAL FREE ACCESS

In this paper, we consider pattern matching problems for three dimensional protein structures. In particular, we consider the problems of substructure search and common substructure search. First, we show that the common substructure search problem among multiple protein structures is very difficult from the theoretical viewpoint of computational complexity. Next, we present two practical algorithms: a least-squares hashing method and a dynamic matching method. In the least-squares hashing method, the hashing technique, which is well known in computer science, is combined with a least-squares fitting technique. In the dynamic matching method, the dynamic programming technique, which is widely used for pattern matching of DNA and amino acid sequences, is combined with a least-squares fitting technique. These two methods have been applied to PDB (Protein Data Bank) data and shown to be effective.

Motif Knowledge Base Based on Deductive Object-Oriented Database Language

MAKOTO HIROSAWA, REIKO TANAKA, MASATO ISHIKAWA

1993 Volume 4 Pages 10-16
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.10

JOURNAL FREE ACCESS

The representation of biological concepts in a knowledge base is important for a machine or a non-specialist in biology to understand and analyze genetic information. In our previous study, we studied the representation of biological knowledge in general and of knowledge related to protein motifs in particular, with the goal of discovering new motifs.
In this paper, we first list the requirements for the representation of biological knowledge and then state solutions to these requirements. Finally, we show a representation of biological knowledge on motifs in the deductive object-oriented language QUIXOTΣ. The knowledge base is built on Prosite, a representative motif database.

Efficient Query Processing on Genomic Database using Data-Parallel Logic Programming Language

Hideo Matsuda

1993 Volume 4 Pages 17-24
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.17

JOURNAL FREE ACCESS

This paper proposes a prototype system for querying genomic databases based on data-parallel logic programming. Efficient access to genomic databases is crucial, given the enormous increase in sequence data. By using a logic programming language, the system allows a user to perform adaptable data retrieval on integrated data objects in a single declarative framework. In addition, by utilizing data-parallel processing, it provides efficient access to a large amount of genomic data in a distributed computing environment. We present its design principles and discuss the implementation of the database system.

Searches for Topologically and Three Dimensionally Similar Structures in Proteins

Kenji Satou, Emiko Furuichi, Satoru Kuhara, Toshihisa Takagi

1993 Volume 4 Pages 25-35
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.25

JOURNAL FREE ACCESS

We developed a deductive database system, PACADE, for analyzing the three dimensional and secondary structures of proteins. PACADE is equipped with a function to search for similar structures in proteins. Unlike other approaches based on calculation of the inter-atomic root mean square distance, this function is based on logic programming and source level rule rewriting techniques.
We describe herein the results of searches for topologically similar structures and three dimensionally similar ones. A user of PACADE can select between these two levels of similarity by adding or deleting prefixes.

Stochastic Context-Free Grammars in Computational Biology

Applications to Modeling RNA

Yasubumi Sakakibara, Michael Brown, Rebecca C. Underwood, Saira I. Mia ...

1993 Volume 4 Pages 36-45
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.36

JOURNAL FREE ACCESS

Stochastic context-free grammars (SCFGs) are applied to the problems of folding, aligning and modeling families of homologous RNA sequences. These models capture the common primary and secondary structure of the sequences with a context-free grammar, much like those used to define the syntax of programming languages. SCFGs generalize the hidden Markov models used in related work on protein and DNA sequences. The novel aspect of this work is that the SCFGs developed here are learned automatically from initially unaligned and unfolded training sequences. To do this, a new generalization of the forward-backward algorithm, commonly used to train hidden Markov models, is introduced. This algorithm is based on tree grammars, and is more efficient than the inside-outside algorithm, which was previously proposed to train SCFGs. This method is tested on the family of transfer RNA (tRNA) sequences. The results show that the model is able to reliably discriminate tRNA sequences from other RNA sequences of similar length, that it can reliably determine the secondary structure of new tRNA sequences, and that it can produce accurate multiple alignments of large collections of tRNA sequences. The model is also extended to handle introns present in tRNA genes.
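
As a rough illustration of the inside computation that the abstract mentions as the earlier training baseline (not the tree-grammar algorithm introduced in the paper), the sketch below scores a short RNA string under a toy SCFG in Chomsky normal form. The grammar, its probabilities, and the names `inside_probability`, `UNARY` and `BINARY` are all hypothetical and chosen only to model nested a-u and g-c base pairs.

```python
from collections import defaultdict

# Hypothetical toy grammar (not from the paper): S derives nested a-u / g-c pairs.
UNARY = {("A", "a"): 1.0, ("U", "u"): 1.0, ("G", "g"): 1.0, ("C", "c"): 1.0}
BINARY = {
    ("S", "A", "U"): 0.2,  # S -> a u        (innermost pair)
    ("S", "G", "C"): 0.2,  # S -> g c        (innermost pair)
    ("S", "A", "X"): 0.3,  # S -> a S u      (via helper X -> S U)
    ("S", "G", "Y"): 0.3,  # S -> g S c      (via helper Y -> S C)
    ("X", "S", "U"): 1.0,
    ("Y", "S", "C"): 1.0,
}


def inside_probability(seq, unary, binary, start="S"):
    """Probability that `start` derives `seq`, summed over all parses."""
    n = len(seq)
    if n == 0:
        return 0.0
    # inside[i][j][A] = probability that nonterminal A derives seq[i..j]
    inside = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(seq):
        for (lhs, terminal), p in unary.items():
            if terminal == symbol:
                inside[i][i][lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two children
                for (lhs, left_nt, right_nt), p in binary.items():
                    left = inside[i][k].get(left_nt, 0.0)
                    right = inside[k + 1][j].get(right_nt, 0.0)
                    if left and right:
                        inside[i][j][lhs] += p * left * right
    return inside[0][n - 1].get(start, 0.0)


if __name__ == "__main__":
    for rna in ("au", "gauc", "ga"):
        print(rna, inside_probability(rna, UNARY, BINARY))
```

The cubic loop over spans and split points is what makes inside-style training expensive on long sequences, which is the cost the abstract's tree-grammar approach is said to reduce.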

Representing Inter-residue Dependencies in Protein Sequences with Probabilistic Networks

Hiroshi Mamitsuka

1993 Volume 4 Pages 46-55
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.46

JOURNAL FREE ACCESS

We propose a new method for representing a local region of a protein sequence as a probabilistic network. The method produces, from a large number of examples of a local region, a network which describes dependency relationships that exist among amino acid residues in the region. The network is constructed using the greedy-search algorithm based on the minimum description length (MDL) principle. In our experiments, we construct two probabilistic networks of two α-helix regions in globin family protein. Experimental results show that our method provides a visual aid to understanding inter-residue dependencies of those regions with probabilistic networks, and the networks capture several important features which are peculiar to those regions.

Protein Motif Extraction Using Hidden Markov Model

Yukiko Fujiwara, Akihiko Konagaya

1993 Volume 4 Pages 56-64
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.56

JOURNAL FREE ACCESS

In this paper, we study the application of hidden Markov models (HMMs) to the problem of representing protein sequences by a stochastic motif. A stochastic (protein) motif represents the portions of protein sequences that have a certain function or structure, where conditional probabilities are used to deal with the stochastic nature of the motif. We propose the iterative duplication method for HMM network learning. HMMs are much more expressive than symbolic patterns and are better suited to represent the variety of protein sequences. As an experiment, we constructed HMMs for the leucine zipper motif using 112 protein sequences as a training set, and obtained an accuracy of 79.3 percent in the prediction of protein sequences, compared with an accuracy of 14.8 percent when using a symbolic representation. Our approach can also be used for the validation of protein databases; the automatically constructed HMM has indicated that one protein sequence annotated as “leucine-zipper like sequence” in the database is quite different from other leucine-zipper sequences in terms of likelihood.
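
The likelihood comparison mentioned at the end of the abstract can be sketched with the standard forward algorithm. The two-state model below is a hypothetical toy (its states, alphabet and probabilities are not taken from the paper, and it does not implement the iterative duplication method); it only shows how a sequence's likelihood under an HMM is computed.

```python
# Hypothetical toy HMM over a two-letter alphabet; all numbers are illustrative.
STATES = ("motif", "background")
START_P = {"motif": 0.5, "background": 0.5}
TRANS_P = {
    "motif": {"motif": 0.9, "background": 0.1},
    "background": {"motif": 0.1, "background": 0.9},
}
EMIT_P = {
    "motif": {"L": 0.7, "A": 0.3},
    "background": {"L": 0.2, "A": 0.8},
}


def forward_likelihood(seq, states, start_p, trans_p, emit_p):
    """Return P(seq | model), summed over all hidden state paths."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    # alpha[s] = P(prefix observed so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s].get(seq[0], 0.0) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit_p[s].get(symbol, 0.0)
            * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    for seq in ("LLLL", "AAAA"):
        print(seq, forward_likelihood(seq, STATES, START_P, TRANS_P, EMIT_P))
```

A sequence whose likelihood is far below that of other members of its annotated class would be a candidate for re-examination, in the spirit of the database validation described above. For long sequences a practical version would work with log probabilities or scaling to avoid underflow.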

A Deductive Method for Construction and Visualization of Contigs in the STS Strategy

Masami Hagiya

1993 Volume 4 Pages 65-73
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.65

JOURNAL FREE ACCESS

The problem of constructing contigs by the STS strategy is a simple combinatorial problem if the given hit information is correct and complete. However, hit information is often incorrect or incomplete due to failure or inability of experiments. Moreover, in addition to hit information, various sources of information are also available, such as known landmarks, other clone libraries, etc. In order to cope with incompleteness, incorrectness and additional information, we developed a deductive method for constructing contigs. Contigs are constructed by deducing an equivalence relation of clone directions and a partial order among STS markers on each equivalence class of directions. In the paper, a practical algorithm based on the method is presented and its completeness is proved. The method is also axiomatized by a set of inference rules for deducing the equivalence relation and the partial orders. We finally discuss the problem of visualizing contigs based on the information deduced by our method.

Running Learning Systems in Parallel for Machine Discovery from Sequences

Ayumi Shinohara, Satoru Miyano, Setsuo Arikawa, Shinichi Shimozono, To ...

1993 Volume 4 Pages 74-83
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.74

JOURNAL FREE ACCESS

We have developed a machine learning system, BONSAI, which takes positive and negative examples as inputs and produces, as a hypothesis, a pair consisting of a decision tree over regular patterns and an alphabet indexing. This paper proposes two applications of BONSAI for the case where multiple BONSAI systems can be run in parallel.
The first is to classify given examples that come from several different unknown classes. The process of solving the problem consists of multiply spawned BONSAI systems, each of which tries to find a decision tree, an alphabet indexing and a group of examples. It finally partitions a hodgepodge of sequences into a small number of disjoint classes, together with hypotheses explaining these classes accurately.
The second is to find a good sample of a concept. Though the main interest in applying the BONSAI system is to discover good hypotheses, it is equally interesting to find a small set of examples from which a good hypothesis can be made. We present a method for solving this problem by combining a strategy from genetic algorithms with multiply running BONSAI systems.

Parallel Iterative Aligner with Genetic Algorithm

MASATO ISHIKAWA, TOMOYUKI TOYA, YASUSHI TOTOKI, AKIHIKO KONAGAYA

1993 Volume 4 Pages 84-93
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.84

JOURNAL FREE ACCESS

This paper proposes a new methodology to improve the performance of multiple sequence alignment by combining a genetic algorithm and an iterative alignment algorithm. Iterative alignment algorithms usually achieve better alignments than other alignment algorithms, such as tournament-based multiple alignment. They can also incorporate parallelism to improve execution performance. However, because of their rapid convergence, they are sometimes trapped in local optima and produce relatively low-quality alignments. A genetic algorithm can solve this problem by exchanging partial alignment sequences between “individuals”. Our experiments show that the combination of a genetic algorithm and an iterative alignment algorithm produces better results than iterative aligners which employ hill-climbing search strategies.

Application of Parallelized DP and A* Algorithm to Multiple Sequence Alignment

Shiho ARAKI, Masahiro GOSHIMA, Shin-ichiro MORI, Hiroshi NAKASHIMA, Sh ...

1993 Volume 4 Pages 94-102
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.94

JOURNAL FREE ACCESS

This paper makes two proposals to speed up the Parallel Iterative Method, which is based on the iterative strategy of the Berger-Munson algorithm.
The first proposal is to exploit finer-grained parallelism in the DP (Dynamic Programming) procedure itself. This proposal makes the processing speed proportional to the number of processors.
The second proposal is to apply the A* algorithm, a well known heuristic search algorithm, instead of DP. A* reduces the search space using heuristics, while DP traverses the whole space blindly.
We have implemented these two proposals on a parallel computer, the AP1000. In a test of parallelizing DP, ten 1000-character sequences are aligned by using 10 processors per one DP procedure at a speed 8.11 times faster than sequential processing. By applying the A* algorithm to 30 sets of test problems, we obtain optimal alignment by reducing the search space by 95%.
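
To make the contrast between A* and exhaustive DP concrete, here is a minimal sketch on a much simpler problem than the paper's: pairwise alignment under unit-cost edit distance. The function name `astar_edit_distance` and the heuristic (the difference in remaining lengths, a lower bound on the gaps still required) are assumptions made for this illustration and are not the authors' scoring scheme or heuristic.

```python
import heapq


def astar_edit_distance(a, b):
    """Return (edit distance, number of expanded grid nodes) using A*."""
    n, m = len(a), len(b)

    def heuristic(i, j):
        # Lower bound: at least this many gaps are needed to finish.
        return abs((n - i) - (m - j))

    best = {(0, 0): 0}  # cheapest known cost to reach each grid node
    heap = [(heuristic(0, 0), 0, 0, 0)]  # entries are (f, g, i, j)
    expanded = 0
    while heap:
        _, g, i, j = heapq.heappop(heap)
        if g > best.get((i, j), float("inf")):
            continue  # stale queue entry, a cheaper route was found later
        expanded += 1
        if i == n and j == m:
            return g, expanded
        moves = []
        if i < n and j < m:
            moves.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))  # (mis)match
        if i < n:
            moves.append((i + 1, j, 1))  # gap in b
        if j < m:
            moves.append((i, j + 1, 1))  # gap in a
        for ni, nj, cost in moves:
            ng = g + cost
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(heap, (ng + heuristic(ni, nj), ng, ni, nj))
    raise RuntimeError("goal node was not reached")


if __name__ == "__main__":
    for x, y in (("ACGTACGT", "ACGTACGT"), ("kitten", "sitting")):
        dist, expanded = astar_edit_distance(x, y)
        cells = (len(x) + 1) * (len(y) + 1)
        print(x, y, "distance", dist, "expanded", expanded, "of", cells, "DP cells")
```

Full DP fills every cell of the (n+1) by (m+1) grid, whereas A* expands only nodes whose estimated total cost does not exceed the optimum, so the saving is largest when the sequences are similar. The 95% reduction reported in the abstract refers to the authors' own test problems and is not reproduced by this toy.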

Parallel Multiple Alignments and Their Implementation on CM5

Naoto Ukiyama, Hiroshi Imai

1993 Volume 4 Pages 103-108
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.103

JOURNAL FREE ACCESS

This paper addresses several issues in parallel multiple alignment and reports some preliminary computational results of an implementation on the CM5. Emphasis is placed on the use of parallelism in the diagonal direction, which is especially useful when aligning similar strings. Connections with the parallel approximate string matching algorithm of Landau and Vishkin [1] are also touched upon.

Extraction of Conserved or Variable Regions from a Multiple Sequence Alignment

Osamu Gotoh

1993 Volume 4 Pages 109-113
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.109

JOURNAL FREE ACCESS

Given a multiple sequence alignment of a family of protein or nucleotide sequences, conserved or highly variable regions are valuable landmarks to get insight into the functional and structural roles of individual regions. Conserved regions can also act as anchor points in the process of further improvement of the given alignment. Two different approaches were undertaken to extract conserved regions based on the principle of either consistency or high scores. The latter approach is easily modified to extract highly variable regions by reversing the scoring scheme. Examinations on a few protein families are discussed.

Automatic extraction of motif candidates by pairwise sequence alignment

Y. Seto, Y. Ikeuchi, M. Isoyama

1993 Volume 4 Pages 114-119
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.114

JOURNAL FREE ACCESS

Motifs are essential sites and are therefore usually conserved in proteins. Motifs play a crucial role not only in the protein world but also in genome projects. Information about them is usually obtained by experiments and laborious multiple sequence alignment. Based on the fact that motifs are conserved short sequences, we developed a method for extracting motifs automatically from pairwise sequence alignments. Proteins moderately similar to a probe protein are searched for among all entries in a sequence database. Motifs of the probe are then extracted from each pairwise alignment under specified restrictions. We applied the method to 389 probe proteins from 89 superfamilies in the PIR database and evaluated the extracted motifs.

Finding K-best Alignments of Multiple Sequences

Gen Shibayama, Hiroshi Imai

1993 Volume 4 Pages 120-129
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.120

JOURNAL FREE ACCESS

Detecting similarities of multiple genome sequences is one of the most important topics in genome informatics. For the purpose of finding such similarities, an alignment with the highest score with respect to some similarity criterion is provided as an output. However, the alignment with the best score is not necessarily the most significant alignment of the sequences from the viewpoint of biology. In this respect, providing suboptimal alignments is very useful.
Since finding an alignment of sequences corresponds to finding a path in some directed acyclic graph, we propose a simple algorithm to enumerate all K-best alignments in order, where K may not necessarily be specified beforehand, by finding the K longest paths in the graph. We further consider finding the subgraph formed by such K longest paths. Several useful approaches to find the optimal paths in a graph are also mentioned.
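
The correspondence between suboptimal alignments and the K longest paths of a directed acyclic graph can be illustrated with a small dynamic program. This is a simplified sketch under stated assumptions: K is fixed in advance (the abstract's algorithm does not require that), only path scores are returned rather than the paths themselves, and the graph, weights and the name `k_best_path_scores` are hypothetical.

```python
import heapq


def k_best_path_scores(edges, order, source, sink, k):
    """Scores of the k highest-scoring source-to-sink paths in a DAG.

    edges: dict mapping a node to a list of (successor, weight) pairs
    order: all nodes listed in topological order
    """
    best = {node: [] for node in order}  # top-k path scores reaching each node
    best[source] = [0]
    for u in order:
        # best[u] is final here because all predecessors of u came earlier.
        for v, weight in edges.get(u, []):
            candidates = best[v] + [score + weight for score in best[u]]
            best[v] = heapq.nlargest(k, candidates)
    return best[sink]


if __name__ == "__main__":
    # Hypothetical toy graph; in an alignment graph the nodes would be grid
    # positions and the weights would be match, mismatch and gap scores.
    toy_edges = {
        "s": [("a", 2), ("b", 1)],
        "a": [("t", 2), ("b", 1)],
        "b": [("t", 4)],
    }
    print(k_best_path_scores(toy_edges, ["s", "a", "b", "t"], "s", "t", 3))
```

Keeping only the top k scores per node is sufficient because any of the k best paths into a node must extend one of the k best paths into one of its predecessors.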

Are the Hidden Markov Models Promising in Protein Research?

Kiyoshi Asai, Hidetoshi Tanaka, Katunobu Itou, Kentaro Onizuka

1993 Volume 4 Pages 130-139
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.130

JOURNAL FREE ACCESS

The Hidden Markov Model (HMM), a type of stochastic model (signal source), is now becoming popular in molecular biology. HMMs consist of ‘hidden’ states, state transition probabilities and output distributions. Because there are known algorithms to train HMMs as stochastic representations of the training data, they are widely used for pattern recognition, especially for speech recognition.
In the field of protein research, HMMs have been used to represent stochastic motifs of protein sequences, to model the structural patterns of proteins, to predict secondary structures and upper level structures, to make multiple sequence alignments, and to classify protein sequences.
In each case, HMM techniques are closely related to the conventional methods. An important merit of HMMs is their flexibility as a model of protein sequences. A serious problem of HMMs is that they need a large amount of training data. In this paper, we give a brief introduction to HMMs, review HMM-related protein research, compare this research with other methods and discuss the usefulness and further possibilities of HMMs.

Protein 3D Structure Prediction Based on Multi-Level Description

KENTARO ONIZUKA, KIYOSHI ASAI, MASATO ISHIKAWA

1993 Volume 4 Pages 140-151
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.140

JOURNAL FREE ACCESS

We propose a novel scheme for protein 3D structure prediction using the Multi-level Description scheme (MLD). In this prediction scheme, a local conformation is not only determined by the primary structure at that region (i.e., primary constraints) but is also constrained by the neighboring or surrounding local conformations (i.e., geometric constraints).
The MLD describes a protein conformation with multiple levels of different scales and degrees of abstraction. This scheme facilitates modeling the geometric constraints between neighboring local conformations by analyzing the frequency of overlapping patterns of the local conformations. The primary constraints are modeled by analyzing the relationship between the primary structure and the local conformation at that region.
The MLD representing a real protein conformation must satisfy most of the constraints above. Thus, a protein conformation can be predicted by searching for the optimal MLD that best satisfies the constraints. This problem is formulated as a combinatorial optimization problem.

Prediction of Structure of Globular Proteins

II. Tertiary Structure

Nobuhiko Saitô, Motonori Ota

1993 Volume 4 Pages 152-156
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.152

JOURNAL FREE ACCESS

The packing mechanism of the secondary structures has been revealed. The driving force is the hydrophobic interaction between hydrophobic residues that are nearest to each other along the chain. They are chosen because they can bind most quickly. In this way local structures of the protein are determined and then grow into the whole structure. This process is usually done manually, but here we attempt to carry it out automatically. The formulation is applied to crambin.

A Theoretical Method for the Determination of Helix Configuration in Membrane Proteins

Makiko SUWA, Takatsugu HIROKAWA, Shigeki MITAKU

1993 Volume 4 Pages 157-166
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.157

JOURNAL FREE ACCESS

A theoretical method for structure prediction of membrane proteins, based upon physicochemical calculations and comprising three steps, was developed. In the first step, the polar interaction field of a transmembrane helix was characterized by a probe helix method in which the interaction energy between a transmembrane helix and a probe helix was calculated. In the second step, a jigsaw puzzle problem was solved by using binding maps of pairs of helices. The binding energy obtained from the polar interaction field was plotted in a binding map as a function of the orientation angles of the two helices. Finally, the helix configuration determined by the analysis of binding maps was refined by minimizing the binding function of the whole system.
In order to deal with the jigsaw puzzle problem, several principles of the folding of membrane proteins have been assumed: (1) The molecular structure is formed according to some folding pathway. (2) The dominant interaction in the hydrophobic region of the membrane is the polar interaction. (3) A transmembrane helix can be regarded as a stable rod with a charge distribution on it. Comparison of the predicted structure of bacteriorhodopsin with the experimental one revealed that reconstruction of the relative positions and orientations of transmembrane helices is possible by this method. Applying this method to rhodopsin, the configuration of transmembrane helices was determined, and it was quite similar to the experimental configuration. A mechanism for the structural change of rhodopsin by cis-trans isomerization of retinal was suggested from the predicted structure.

A Method for Extracting Spatially Close Peptide Segments in Proteins

Zenmei OHKUBO, Minoru KANEHISA

1993 Volume 4 Pages 167-174
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.167

JOURNAL FREE ACCESS

In order to predict protein structures from their primary sequences, understanding long-range interactions is one of the most critical points. We address this problem by focusing on pairs of peptide segments which are separated in the primary sequence but are close in the three-dimensional structure. The method is applied to a set of structure-resolved proteins to see if there are any significant features in the association of local structures, such as secondary structure segments. The dataset consists of 88 nonhomologous proteins selected from the Brookhaven Protein Data Bank (PDB) using the superfamily classification of the Protein Information Resource (PIR). In the method, given the definition of the distance between two segments, spatially close segment pairs are extracted for Cα segments of 4 or 7 residues in length. The result shows that there are no preferred distances for the association of two helical segments, but a minimum of twenty intervening residues is required for parallel helical segments.

Quantification Analysis of Translation Initiation Signal Sequences in Vertebrate mRNAs

Yôichi IIDA, Takeshi MASUDA

1993 Volume 4 Pages 175-182
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.175

JOURNAL FREE ACCESS

For translation initiation signals in vertebrate mRNAs, not only the ATG initiation codon but also the sequences flanking it are required to direct the position of initiation. A consensus sequence for the signal, (GCC) GCCGCCATGG, has been proposed by Kozak, but actual initiation sequences differ from it to a greater or lesser degree. In the present report, the translation initiation signal sequences of human β-globin and β-thalassemia mRNAs were analyzed using a quantification method proposed previously. In this method, each 16-nucleotide sequence in the mRNA was characterized by its sample score, which shows the intensity of the signal. Scoring of signal sequences could explain not only the authentic initiation site but also the experimental results of various mutations around the initiation site. Further analysis demonstrated that, in addition to the signal intensity, the sequence nearest the cap site was preferred. This supported Kozak's scanning hypothesis, in which the eukaryotic small ribosomal subunit binds initially at the 5'-end of mRNA and subsequently migrates to the signal sequence.

Multiple Sequence Alignment using Parallel Genetic Algorithms

Koji Tajima

1993 Volume 4 Pages 183-187
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.183

JOURNAL FREE ACCESS

We propose a more sensitive algorithm for multiple sequence alignment using parallel genetic algorithms. With less computation than that needed for multi-dimensional dynamic programming approaches, we can obtain multiple alignments which have better similarity than that obtained by repeating two-dimensional dynamic programming. The parallel processing of genetic algorithms was performed on a Fujitsu parallel computer AP1000.

Prediction of Protein Conformations by a Spin Glass Model (I)

Tsuyoshi Yoshizawa, Masaki Fumoto, Tamio Yasukawa

1993 Volume 4 Pages 188-196
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.188

JOURNAL FREE ACCESS

A spin glass model for polypeptide chains consisting of 4 states a, b, c₁ and c₂, was introduced for the energy minimal conformation search by an extended Hopfield algorithm, in which energy dissipation rate was gradually reduced to simulate annealing processes. Inter-residue interaction energies were estimated by molecular mechanics program AMBER using model oligopeptide chains and crystal structure data. Preliminary results obtained with BPTI are not so satisfactory and several measures to improve the prediction accuracy were discussed.

Toward prediction of multi-states secondary structures of protein by neural network

Fumiyoshi Sasagawa, Koji Tajima

1993 Volume 4 Pages 197-204
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.197

JOURNAL FREE ACCESS

Usually, the prediction of protein secondary structure by a neural network is based on three states (α-helix, β-sheet and coil). However, recent reports of determined protein structures present more detailed secondary structures, such as the 3₁₀-helix, so it is expected that more detailed secondary structures of proteins should be predicted. Some problematic points in applying neural networks to the prediction of multi-state secondary structures are discussed. The prediction of globular protein secondary structures by a neural network is studied. The application of a neural network with a modular architecture to the prediction of protein secondary structures (α-helix, β-sheet and coil) is presented. Each module is a three layer neural network. The results from the neural network with a modular architecture and from one with a simple three layer structure are compared. The overlearning effect is investigated in ordinary and modular neural networks. The prediction accuracy of the neural network with a modular architecture is higher than that of the ordinary neural network. The 3, 4 and 8 state classification schemes of secondary structures are considered in the ordinary three layer neural network. The percentage of correct predictions depends on the state classification scheme. Furthermore, for the 3 and 4 state classification schemes, the consistency of the outputs of the modules in the neural network with a modular architecture is investigated.

Sequence Analysis of Zinc Finger DNA-Binding Protein

Kotoko Nakata

1993 Volume 4 Pages 205-210
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.205

JOURNAL FREE ACCESS

Using a neural network algorithm with the back-propagation training procedure, we analysed zinc finger DNA-binding protein sequences. The patterns used in the neural network are the amino acid sequence pattern, electric charge and polarity, amino acid group properties, amino acid ancestral group, hydrophobicity, hydrophilicity and secondary structure. For comparison, discriminant analysis was also tried. For the TFIIIA type (Cys-X₂₋₄-Cys-X₁₂₋₁₅-His-X₃₋₅-His, where X is any amino acid) zinc finger DNA-binding motifs, both the neural network algorithm and the discriminant analysis reached high discrimination. For the estrogen type (Cys-X₂₋₄-Cys-X₁₂₋₁₅-Cys-X₂₋₄-Cys) zinc finger, the results of the single perceptron algorithm on individual attributes are not always good, but the combination of the attributes reached high discrimination.
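
The two spacing patterns quoted in the abstract translate directly into regular expressions over one-letter amino acid codes. The sketch below only locates candidate regions that satisfy the spacing constraints; it is not the neural network or discriminant analysis used in the paper, and the names `TFIIIA_TYPE`, `ESTROGEN_TYPE` and `find_candidates`, as well as the test sequence, are made up for illustration.

```python
import re

# Spacing patterns as written in the abstract (C = Cys, H = His, "." = any residue).
TFIIIA_TYPE = re.compile(r"C.{2,4}C.{12,15}H.{3,5}H")
ESTROGEN_TYPE = re.compile(r"C.{2,4}C.{12,15}C.{2,4}C")


def find_candidates(pattern, sequence):
    """Return (start index, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Synthetic sequence built to satisfy the TFIIIA-type spacing; not real data.
    synthetic = "MK" + "C" + "A" * 2 + "C" + "A" * 12 + "H" + "A" * 3 + "H" + "GG"
    print(find_candidates(TFIIIA_TYPE, synthetic))
    print(find_candidates(ESTROGEN_TYPE, synthetic))
```

A pattern match of this kind is a coarse filter: many sequences can satisfy the spacing by chance, which is presumably why additional attributes are combined in the analysis described above.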

Protein Sequence Motif Extraction with a Probabilistic Logic Neural Network

Motif Evaluation on a 3-D Structure

Kazuhiro Iida, Hiroshi Mamitsuka

1993 Volume 4 Pages 211-218
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.211

JOURNAL FREE ACCESS

A probabilistic logic neural network, mSDN, reveals multiple biochemical rules hidden in a protein amino-acid sequence. Two motifs are extracted from a 16-residue hemoglobin α-helix region. The motifs, each containing only 3 amino-acid residues, correctly classify new data with 96% accuracy. Evaluating the motifs on a hemoglobin 3-D structure suggests that one motif represents a local α-helix determiner, and the other explains long-range interactions which are important for hemoglobin tertiary structure. The findings indicate that the mSDN extracts region-specific and biochemically significant motifs from an amino-acid sequence, and suggest that the network separates heterogeneous biochemical rules in a sequence into corresponding motifs. Motifs extracted by the mSDN will help us to analyze and predict protein conformations and their functions.

Learning Algorithms of Three Layered Neural Networks for Sequence Classification

Koichi NIIJIMA, Shinichi SHIMOZONO

1993 Volume 4 Pages 219-223
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.219

JOURNAL FREE ACCESS

Domains for classifying positive and negative patterns are derived by imposing some heteroassociative output conditions on the network. Using the shape of the domain, a functional to be minimized is introduced to determine the connection weights and threshold values of the network. Minimization techniques for the functional, which give learning algorithms for the network, are also discussed. Finally, remarks on numerical experiments are given.

Download PDF (364K)
Classification of Proteins via Successive State Splitting Algorithm of Hidden Markov Network

Hidetoshi Tanaka, Kentaro Onizuka, Kiyoshi Asai

1993 Volume 4 Pages 224-230
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.224

JOURNAL FREE ACCESS

The Hidden Markov Model (HMM) introduces a stochastic approach to protein representation and motif abstraction. We need a stochastic classification that is seamless with HMM representation and abstraction. Successive State Splitting (SSS) classifies proteins represented by HMMs. It uses no previous knowledge of the proteins. The SSS algorithm was originally developed for allophone modeling and is based on a continuous distribution of phoneme data. It makes it possible to obtain an appropriate Hidden Markov Network automatically and the HMM simultaneously. We map amino acids onto a continuous space according to a quantification based on PAM-250.

Download PDF (606K)
Assessment of Species-specific Codon Usage by Principal Component Analysis

Shigehiko Kanaya, Yoshihiro Kudo

1993 Volume 4 Pages 231-238
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.231

JOURNAL FREE ACCESS

In order to systematically examine differences among species in the preferential usage of synonymous codons, principal component analysis is applied to a matrix consisting of the relative frequencies of synonymous codons. The first two principal components (PC1 and PC2) account for 66% and 8%, respectively. From the projection onto the first two components, the following conclusions can be drawn: (1) The base preference for A and U (G and C) at the third position in synonymous codons contributes negatively (positively) to PC1: vertebrates and chloroplasts are clustered in narrow regions with positive and the most negative PC1, respectively. (2) PC2 is important for distinguishing between prokaryotes and eukaryotes: eukaryotes (prokaryotes) prefer the dinucleotides GA, AG, CU and CA (CG, GC, and AA) at the second and third positions in codons.
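
The input to the analysis is a matrix of relative frequencies within synonymous codon families. As a rough illustration, the Python sketch below computes one row of such a matrix for a single coding sequence. It covers only two families (Phe and Lys, taken from the standard genetic code) and uses a hypothetical function name and toy sequence; the paper's actual normalization and the principal component step are not reproduced here.

```python
from collections import Counter

# Two synonymous families from the standard genetic code (RNA alphabet).
# A full analysis would list every family; this subset is for illustration.
SYNONYMOUS = {
    "Phe": ["UUU", "UUC"],
    "Lys": ["AAA", "AAG"],
}


def relative_frequencies(coding_sequence):
    """Relative frequency of each codon within its synonymous family."""
    codons = [coding_sequence[i:i + 3]
              for i in range(0, len(coding_sequence) - 2, 3)]
    counts = Counter(codons)
    row = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        for codon in family:
            # Families absent from the sequence get 0.0 (an assumption)
            row[codon] = counts[codon] / total if total else 0.0
    return row


# Hypothetical coding sequence: 1 UUU, 2 UUC, 1 AAA, 4 AAG
row = relative_frequencies("UUU" + "UUC" * 2 + "AAA" + "AAG" * 4)
```

Stacking one such row per species gives a species-by-codon matrix to which principal component analysis can then be applied.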

Download PDF (459K)
A search for CpG islands associated with genes in human genomic sequences compiled in the DNA database

Jun Kusuda, Makoto Hirata, Atushi Toyoda, Ichiro Takahashi, Katsuyuki ...

1993 Volume 4 Pages 239-244
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.239

JOURNAL FREE ACCESS

To estimate how frequently CpG islands are associated with genes distributed in the human genome, we screened sequenced human DNAs compiled in the DNA database for statistically expected CpG islands. The survey of 2605 genomic sequences (>300 bp) coding for 833 genes mapped on human chromosomes identified 1030 CpG island-linked sequences assigned to 324 genes, indicating that at least 39% of human genes are associated with CpG islands. Furthermore, 19%, 36% and 45% of the CpG islands mapped on single chromosomal bands are located on G-, R- and T-bands, respectively. This result suggests that the occurrence of CpG island genes increases with the global G+C% level of chromosomal bands.
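
The abstract does not state the statistical criteria used to call a CpG island. As a hedged illustration, the Python sketch below applies a commonly cited rule of thumb (length of at least 200 bp, G+C fraction above 0.5, observed/expected CpG ratio above 0.6); these thresholds and the function names are assumptions, not the authors' settings.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is c * g / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to one candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return (len(seq) >= min_len
            and gc_fraction > min_gc
            and obs_exp > min_obs_exp)
```

In practice such a test would be applied to sliding windows along each database sequence, and hits would then be compared with the annotated gene positions.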

Download PDF (581K)
Extraction of the ligand-related motifs in enzymes

Mikita Suyama, Takaaki Nishioka, Jun'ichi Oda

1993 Volume 4 Pages 245-254
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.245

JOURNAL FREE ACCESS

To extract ligand-related motifs from the sequences of enzymes, we have constructed the Ligand Chemical Database for Enzyme Reaction, which links a chemical compound to amino acid sequences. Among the 1,966 ligands registered, 519 chemical compounds were related to 1,488 ligand-linked sequences. Sequence fragments 10 residues long, commonly found among the ligand-linked sequences for each chemical compound, were defined as ligand-related motifs. Motifs extracted for pyridoxal phosphate were tested against the crystal structures of aspartate aminotransferase complexed with pyridoxal phosphate. Twenty-four of the 93 motifs extracted from the enzyme include residues that make chemical interactions with the bound pyridoxal phosphate. One of the motifs, K-x-x-G-L-x-x-x-R-V, actually participates in the recognition of pyridoxal phosphate in another enzyme, 1-aminocyclopropane-1-carboxylate synthase. The present approach provides ligand-related motifs and shows great potential for characterizing the unknown genes sequenced by the genome project.

Download PDF (944K)
Automatic Procedure to Extract Signature Pentapeptides from the Protein Sequence Database

Ikuo Uchiyama, Atsushi Ogiwara, Zenmei Ohkubo, Minoru Kanehisa

1993 Volume 4 Pages 255-263
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.255

JOURNAL FREE ACCESS

A method is described for extracting signature pentapeptides that are conserved and exclusively found in a group of homologous proteins. The BLAST algorithm is used to count the frequency of occurrences of pentapeptide patterns allowing limited substitutions, as well as to perform homology search. For those pentapeptides that appear in a given sequence, we examine the frequency of occurrences of these pentapeptides and related ones in homologous sequences, which are ordered according to the homology score. By comparing against the frequency in the entire database, we can extract uniquely conserved pentapeptides and at the same time perform a grouping of homologous sequences. Thus, our procedure can automatically identify pentapeptides, if any, that are strongly tied to the group. The possibility of using our pentapeptide word dictionary to infer protein function is discussed.

Download PDF (669K)
A Simple Method for Finding Local Sequence Similarities

Keiichi Nagai, Tetsuo Nishikawa, Hideki Kambara, Toshihisa Takagi

1993 Volume 4 Pages 264-269
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.264

JOURNAL FREE ACCESS

Local similarities found in protein and DNA sequences by conventional database search programs, such as those based on the Smith-Waterman algorithm, FASTA, and BLAST, can contain subregions of high similarity, low similarity, and even no similarity. We propose a simple method for finding significant local sequence similarity regions, in which the alignment results of two sequences are graphed as integrated scores calculated along the aligned sequences using the match, mismatch, and gap penalty scores. This method has been used to find local similarity subregions in alignment results obtained by BLAST or the Smith-Waterman algorithm. Potential applications for finding domain structures and characteristic sequence patterns are also shown.
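
The integrated score described in this abstract can be read as a running sum of per-column scores along an existing alignment. The Python sketch below illustrates that reading; the score values (+1, -1, -2), the gap symbol "-" and the function name are assumptions chosen for the example, not the parameters of the paper.

```python
from itertools import accumulate


def integrated_scores(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Running sum of column scores along two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    def column_score(x, y):
        if x == "-" or y == "-":
            return gap
        return match if x == y else mismatch

    return list(accumulate(column_score(x, y)
                           for x, y in zip(aligned_a, aligned_b)))


# Hypothetical alignment with one gap and one mismatch
profile = integrated_scores("ACGT-ACGT", "ACGTTAGGT")
```

When the profile is plotted against alignment position, steadily rising stretches correspond to highly similar subregions, while flat or falling stretches mark subregions with little or no similarity.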

Download PDF (492K)
A Computer Modeling Method for the Three-dimensional Structure of RNA

Hiroyuki Ogata, Yutaka Akiyama, Minoru Kanehisa

1993 Volume 4 Pages 270-274
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.270

JOURNAL FREE ACCESS

We are developing a computational method for automatically organizing collections of structural knowledge of RNA into a three-dimensional (3-D) form. The goal of our method for modeling RNA structure is to find, as far as possible, conformations of RNA that satisfy the constraints from experiments and sequence analysis and, at the same time, whose local conformations are close to some representative conformations. For efficient conformational search, we used a genetic algorithm as a trial. We applied our method to modeling a single-stranded region of an RNA in order to estimate its efficiency.

Download PDF (471K)
Construction of a Functional Word Dictionary for Primate Promoter Sequences

Wataru Fujibuchi, Minoru Kanehisa

1993 Volume 4 Pages 275-282
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.275

JOURNAL FREE ACCESS

We constructed a dictionary of sequence motifs for transcription regulation with a heuristic method from a set of DNA sequences upstream of the transcription initiation site. The method first identifies weakly conserved blocks within a given region relative to the initiation site by searching for and merging six-base patterns. Then the most conserved portions of these blocks are extracted by calculating the information content after similar blocks are multiply aligned. The procedure was applied to primate promoters and the result was evaluated with the Transcription Factor Database (TFD). The result will give us new biological insights into the DNA signals.
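
The abstract does not give the formula used for information content. A common choice for aligned DNA blocks is the Shannon-based measure of 2 bits minus the column entropy, and the Python sketch below assumes that definition, without any small-sample correction. The function name and the toy blocks are hypothetical.

```python
import math
from collections import Counter


def information_content(aligned_blocks):
    """Per-column information content in bits for equal-length DNA blocks.

    Assumes IC = 2 - H, where H is the Shannon entropy of the column.
    """
    if len({len(block) for block in aligned_blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    bits = []
    for column in zip(*aligned_blocks):
        total = len(column)
        counts = Counter(column)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Hypothetical aligned blocks: column 0 is fully conserved, column 3 is not
bits = information_content(["ACGT", "ACGA", "ACCC", "AGTG"])
```

Columns with values near 2 bits are the strongly conserved positions; a run of such columns would be a candidate for the most conserved portion of a block.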

Download PDF (890K)
A Similarity Measure for DNA Sequence Analysis Based on Locality

Takashi Yokomori, Satoshi Kobayashi

1993 Volume 4 Pages 283-292
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.283

JOURNAL FREE ACCESS

We propose a simple string similarity measure and apply it to the problem of DNA sequence analysis, more specifically, to the problem of analysing molecular evolution. This measure is based on a “local feature” that was motivated by a theoretical characterization of DNA splicing sequences.
We demonstrate the usefulness of the proposed measure by presenting an experimental result concerning evolutionary molecular analysis. This sheds new light on other types of sequence analysis, such as protein classification and motif identification.

Download PDF (767K)
Prediction of Structures of Globular Proteins I. Secondary Structure

Yukio Kobayashi, Nobuhiko Saitô

1993 Volume 4 Pages 293-299
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.293

JOURNAL FREE ACCESS

A statistical mechanical method is proposed to predict the secondary structures of globular proteins. Three-state prediction, which simultaneously provides the probabilities of α-helix, β-strand and coil, is performed with a recurrence method. The probabilities of the ith residue being in an α-helix or in a β-strand are calculated with statistical weights for amino acid pairs in α-helix or in β-strand. Instead of directly calculating the interaction energies between residues, we determine the statistical weights so as to yield the correct predictions for proteins with known structures. To do this, we introduce an objective function and estimate the weights so as to minimize this function by referring to the proteins for optimization. This method yields a prediction accuracy of 67% for 13 proteins used for accuracy estimation. This value does not exceed the best values obtained by methods based on homology. However, we hope to improve the accuracy, since, in contrast to other methods, we can analyze the reasons for poor accuracy.

Download PDF (428K)
Variation of the Order of Importance in the Base Position for 5'-Splice Site Sequences

Khawaja Sirajuddin, Tomomasa Nagashima, Koichi Ono

1993 Volume 4 Pages 300-305
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.300

JOURNAL FREE ACCESS

The consensus sequence for the 5'-splice site has been proposed as CAG/GTGAGT, but actual splice site sequences differ from it to some extent. In this paper we analyze various mammalian globin genes using decision tree induction. We have found that the prediction rate for discriminating unknown sequences increases as the proportion of false splice site sequences with the dinucleotide GT at the 4th and 5th positions increases in the learning data set.

Download PDF (424K)
Analysis of Abnormal Splicing by Neural Network

Hiroshi FURUTANI

1993 Volume 4 Pages 306-314
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.306

JOURNAL FREE ACCESS

We have developed neural networks with a back-propagation learning algorithm for the prediction of splice sites in mRNA precursors. We used these networks to predict the effects of mutations on the splicing of protein-coding genes. We applied the networks to β-thalassemia genes (mutant β-globin genes), a hemophilia B gene (mutant blood coagulation factor IX gene) and a mutant c-Ha-ras oncogene. We demonstrate that these networks predict abnormal splicing patterns in these genes that are consistent with experiments.

Download PDF (448K)
Backbone Conformational Pattern Clustering of Three-Dimensional Peptide Fragments of Proteins

Motokazu KAMIMURA, Yoshimasa TAKAHASHI

1993 Volume 4 Pages 315-324
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.315

JOURNAL FREE ACCESS

In this paper we aim to examine in detail the data distribution within each conformational pattern class and to identify some local common structural features among the fragments in a particular cluster (or subcluster). Backbone conformational pattern clustering was carried out for the three-dimensional peptide fragments in which the Φ-ψ conformational pattern of the TA (target amino acid) belongs to class A (α-helix dominant class) or class B (β-sheet dominant class) as defined in our previous work. The analysis of the class A fragments suggested that these fragments involve four representative local backbone conformational patterns, not only for typical α-helix fragments but also for fragments closely related to the type I turn or the starting moieties of α-helices. On the other hand, the analysis of class B fragments showed that these have much more diversity than class A fragments with respect to their local backbone structures. The details of the methods and results of the analyses are discussed here.

Download PDF (859K)
Poly-tRNA Theory on the Origin and Evolution of mRNA and Genetic Codes: Evolution from Tandem tRNA-Repeats to Primitive mRNAs Encoding F₀-ATPase a Subunit and Glycyl-tRNA Synthetase

Koji Ohnishi

1993 Volume 4 Pages 325-331
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.325

JOURNAL FREE ACCESS

The Bacillus subtilis trrnD operon has the structure 5'[16S rRNA-23S rRNA-5S rRNA-(tRNA)₁₆]3'. The tRNA cluster in this operon includes 16 tandemly repeated tRNA genes (denoted the “poly-tRNA structure”), in which the order of the amino acid (aa) specificities of these tRNAs is “NSEVMD FT YWHQ GCLL”. An ancient “trrnD-peptide” possessing this aa sequence was hypothesized, and protein sequence regions similar to the trrnD-peptide were searched for in the PIR Protein Sequence Database. The aa's 139-156 in the E. coli Gly-tRNA synthetase (GlyRS) α subunit were found to be most similar to this peptide.
Further analysis revealed that not only the GlyRS gene encoding GlyRS α, but also the a gene of Synechococcus 6301 encoding the F₀-ATPase a subunit, are true homologues of the BSU trrnD poly-tRNA region. These findings strongly support the recently proposed “poly-tRNA theory” (Ohnishi, 1993) on the origin of mRNA and genetic codes. Thus it has now been concluded that the trrnD poly-tRNA region is a relic of a most primitive RNA molecule capable of synthesizing a trrnD-peptide-like primitive peptide in early life. The most paradoxical problem on the origin of genetic codes seems to have been basically solved from the aspect of poly-tRNA theory.

Download PDF (761K)
Metrics for Protein Sequence Comparison

Tsukasa Sakai

1993 Volume 4 Pages 332-338
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.332

JOURNAL FREE ACCESS

Substitution odds r(i, j) for amino acid residues can be transformed to similarities s(i, j) by normalizing with the geometric mean of the conservative odds r(i, i) and r(j, j). The similarities thus derived for all twenty natural amino acid residues in proteins fall in the range 0 to 1 and have complementary dissimilarities. An empirical test has confirmed that the dissimilarity satisfies all the metric requirements for a distance between residues. Relative certainty, as an identity index calculated from both similarity and dissimilarity, can be used as a matching score consistent with both of them in protein sequence comparison.
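
The normalization described in this abstract can be written as s(i, j) = r(i, j) / sqrt(r(i, i) r(j, j)). The Python sketch below illustrates it on a toy three-letter alphabet and checks the metric requirements by brute force. Reading the "complementary" dissimilarity as 1 - s(i, j) is an assumption, and the odds values and function names are hypothetical rather than taken from the paper.

```python
import math
from itertools import product


def similarity(odds, i, j):
    """s(i, j): odds normalized by the geometric mean of the self-odds."""
    return odds[i][j] / math.sqrt(odds[i][i] * odds[j][j])


def dissimilarity(odds, i, j):
    # Assumption: "complementary" dissimilarity is read as 1 - s(i, j)
    return 1.0 - similarity(odds, i, j)


def satisfies_metric_axioms(odds, tol=1e-12):
    """Check identity, positivity, symmetry and the triangle inequality."""
    residues = list(odds)
    d = {(i, j): dissimilarity(odds, i, j)
         for i, j in product(residues, repeat=2)}
    for i, j in product(residues, repeat=2):
        if i == j and abs(d[i, j]) > tol:
            return False
        if i != j and d[i, j] <= tol:
            return False
        if abs(d[i, j] - d[j, i]) > tol:
            return False
    return all(d[i, k] <= d[i, j] + d[j, k] + tol
               for i, j, k in product(residues, repeat=3))


# Hypothetical symmetric odds table for a toy three-residue alphabet
toy = {
    "A": {"A": 4.0, "B": 1.0, "C": 2.0},
    "B": {"A": 1.0, "B": 1.0, "C": 0.5},
    "C": {"A": 2.0, "B": 0.5, "C": 4.0},
}
is_metric = satisfies_metric_axioms(toy)
```

For twenty residues the same exhaustive check covers 8,000 triples, so an empirical test of this kind is cheap to run on a full substitution table.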

Download PDF (554K)
Using Analogical Reasoning to Predict a Protein Structure

Takashi Ishikawa, Takao Terano

1993 Volume 4 Pages 339-346
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.339

JOURNAL FREE ACCESS

This paper describes a computational method to predict a protein structure by analogical reasoning from known protein structures. The proposed method, Analogy by Abstraction, uses heuristics to reduce the complexity of the search for appropriate transformations that create a structure of the unknown protein from a known protein structure. We implement an algorithm for the method in the Prolog programming language, and illustrate its effectiveness by re-predicting the structure of ‘Zinc fingers’ from their amino-acid sequence.

Download PDF (576K)
A sensitive and efficient homology search method to find protein-coding regions using “protein-coding region DNA database”

K. Wada, Y. Wada, S. Tanaka, H. Doi, Y. Nakamura, K. Sugaya, T. Fukaga ...

1993 Volume 4 Pages 347-351
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.347

JOURNAL FREE ACCESS

Download PDF (370K)
Integration of Genome Databases Using a Deductive Object-Oriented Database

Susumu Goto, Toshihisa Takagi, Norihiro Sakamoto

1993 Volume 4 Pages 352-361
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.352

JOURNAL FREE ACCESS

Recently, many genome databanks have been developed as a result of growing genome project activities. Each of them contains a large amount and variety of data, and they were developed independently. Therefore, their integration and efficient management of the data are required. It is also necessary to develop a framework for easily building and testing biological hypotheses with the integrated database. We developed a deductive object-oriented database for searching an integrated database, acquiring new knowledge from it, and storing the knowledge in the database. It consists of an object-oriented database that integrates conventional genome databases such as GenBank, and a deductive language interface for genome analysis. In this paper, we present an overview of the system and examples of analyses using the database.

Download PDF (881K)
Enhancement of the Integrated Database “HyperGenome” for Genome Maps and Sequence Information

Takahiko SUZUKI, Susumu NAKASHIMA, Toshihisa TAKAGI, Satoru KUHARA, Mi ...

1993 Volume 4 Pages 362-369
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.362

JOURNAL FREE ACCESS

An integrated database system, “HyperGenome”, for genome maps and DNA sequences was developed. The system can handle two different types of data, each of which has a unique complex structure. A graphical user interface (GUI) enables ready retrieval of information obtained from genome mapping data and DNA sequence data. Mapping data are derived from the Genome Data Base (GDB) and sequence data are from GenBank.
The following information was added to the system. 1. Mendelian Inheritance in Man (MIM) entries can be linked to a locus in our system. 2. Amino acid sequences from the Protein Identification Resource (PIR) can be displayed in conjunction with the nucleotide sequence.

Download PDF (5777K)
Locus-in: a new database system with graphical front end to integrate mapping data

Shinsei Minoshima, Nobuyoshi Shimizu

1993 Volume 4 Pages 370-375
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.370

JOURNAL FREE ACCESS

We developed a new database system, Locus-in, to enter raw mapping data and construct integrated maps. This system runs on Sun workstations with the X Window System and the Motif graphics library. The system supports a full graphical user interface. It has the following unique functions: (1) to zoom in on a specific region of interest; (2) to generate a number of sub-windows associated with a specific region for entry and display of data (each sub-window accepts either ordered or unordered and either raw or published data); and (3) to create new breakpoints. The current version of Locus-in will be demonstrated at the workshop.

Download PDF (375K)
ContigMaker: Software Tool for Contig Map Construction

Akira Suyama, Masami Hagiya, Takashi Ito, Asao Fujiyama, Akira Ohyama, ...

1993 Volume 4 Pages 376-384
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.376

JOURNAL FREE ACCESS

ContigMaker is a software tool to aid contig map construction. It is a Motif application running on UNIX workstations with the X Window System. ContigMaker is composed of five major components: map data manager, map analyzer, map viewer, map aid, and project manager. Contig-mapping data obtained by experiments are stored in a database of the map data manager. The stored data are then subjected to analysis by the map analyzer to generate contigs. ContigMaker supports the two strategies for contig construction: the STS (sequence-tagged sites) strategy and the MOF (mapping by oligonucleotide fingerprinting) strategy. The generated contigs are assembled into a contig map according to positions of landmarks falling on the contigs. ContigMaker allows a user to extract landmark information from a public genome database such as the GDB. The contig maps constructed are graphically drawn by the map viewer. The map aid provides miscellaneous small useful tools to finish a contig-mapping task. A repeated task ContigMaker performs can be automated by a macro created by the project manager. The macro will save time and effort for contig map construction.

Download PDF (1012K)
GNOME: a sequence data management tool to access homology, motif, and other data analysis servers

Toshiyuki Niiyama, Takeo Tokimori, Atsushi Ogiwara, Ikuo Uchiyama, Ken ...

1993 Volume 4 Pages 385-393
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.385

JOURNAL FREE ACCESS

GNOME is a sequence data management tool through which users can efficiently access e-mail servers for various molecular biological analyses on the Internet, including GenomeNet. It supports BLAST/FASTA servers for homology searches, PROSITE/MotifDic servers for motif searches, and bget/bfind servers for DB entry retrievals. One of its most notable features is that it can not only send e-mails for queries but also receive and manage e-mails for replies. In addition, its interface is very user-friendly. Therefore, it should considerably enhance efficient and thorough analyses of newly determined sequence data in both individual biological research and large-scale genome projects.

Download PDF (1140K)
Genomatica: an integrated data management and analysis tool for genome sequencing projects

Yutaka Akiyama, Hirotada Mori, Satoru Kuhara, Naoki Ogasawara, Nobuyuk ...

1993 Volume 4 Pages 394-401
Published: 1993
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.4.394

JOURNAL FREE ACCESS

Genomatica is an integrated software tool designed to help with the systematic management of a large number of DNA sequence fragments obtained through a genome sequencing project.
Its graphical user interface also allows users to look, at any magnification, into any position of the specified chromosome and to browse various kinds of collected information together (including the DNA sequence itself, related gene descriptions, bibliographic references, corresponding GenBank entries, confirmed or putative coding regions, results from homology analysis for the expected protein, RNA genes, clone information, enzyme restriction maps, comments from the administrator, and private memorandums by the user).
We are planning to use Genomatica in the E. coli (local data compilation mainly managed by Mori), B. subtilis (by Ogasawara), and S. cerevisiae (by Murakami) genome sequencing projects.
The Genomatica project was started in 1992 as one of the advanced genome database projects sponsored by the Human Genome Center, University of Tokyo. In June 1993, ver. 2.0, which was fully redesigned with the NCBI Vibrant library, was released. A further augmented version, Genomatica 2.1 (with several sequence analysis functions and network communication modules), will be released in Nov. 1993 and will be distributed through anonymous ftp services. The Genomatica system is currently available for the X11 window system on Unix workstations, but Macintosh and IBM-PC versions will also be announced soon.

Download PDF (1102K)
