Genome Informatics

Protein Structure Alignment Using a Graph Matching Technique

Tatsuya Akutsu

1995 Volume 6 Pages 1-8
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.1

JOURNAL FREE ACCESS

This paper proposes new algorithms for protein structure alignment. Given two three-dimensional protein structures, protein structure alignment is the problem of finding spatially equivalent residue pairs. Each algorithm consists of two steps: first, an initial superposition is computed; then a structure alignment is computed and refined using bipartite graph matching. An experimental comparison with a previous alignment algorithm shows that the proposed algorithms are useful.

k-Group Multiple Alignment Based on A* Search

H. Imai, T. Ikeda

1995 Volume 6 Pages 9-18
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.9

JOURNAL FREE ACCESS

This paper proposes a k-group alignment algorithm for multiple alignment as a practical method. In iterative improvement methods for multiple alignment, the so-called group-to-group two-dimensional dynamic programming has been used, and in this respect our proposal is to extend the ordinary two-group dynamic programming to a k-group alignment programming. This extension is conceptually straightforward, and here our contribution is to demonstrate that the k-group alignment can be implemented so as to run in a reasonable time and space under standard computing environments. This is established by generalizing the A* search approach for multiple alignment devised by Ikeda and Imai [8]. The k-group alignment method can be directly incorporated in existing methods such as iterative improvement algorithms (Berger and Munson [2], Gotoh [4]) and tree-based (iterative) algorithms (Hirosawa et al. [6]). This paper performs computational experiments of applying the k-group method to iterative improvement algorithms, and shows that our approach can find better alignments in reasonable time.
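
The A* idea behind this abstract can be sketched on the simplest case of two sequences. This is a minimal illustration only, not the k-group algorithm of the paper: the unit costs, the function name `astar_alignment_cost`, and the heuristic (remaining length difference times the gap cost) are assumptions chosen for brevity.

```python
import heapq


def astar_alignment_cost(a, b, mismatch=1, gap=1):
    """Minimum alignment cost of a and b, found by A* on the edit graph."""
    n, m = len(a), len(b)

    def h(i, j):
        # Lower bound on the remaining cost: the leftover length
        # difference can only be bridged with gaps.
        return gap * abs((n - i) - (m - j))

    best = {(0, 0): 0}
    frontier = [(h(0, 0), 0, 0, 0)]  # entries are (g + h, g, i, j)
    while frontier:
        _, g, i, j = heapq.heappop(frontier)
        if (i, j) == (n, m):
            return g
        if g > best.get((i, j), float("inf")):
            continue  # stale queue entry
        moves = []
        if i < n and j < m:
            moves.append((i + 1, j + 1, 0 if a[i] == b[j] else mismatch))
        if i < n:
            moves.append((i + 1, j, gap))
        if j < m:
            moves.append((i, j + 1, gap))
        for ni, nj, cost in moves:
            ng = g + cost
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(frontier, (ng + h(ni, nj), ng, ni, nj))
    raise ValueError("goal not reachable")
```

Because the heuristic never overestimates the remaining cost, the first time the goal cell is popped its cost is optimal, and cells far from any good path are never expanded. That pruning is what makes a search over k groups plausible in practice.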

Construction of Phylogenetic Trees from Amino Acid Sequences using a Genetic Algorithm

Hideo Matsuda

1995 Volume 6 Pages 19-28
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.19

JOURNAL FREE ACCESS

We have developed a novel algorithm to search for the maximum likelihood tree constructed from amino acid sequences. This algorithm is a variant of genetic algorithms that uses scores derived from the log-likelihood of trees computed by the maximum likelihood method. The algorithm is valuable because it may construct more likely trees from randomly generated trees by applying crossover and mutation operators. In a test of our algorithm on a data set of elongation factor-1α sequences, we found that its performance is comparable to that of other tree-construction methods (UPGMA, neighbor-joining, and maximum parsimony methods, and the maximum likelihood method with different search algorithms).

SIMFLY2: Simulation of a Fly Embryo

Masanori Arita

1995 Volume 6 Pages 29-38
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.29

JOURNAL FREE ACCESS

Biological analysis of segment formation in Drosophila embryogenesis provides a good basis for modelling the interaction of DNA-binding proteins. In this paper, we propose a threshold model for qualitative simulation of the interaction, and introduce SIMFLY2. This revised version of SIMFLY, a simulator for protein interaction, integrates a genetic algorithm to search for optimal relations among proteins. We confirmed that SIMFLY2 found the relation we had previously found with SIMFLY by exhaustive search. SIMFLY2 also found interaction models between two pair-rule proteins and gap proteins.

Building a Knowledge-Base for Protein Function Prediction using Multistrategy Learning

Takashi Ishikawa, Shigeki Mitaku, Takao Terano, Takatsugu Hirokawa, Ma ...

1995 Volume 6 Pages 39-48
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.39

JOURNAL FREE ACCESS

Conventional techniques for protein function prediction based on amino acid sequence similarity only enable us to classify protein functions into function groups. They usually fail to predict specific protein functions. To overcome this limitation, we propose a method for protein function prediction using functional feature analysis and a multistrategy learning approach to building the knowledge base. By “functional feature”, we mean a feature of an amino acid sequence that characterizes the function of the protein with that sequence. Functional features are secondary and/or tertiary structures of amino acid sequences that correspond to functional elements comprising the functions of a protein. The functional features are extracted from amino acid sequences using abductive, inductive, and deductive inference. We show the effectiveness of the method on an example problem of classifying the functions of bacteriorhodopsin-like proteins.

A Data and Knowledge Base for Cell Signaling Networks

Takako Igarashi, Tsuguchika Kaminuma, Yoko Nadaoka

1995 Volume 6 Pages 49-56
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.49

JOURNAL FREE ACCESS

Each cell in a multicellular animal is programmed during development to respond to specific external chemical signals. Some of these chemical signals activate receptor proteins on the surface of the cell that trigger a series of membrane and intracellular signal transductions, and eventually influence gene expression. These complex cell signaling mechanisms have been unveiled at the molecular level in various multicellular organisms in the past decade. It was found that these molecular signaling pathways, or what we may call cell signaling networks (CSN), play important roles in a wide range of biological phenomena that characterize multicellular animals. These phenomena include development, differentiation, reproduction, morphogenesis, carcinogenesis, apoptosis, and even learning.
We have developed the data and knowledge base of the CSN that consists of interacting extracellular (xenobiotic) chemicals and biomolecules. The system contains signaling pathways, and structural and functional data of the molecules. The system was implemented on UNIX workstations using an object oriented database management system ACEDB.

Automated Discovery of Protein Functional Units from Amino-acid sequences using Rough-Sets-based Comparative Analysis

S. Tsumoto, H. Tanaka, K. Tsumoto, I. Kumagai

1995 Volume 6 Pages 57-66
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.57

JOURNAL FREE ACCESS

Protein structure analysis from DNA sequences is an important and fast growing area in both computer science and biochemistry [S]. One of the most important problems is that two proteins with similar three-dimensional structures can have different functions, as in the case of lysozyme and lactalbumin. In such cases, comparative analysis of the two amino acid sequences is effective for detecting the functional and structural differences. In this paper, we introduce a system called MW1 (Molecular biologists' Workbench version 1.0), which extracts differential knowledge from amino-acid sequences by using rough-set based classification, statistical analysis and change of representation. This method is applied to the following two domains: comparative analysis of lysozyme and α-lactalbumin, and analysis of immunoglobulin structure. The analysis yields several interesting results from amino-acid sequences that have not been reported before.

Grammatically Modeling and Predicting RNA Secondary Structures

Yasuo UEMURA, Aki HASEGAWA, Satoshi KOBAYASHI, Takashi YOKOMORI

1995 Volume 6 Pages 67-76
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.67

JOURNAL FREE ACCESS

Tree Adjunct Grammar for RNA (TAG²_RNA) is a new grammatical device to model RNA secondary structures including pseudoknots. An efficient parsing algorithm for this grammar is developed and applied to some computational problems concerning RNA secondary structures. With this parser, we first try to predict secondary structures of RNA sequences which are known to form pseudoknot structures, and show prediction results which closely match the known structures. Further, a (-1) frameshift grammar is constructed based on the biological observation that a (-1) frameshift might be caused by some structural features of RNA sequences. The proposed grammar is used to find candidate sequences for (-1) frameshift in Human spumaretrovirus gag and pol genes.

Hidden Markov Model to Extract Leucine Zipper Motif

Yukiko Fujiwara, Minoru Asogawa, Akihiko Konagaya

1995 Volume 6 Pages 77-85
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.77

JOURNAL FREE ACCESS

To represent motifs of amino acid sequences using Hidden Markov Models (HMMs) with high accuracy, the HMM topology must be specified according to the motif's characteristics. For this purpose, the “iterative duplication method”, which learns the optimal HMM topology, was developed. In this method, a small fully-connected HMM is gradually expanded by state splitting and transition deletion. However, the method did not clearly determine which state to split, because it randomly selected one of the most connected states. To determine the splitting state, we improve the iterative duplication method. The improved method selects the most ambiguous state for splitting. Since this ambiguity relies on the transition probabilities and observation distributions, the splitting state can be determined. Additionally, the improved method considers deletion of negligible states. In an experiment, an HMM is obtained for a leucine zipper motif using this improved method. The prediction accuracy of this HMM is 96.48 percent. It is compared with that of the HMM obtained by the previous method and with the fully-connected HMM estimation method. The accuracy of the previous method was 95.85 percent and that of the fully-connected HMM was 95.22 percent.

Assessment of Species-specific Diversity of Genes in Codon Usage

Shigehiko Kanaya, Yoshihiro Kudo, Yasukazu Nakamura, Toshimichi Ikemur ...

1995 Volume 6 Pages 86-87
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.86

JOURNAL FREE ACCESS

Estimation of Protein-production levels in Escherichia coli Genes on the basis of Multivariate Diversity in Codon Usage

Shigehiko Kanaya, Yoshihiro Kudo, Yasukazu Nakamura, Toshimichi Ikemur ...

1995 Volume 6 Pages 88-89
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.88

JOURNAL FREE ACCESS

Consensus Genic Sequences in Bacterial rRNA-tRNA gene clusters

Yoshihiro KUDO, Shigehiko KANAYA

1995 Volume 6 Pages 90-91
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.90

JOURNAL FREE ACCESS

MDL Method: an Inductive Inference Method for Reconstructing Phylogenetic Trees

Fengrong Ren, Hiroshi Tanaka, Norio Fukuda

1995 Volume 6 Pages 92-93
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.92

JOURNAL FREE ACCESS

Clustering of all known and predicted open reading frames of Escherichia coli K12

Takeshi Itoh, Minoru Yano, Miwako Kajihara, Hirotada Mori, Keiko Takem ...

1995 Volume 6 Pages 94-95
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.94

JOURNAL FREE ACCESS

At present, non-redundant contig sequences of E. coli covering about 70% of the whole chromosome have been constructed. We predicted ORFs (Open Reading Frames) from 2,554,518 bp of contig sequences on the basis of the Shine-Dalgarno (ribosome binding) sequence. All ORFs were classified according to their structural similarities. By examining the homology of the ORFs in each group in detail, some structural units were revealed.
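
As a rough illustration of ORF prediction guided by a Shine-Dalgarno signal, the sketch below scans one strand for ATG start codons that have an SD-like motif shortly upstream and extends each to the first in-frame stop codon. The abstract does not give the actual criteria, so the motif `GGAGG`, the 20-base upstream window, the minimum length, and the function name `find_orfs` are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, sd_motif="GGAGG", upstream_len=20, min_codons=3):
    """Return (start, end) pairs of forward-strand ORFs preceded by sd_motif.

    start is the index of the ATG; end is one past the stop codon.
    The motif, window and length threshold are illustrative assumptions.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Require a Shine-Dalgarno-like motif just upstream of the start.
        upstream = seq[max(0, start - upstream_len):start]
        if sd_motif not in upstream:
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs
```

A real pipeline would also scan the reverse complement and alternative start codons; those are left out to keep the sketch short.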

Statistical features identified from comparison of homologous introns

H. Ogata, W. Fujibuchi, M. Kanehisa

1995 Volume 6 Pages 96-97
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.96

JOURNAL FREE ACCESS

In order to develop an understanding of the evolutionary history of introns, we compared intron sequences from homologous genes of different vertebrate species. We observed the following statistical features of introns. In interspecies comparisons, the length difference of an intron from each gene exhibits considerable variation, which is consistent with the previous observation of a wide distribution of insertion and deletion sizes. The distribution profiles also indicate that 68% of introns are longer in human than in rodentia, although such a significant shift in intron length is not observed in mouse/rat and human/artiodactyla comparisons. From the analysis of diversity in base composition and sequence data between intron and coding regions, similar behavior was observed between introns and the third codon position, although intron sequences appeared to be more constrained. These descriptive analyses will help us to understand the functional constraints and the recent history of introns.

Evaluation of exon prediction tools using a long DNA sequence data

Katsuhiko Murakami, Shiho Tsukuni, Toshihisa Takagi, Masahira Hattori

1995 Volume 6 Pages 98-99
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.98

JOURNAL FREE ACCESS

We have evaluated the ability of two exon prediction programs, GRAIL and FEX, to locate coding regions, using a long (about 301k bases) genomic DNA sequence. We performed an experiment to check the correctness of the exon candidates with high scores. FEX was more sensitive but less specific than GRAIL. The numbers of exons predicted by both tools were much smaller than our simple estimate from the sequence length. To further reduce unreliable candidates, we proposed guidelines for users. With these guidelines, both tools would be more practical even for DNA sequences longer than 100,000 bases.

Comparison of Statistical Algorithms for Predicting Splice Junctions in mRNA Precursors of Mammalian Genes

Yukiyasu Ogawa, Tomomasa Nagashima, Sirajuddin Khawaja

1995 Volume 6 Pages 100-101
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.100

JOURNAL FREE ACCESS

It is well known that many eukaryotic genes are interrupted by introns, which are removed from mRNA precursors by the RNA splicing mechanism. While it is well accepted that the consensus sequences of exon-intron boundaries in mRNA precursors are important for specifying splice sites, the signals that govern the excision of introns are not yet well understood, because actual splice site sequences differ more or less from the consensus sequences.
So far, several statistical methods for predicting actual splice site sequences (splice junctions) in pre-mRNAs of mammalian genes have been proposed. Shapiro and Senapathy gave a method based on a weight matrix; Iida has proposed a method based on quantification analysis (categorical discriminant analysis). He applied his method to analyze 3'-splice site sequences as well as 5'-splice site sequences and discussed mutational problems in beta-globin genes.
However, while the statistical methods proposed so far have some ability to predict splice site sequences in statistical tests, they seem far from sufficient when applied to actual problems, i.e., predicting the actual splice junctions in complete mammalian genes.
We propose here a new algorithm that can be applied to predict actual splice junctions in complete mammalian genes, and demonstrate its ability by comparing the predictive performance of our algorithm with those of Shapiro-Senapathy and Iida.
Our algorithm is an extension of the categorical discriminant analysis (CDA) by Iida into a hierarchical form: at the first level, we start from ordinary categorical discriminant analysis and determine the classes of the sample sequences that do not fall into the overlapping region of sample scores. The samples that fall into the overlapping region are treated at the next level, where we again apply categorical discriminant analysis. This process is repeated until the number of samples that fall into the overlapping region becomes negligible.
By applying this algorithm to the 3'-splice junctions as well as the 5'-splice junctions of the rat chymotrypsin B gene, we obtained fairly good predictive performance compared with Shapiro-Senapathy and Iida. Tab. 1 presents a comparison of the predictive ability of our algorithm and CDA. As shown in Tab. 1, our algorithm gives fewer potential sequences (candidates) for the splice junctions than CDA. The Shapiro-Senapathy algorithm gave more potential sequences than CDA, which implies that our algorithm performs better than both CDA and the Shapiro-Senapathy method.
In summary, we have developed an algorithm that acts as a filter for selecting splice signals among a huge number of unknown sequences. We will discuss the details of our algorithm at the symposium.
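
The weight-matrix approach mentioned in this abstract can be sketched generically as follows. This is not the exact Shapiro-Senapathy scoring formula or the authors' hierarchical CDA: the log-odds scoring against a uniform background, the pseudocount, the function names, and the toy donor-site strings are all assumptions made for illustration.

```python
import math


def build_weight_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Build per-position log-odds weights from equal-length aligned sites."""
    length = len(sites[0])
    background = 1.0 / len(alphabet)  # uniform background (assumption)
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log((counts[base] / total) / background)
             for base in alphabet}
        )
    return matrix


def score(matrix, candidate):
    """Sum the position-specific weights of a candidate sequence."""
    return sum(column[base] for column, base in zip(matrix, candidate))


# Toy, made-up 5'-splice-site-like strings, used only to exercise the code.
toy_sites = ["GTAAGT", "GTGAGT", "GTAAGA"]
toy_matrix = build_weight_matrix(toy_sites)
```

Candidates whose score exceeds a chosen threshold would be reported as potential splice junctions; a stricter threshold yields fewer candidates, which is the trade-off the abstract discusses.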

CyanoBase: Visual presentation of information on the genome of Cyanobacterium Synechocystis sp. strain PCC6803 through WWW

Makoto Hirosawa, Takakazu Kaneko, Satoshi Tabata

1995 Volume 6 Pages 102-103
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.102

JOURNAL FREE ACCESS

Kazusa DNA Research Institute launched a project to sequence the entire genome of a unicellular cyanobacterium, Synechocystis sp. strain PCC6803, in 1994. The data gained through the project have been registered in public databases. However, it is not easy to properly grasp information on the cyanobacterium genome from the information in the databases alone.
We have now constructed CyanoBase, a data presentation system on the WWW that enables users to visualize the genomic information of this species. CyanoBase includes additional information that is not described in public databases.

A large-scale GenBank search of Expressed Sequence Tags using rapid identity searching program for DNA sequences

T. Nishikawa, K. Nagai

1995 Volume 6 Pages 104-105
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.104

JOURNAL FREE ACCESS

We have developed a program for rapid identity searching of DNA sequences that tolerates sequencing error rates of several percent. The program was applied to a large-scale search of Expressed Sequence Tags (ESTs) against the GenBank sequences, and error information on the ESTs was obtained from the search results. The 15,666 human EST sequences were searched against the primate division of GenBank release 80 in 23.3 hours, which is only one-thirtieth of the time needed when FASTA is used. A total error rate of 2.45 percent was obtained from the alignments between the ESTs and the primate sequences satisfying the identity conditions.

Prediction of Promoter Expression Specificity by Conserved Sequence Patterns

Wataru Fujibuchi, Minoru Kanehisa

1995 Volume 6 Pages 106-107
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.106

JOURNAL FREE ACCESS

We present here a prediction method for expression specificities of promoters by observing the appearance of conserved sequence patterns in a group of promoters, such as liver, brain, and house-keeping. Related promoters in the same group were compiled from EPD [1] database and an index to represent the group specificity of each pattern was calculated. Each promoter was examined for its specificity by the collection of these indices constructed from the rest of the promoters in our dataset. Currently, our system could discriminate 50% of human liver promoters with 11% false positive rate. The distribution profile of scores also suggested that the liver promoter group may be divided into two or more subgroups.

Mining Association Rules from Signals found in Mammalian Promoter Sequences

Gen Shibayama, Kenji Satou, Toshihisa Takagi

1995 Volume 6 Pages 108-109
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.108

JOURNAL FREE ACCESS

To find associations among large amounts of genome data, we implemented a data mining algorithm developed by Houtsma et al. In a computer experiment on signals found in mammalian promoter sequences, the system generated association rules with high accuracy and large coverage.
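
To make "association rules" concrete, the sketch below computes support and confidence for rules of the form "signal A implies signal B" over a set of promoters. It is a brute-force illustration, not the algorithm of Houtsma et al.; the function name `pairwise_rules`, the thresholds, and the toy signal names are assumptions.

```python
from itertools import permutations


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (a, b, support, confidence) for each rule a -> b that passes.

    transactions is a list of sets, e.g. the signals found in each promoter.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n            # fraction containing both a and b
        confidence = count_ab / count_a   # fraction of a-containing with b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy, made-up promoter annotations used only to exercise the code.
toy_promoters = [
    {"TATA", "CAAT"},
    {"TATA", "CAAT", "GC"},
    {"TATA"},
    {"GC"},
]
toy_rules = pairwise_rules(toy_promoters)
```

On this toy input the only surviving rule is CAAT -> TATA, with support 0.5 and confidence 1.0. Practical miners avoid the quadratic enumeration by first pruning items that fail the support threshold.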

HAKKE: Automatic Predictor Generator for Sequences

Naohiro Furukawa, Takayoshi Shoudai, Ayumi Shinohara, Satoru Miyano

1995 Volume 6 Pages 110-111
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.110

JOURNAL FREE ACCESS

Learning Hidden Markov Models Using Back-Propagation through Time

Hiroshi Mamitsuka

1995 Volume 6 Pages 112-113
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.112

JOURNAL FREE ACCESS

A Probabilistic Inference System for The Prediction of Subcellular Localization Sites of Proteins: Application to E. coli Data

Paul Horton, Kenta Nakai

1995 Volume 6 Pages 114-115
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.114

JOURNAL FREE ACCESS

We have proposed that the prediction of protein subcellular localization sites can provide a good clue for the characterization of open reading frames of unknown function [1, 2]. Our program, PSORT, has been used by a number of researchers through the Internet [3]. PSORT was originally written in the style of an ‘if-then’ rule-based system. Although this style has the merit of great versatility in coding inference pathways, re-optimization of numeric parameters for either given training data or expanded rules requires an expert's manual work. Clearly, this characteristic is not well suited to the rapidly progressing state of genome analyses. Thus, we have been studying other mathematical models for inference that allow at least semi-automatic optimization of numeric parameters. Last year, we introduced a simple model, the water-flow model, that automatically finds required threshold parameters [4]. This model showed sufficiently high discrimination power for model data. Here, we describe an improved model and report its predictability when applied to more realistic data.
We first collected E. coli amino acid sequences of known subcellular localization sites from the PROSITE database (Rel. 31). Excluding hypothetical information, 336 sequences were collected in total. They were classified into the following 8 groups: lipoproteins at the inner membrane (imL), lipoproteins at the outer membrane (omL), inner membrane proteins with a cleavable signal sequence (imS), typical outer membrane proteins (om), periplasmic proteins (pp), inner membrane proteins with a signal-anchor signal (imU), inner membrane proteins with an internal signal (im), and cytoplasmic proteins including peripheral inner membrane proteins (cp). Since precise information on topogenic signals is lacking in the database, this classification partly uses the prediction results of PSORT. The number of members varies from 2 to 143 for each group.
Currently, the reasoning tree is the same as the one used in the previous study [1]. However, we employed a probabilistic inference model. Basically, the model is a kind of “water-flow model”, as is the model proposed last year [4]. The most important difference is the use of a discrete set of conditional probability values for categorized ranges of the characteristic value at each node. These categories are defined such that each category contains a roughly uniform number of data points. However, since we still leave some room for eyeball inspection in this process, the calculation is semi-automatic.
Although there remains some room for improvement in the characteristic values (for example, the parameters used in McGeoch's method for signal sequence recognition) at each node, we tested the predictability of the current model by the cross-validation method. The 336 sequences were randomly divided into a training set of 302 and a testing set of 34. This trial was carried out 10 times and the resultant values were averaged. The overall prediction accuracy was 79.1%, which does not differ much from the accuracy for discriminating the training data. We expect that the method can be applied to the much more complicated eukaryotic problem without great difficulty because of its conceptual clarity. Thus, this method seems very promising for upgrading the PSORT system in the near future.
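
The categorization step described above, in which characteristic values are binned so that each category holds roughly the same number of data points and each bin receives a conditional probability, can be sketched as below. The function names, the boolean label standing for "takes a given branch at this node", and the toy numbers are assumptions; the eyeball-inspection step in the abstract is not modelled.

```python
def equal_frequency_edges(values, n_bins):
    """Cut points that split the values into n_bins bins of similar size."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def bin_index(value, edges):
    """Index of the bin a value falls into (count of edges it reaches)."""
    return sum(value >= edge for edge in edges)


def conditional_probabilities(values, labels, n_bins):
    """Estimate P(label is True | bin) from paired values and labels."""
    edges = equal_frequency_edges(values, n_bins)
    hits = [0] * n_bins
    totals = [0] * n_bins
    for value, label in zip(values, labels):
        b = bin_index(value, edges)
        totals[b] += 1
        hits[b] += bool(label)
    probs = [h / t if t else None for h, t in zip(hits, totals)]
    return edges, probs


# Toy, made-up characteristic values and branch labels.
toy_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_labels = [False, False, False, True, False, True, True, True]
toy_edges, toy_probs = conditional_probabilities(toy_values, toy_labels, 2)
```

Here the single cut point falls at 0.5, and the two bins get probabilities 0.25 and 0.75. At prediction time a new value would be mapped with `bin_index` and the stored probability used at that node.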

Protein Sequence Grouping by Peptide Word Motifs

I. Uchiyama, A. Ogiwara, T. Takagi, M. Kanehisa

1995 Volume 6 Pages 116-117
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.116

JOURNAL FREE ACCESS

Methods for collecting related segments from the protein sequence database, using strongly conserved peptide words as well as sequence homology, were applied to the problem of reconstructing the PROSITE catalog ill from the sequence database. In many cases our results were consistent with PROSITE, although some additional relationships were also found.

ANTISENSE HOMOLOGY BOXES IN PROTEINS

Lajos BARANYI, William CAMPBELL, Hidechika OKADA

1995 Volume 6 Pages 118-119
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.118

JOURNAL FREE ACCESS

Amphiphilic peptides approximately fifteen amino acids in length and their corresponding antisense peptides exist within protein molecules. These regions (termed antisense homology boxes) are separated by approximately fifty amino acids. Since many sense-antisense peptide pairs have been reported to recognize and bind to each other, antisense homology boxes may be involved in folding, chaperoning and oligomer formation of proteins. The finding that ca. 70 percent of the antisense homology box derived peptides from C5a receptor and human Endothelin A receptor are specific inhibitors or agonists of their corresponding proteins indicates that these regions can have a significant role in proteins.

Multiple sequence alignment by combining incomplete blocks of similar segments

K. Suzuki, Y. Akiyama, M. Kanehisa

1995 Volume 6 Pages 120-121
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.120

JOURNAL FREE ACCESS

We have developed a novel method for multiple sequence alignment based on combinatorial selection of similar block candidates. Our method resembles manual multiple alignment performed by biologists. The method is more feasible for finding functional motifs than previous multiple alignment algorithms that are extensions of pairwise alignments. We employed a Hopfield neural network technique so that the method can cope with the combinatorial explosion in examining a large number of “incomplete” block candidates.

An Approach to Amino Acid Sequence Alignment Using a Genetic Algorithm

Masato Wayama, Katsutoshi Takahashi, Toshio Shimizu

1995 Volume 6 Pages 122-123
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.122

JOURNAL FREE ACCESS

Accuracy of multiple sequence alignments as assessed by reference to structural alignments

O. Gotoh

1995 Volume 6 Pages 124-125
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.124

JOURNAL FREE ACCESS

Design of a Hardware Board for Sequence Alignment

T. Kato, A. Suyama, M. Taiji, T. Ebisuzaki

1995 Volume 6 Pages 126-127
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.126

JOURNAL FREE ACCESS

We have designed a special hardware board to calculate optimal alignments of two sequences based on the Myers-Miller dynamic programming algorithm. The board was designed to calculate each similarity or distance matrix element in parallel in one system clock pulse. The present version of the board has four pipelines and thus can calculate 120 million matrix elements per second.

Theoretical Prediction of Positioning, Orientation and Tilt Angles of Helices of a Small Membrane Protein Without Structural Template

Takatsugu Hirokawa, Makiko Suwa, Shigeki Mitaku

1995 Volume 6 Pages 128-129
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.128

JOURNAL FREE ACCESS

A PARALLEL HYBRID GA FOR PEPTIDE 3-D STRUCTURE PREDICTION

Carlos A. DEL CARPIO, Shin-ichi SASAKI, Lajos BARANYI, Hidechika OKADA

1995 Volume 6 Pages 130-131
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.130

JOURNAL FREE ACCESS

The present work describes recent advances made in the system for 3-D structure prediction of polypeptides being developed in our laboratory. The system was originally conceived as a conformational space search procedure based on a simple genetic algorithm. However, the complexity of the system and the need to produce better fit conformers as artificial evolution proceeds, compelled us to improve the algorithm in two substantial aspects. The first is a parallelization of the original algorithm to enrich the diversity of conformers in the population and the second a hybridization of the original GA in order to process the atoms of the side chains.
The results are exemplified with the prediction of the 3D structure for CRAMBIN.

Three-Dimensional Motif Search of Proteins Using Abstract Representation of Secondary Structure Segment

Hiroaki KATO, Yoshimasa TAKAHASHI

1995 Volume 6 Pages 132-133
Published: 1995
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.6.132

JOURNAL FREE ACCESS

This paper describes an approach to three-dimensional (3-D) motif search of proteins, which is based on a graph-theoretical clique finding algorithm. In this implementation, higher abstract representation of a protein structure has been also investigated for the description of secondary structure information such as α-helix and β-strand. The algorithms and the implementations are discussed with a couple of execution examples of the 3-D motif search using protein structure database.
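
The abstract does not say which clique-finding algorithm is used. As a stand-in, the sketch below shows the basic Bron-Kerbosch procedure for enumerating maximal cliques in an undirected graph given as an adjacency dictionary. The function name and the toy graph are assumptions; in a motif search the nodes would typically encode compatible matches between query and target structure elements.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch).

    adj maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-processed nodes
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_cliques = maximal_cliques(toy_graph)
```

For the toy graph the maximal cliques are the triangle and the pendant edge. Pivoting variants of the algorithm reduce the number of recursive calls on larger graphs.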

Download PDF (243K)
β-sheet Prediction using Inter-strand Residue Pair Propensities

Minoru ASOGAWA

1995Volume 6 Pages 134-135
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.134

JOURNAL FREE ACCESS

The two residues of a pair in a β-sheet are usually separated by a long distance in the protein sequence. Therefore, to achieve high prediction accuracy for β-sheets, it is necessary to consider such inter-strand residue interactions. Since widely used methods for β-sheet prediction are based only on a subsequence of residues, generally 20 residues, they cannot achieve high prediction accuracy. In this paper, we describe a novel method to predict anti-parallel and parallel β-sheets using residue pair propensities, which are calculated from the statistics of inter-strand residue pairs. A preliminary experiment shows that residue pair propensities are consistent with the results of experiments on real proteins and can detect parallel β-sheets with high accuracy.
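
As a rough illustration of what a residue pair propensity could look like, the sketch below computes a log-odds score from counts of inter-strand residue pairs. The abstract does not give the exact formula, so the log-odds form, the function name `pair_propensities`, and the toy input are all assumptions made for illustration.

```python
from collections import Counter
from math import log

def pair_propensities(pairs):
    """Log-odds propensity for each unordered residue pair.

    pairs: list of (residue_a, residue_b) tuples, one per inter-strand
    contact (hypothetical toy input, not real protein statistics).
    """
    pair_counts = Counter(tuple(sorted(p)) for p in pairs)
    residue_counts = Counter(r for p in pairs for r in p)
    n_pairs = sum(pair_counts.values())
    n_residues = sum(residue_counts.values())

    propensities = {}
    for (a, b), count in pair_counts.items():
        observed = count / n_pairs
        fa = residue_counts[a] / n_residues
        fb = residue_counts[b] / n_residues
        # An unordered pair of two different residues can arise in two ways.
        expected = fa * fb * (1 if a == b else 2)
        propensities[(a, b)] = log(observed / expected)
    return propensities

# Toy contacts only; a real table would be built from a structure database.
toy_pairs = [("V", "I"), ("I", "V"), ("V", "V"), ("T", "S")]
toy_propensities = pair_propensities(toy_pairs)
```

A positive score means the pair is seen more often than its residue frequencies alone would suggest; a score near zero means no preference.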

Download PDF (185K)
A Study of Molecular Recognition between DNA and Metal Ions

N. Fukushima, M. Kanehisa

1995Volume 6 Pages 136-137
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.136

JOURNAL FREE ACCESS

Dissociation energies have been calculated between A, C, G, or T of DNA and divalent metal ions (Mg²⁺ and Ca²⁺), and between a double-stranded DNA dimer d(AT) and a hydration shell including either Mg²⁺ or Ca²⁺, to investigate the sequence-recognition ability of the ions. All calculations were carried out using the Hartree-Fock and density functional methods on CRAY supercomputers. The results show that the O₂ atom of T (T(O₂)) and the N₇ atom of G (G(N₇)) play important roles in the recognition of AT and GC base pairs, respectively, and that Mg²⁺ interacts more strongly than Ca²⁺ with the AT stack site of the minor groove in B-DNA through the hydration shell.

Download PDF (193K)
LIGAND Chemical Database for Enzymatic Reactions

A Link between enzyme structures and chemical reactions

Takaaki Nishioka, Mikita Suyama, Susumu Goto, Yutaka Akiyama, Minoru K ...

1995Volume 6 Pages 138-139
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.138

JOURNAL FREE ACCESS

We have developed a database, LIGAND Chemical Database for Enzymatic Reactions, which is designed to link enzyme structures with enzyme-catalyzed chemical reactions. In this paper, we report the current status of the database, its WWW service on GenomeNet, and future plans.

Download PDF (212K)
Receptor Database Representation

K. Nakata, M. Hayakawa, T. Nakano, T. Kaminuma

1995Volume 6 Pages 140-141
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.140

JOURNAL FREE ACCESS

A receptor database system that gathers data from information sources on the Internet has been developed. These sources include genetic databases such as PIR, SWISS-PROT, PDB, GenBank, EMBL, and GDB. The system efficiently provides detailed information on receptors, such as ligand binding sites and DNA binding sites extracted from the references, as well as three-dimensional structures. The database search system runs on Unix workstations.

Download PDF (156K)
AAindex: A Database of Amino Acid Indices and Mutation Matrices

K. Tomii, M. Kanehisa

1995Volume 6 Pages 142-143
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.142

JOURNAL FREE ACCESS

An amino acid index is a set of 20 numerical values representing a property of the amino acids. A similarity matrix, also called a mutation matrix, is a set of 20×20 numerical values representing the similarity between amino acids, and is used for protein sequence alignments and similarity searches. We have collected 402 amino acid indices and 42 published mutation matrices and organized them into a database named AAindex, which is publicly available on the Internet.
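
To make the idea of an amino acid index concrete, the following minimal sketch stores one index as a mapping from the 20 one-letter residue codes to numbers and averages it over a peptide. The widely used Kyte-Doolittle hydropathy scale serves as the example index; the names `KYTE_DOOLITTLE` and `mean_index` are illustrative and are not taken from the AAindex database itself.

```python
# One amino acid index: 20 numerical values, one per residue.
# Values are the Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_index(sequence, index):
    """Average an amino acid index over a non-empty protein sequence."""
    values = [index[residue] for residue in sequence.upper()]
    return sum(values) / len(values)

hydrophobic_score = mean_index("AIL", KYTE_DOOLITTLE)
charged_score = mean_index("KD", KYTE_DOOLITTLE)
```

A mutation matrix could be held the same way, as a mapping keyed by residue pairs, giving the 20×20 values described above.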

Download PDF (176K)
A Software Tool for Mapping Human Genome by Chromosome-Specific Two-Dimensional Electrophoresis Method

Akira Ohyama, Tatsuya Akutsu, Asao Fujiyama

1995Volume 6 Pages 144-145
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.144

JOURNAL FREE ACCESS

Download PDF (141K)
Gene View^PLUS, Locus-in and Physical Mapper

The GUI-softwares for genome mapping data management

S. Minoshima, S. Mitsuyama, K. Kawasaki, N. Shimizu

1995Volume 6 Pages 146-147
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.146

JOURNAL FREE ACCESS

Download PDF (2122K)
gRanch: Gene Mapping Workbench

Y. Wada, H. Yasue, K. Inoue, H. Ohga

1995Volume 6 Pages 148-149
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.148

JOURNAL FREE ACCESS

We have been developing a gene mapping workbench named gRanch. This system displays an integrated gene map and two kinds of homology maps between species. Functions for inputting pedigree or allele data and for analyzing linkage maps are implemented in gRanch.

Download PDF (2566K)
An Integrated System for Large-Scale Sequencing: Dr. AGCT

Zhongqing Wang, Nobuyuki Miyajima, Akira Ohyama

1995Volume 6 Pages 150-151
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.150

JOURNAL FREE ACCESS

We have developed a set of tools for large-scale sequence analysis, Dr. AGCT. MAEditor can incorporate the outputs of other assembly software and can also be used to display single traces from both ABI and ALF sequencers. Sequence quality control is the most important feature of Dr. AGCT. Other project monitoring systems are also used to reduce human intervention in several steps of the sequencing process.

Download PDF (2984K)
DNA sequence comparison based on amino acid similarity

S. Hiraoka, K. Nagai

1995Volume 6 Pages 152-153
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.152

JOURNAL FREE ACCESS

DNA databases are growing exponentially. Sequence similarities are often the most valuable information we can get from DNA databases. Especially for protein-coding sequences, comparison of translated sequences gives us clues to protein function. However, gaps in DNA sequences prevent translation and force us to compare the sequences as they are. We present an algorithm for DNA sequence comparison that translates the sequences most reliably and compares the translated sequences. The method enables us to find protein sequence similarity in DNA sequences even if we do not know the protein sequences that are coded in the DNA sequences.
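
The translation step that such a comparison builds on can be sketched as follows. This is not the authors' algorithm, which the abstract does not spell out; it only shows, under the standard genetic code, how a DNA sequence yields a different protein string in each of the three forward reading frames. The names `translate` and `three_frame_translations` are illustrative.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna, frame=0):
    """Translate DNA from the given frame offset (0, 1, or 2).

    Stop codons become '*'; codons with unknown bases become 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_frame_translations(dna):
    """Return the translations of all three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]

frames = three_frame_translations("ATGGCCTAA")
```

A single inserted or deleted base moves the true protein from one frame to another, which is why a gap in the DNA defeats a plain one-frame translation.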

Download PDF (173K)
Prediction of Transmembrane Helical Regions by Three-stage Model

Boon-Chien Seah, Shigeki Mitaku

1995Volume 6 Pages 154-155
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.154

JOURNAL FREE ACCESS

Download PDF (229K)
Experiments by BONSAI Garden

Naohiro Furukawa, Erika Tateishi, Takayoshi Shoudai, Satoru Miyano

1995Volume 6 Pages 156-157
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.156

JOURNAL FREE ACCESS

Download PDF (238K)
A WWW Tool for Organizing Knowledge of Biomolecular Reaction Pathways

Nobuo Tsukamoto, Minoru Kanehisa

1995Volume 6 Pages 158-159
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.158

JOURNAL FREE ACCESS

We are developing a new database named BRITE (Biomolecular Reaction pathways for Information Transfer and Expression), which contains knowledge of interacting molecules and/or genes. Since constructing BRITE requires cooperation with specialists in the respective fields of molecular biology, we have developed a BRITE construction tool named “BriteExPress” that can be used easily via the WWW. Here, we give an overview of this tool and report on actual database construction for cell cycle control.

Download PDF (170K)
An integrated database SPAD (Signaling PAthway Database) for signal transduction and genetic information

Naoko Tateishi, Haruki Shiotari, Satoru Kuhara, Toshihisa Takagi, Mino ...

1995Volume 6 Pages 160-161
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.160

JOURNAL FREE ACCESS

Signal transduction is reminiscent of classical symphonies. Organisms, like all the great composers they created, depend on masterful variations on themes.
Many studies have rapidly increased our understanding of the molecular mechanisms that mediate intercellular signal transduction. To date, many components of signal transduction have been identified and their control mechanisms have been modeled. However, it is important to understand how these components are regulated in the cell as a whole system.
We have developed an integrated database, SPAD (Signaling PAthway Database), based on the WWW (World Wide Web) to give an overview of signal transduction (http://www.grt.kyushu-u.ac.jp/eny-doc/spad.html). SPAD is divided into four categories based on the extracellular signal molecule (growth factor, cytokine, hormone, and stress) that initiates the intracellular signaling pathway. SPAD compiles protein-protein interaction, protein-DNA interaction, and DNA sequence information. We adopted HTML (HyperText Markup Language) and HTTPD (HyperText Transfer Protocol Daemon) to build a WWW server on a Sun workstation. As shown in Figure 1, the system provides a user-friendly integrated interface for signal transduction pathways. The DNA sequence information for each gene was reconstructed from GenBank entries. Protein information was linked to SWISS-PROT on the GenomeNet WWW server. Reference information for each element was linked to MEDLINE at NCBI.

Download PDF (2667K)
A WWW Database of Bacillus Subtilis ORFs determined by the International Project of Sequencing B. Subtilis Genome

A. Ogiwara, T. Takagi, N. Ogasawara

1995Volume 6 Pages 162-163
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.162

JOURNAL FREE ACCESS

Download PDF (2531K)
Developing Sequence Analysis Tools on Web Server

T. Yasunaga, T. Takagi, A. Takeuchi, T. Niiyama

1995Volume 6 Pages 164-165
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.164

JOURNAL FREE ACCESS

Download PDF (172K)
Fop (frequency of optimal codon usage): WWW service with its distribution analysis

Y. Nakamura, T. Ikemura

1995Volume 6 Pages 166-167
Published: 1995
Released on J-STAGE: July 11, 2011

DOIhttps://doi.org/10.11234/gi1990.6.166

JOURNAL FREE ACCESS

We have been constructing a WWW server that shows 1) the Fop values of all full-length ORFs of E. coli and S. cerevisiae in the GenBank DNA sequence database and 2) the distribution of Fop values along the chromosomes. A program available on the server allows users to calculate the Fop value of a sequence from these species.
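
A Fop value can be sketched as the fraction of optimal codons among the codons that are counted for a coding sequence. The snippet below is a minimal illustration, not the server's program: the function name `fop` and the two small codon sets are placeholders chosen for the example, whereas a real calculation would use a species-specific optimal-codon table.

```python
def fop(cds, optimal_codons, counted_codons):
    """Frequency of optimal codons in a coding sequence.

    Only codons in `counted_codons` (optimal plus non-optimal synonymous
    codons) contribute; all others, such as ATG, are ignored.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counted = [codon for codon in codons if codon in counted_codons]
    if not counted:
        raise ValueError("no countable codons in sequence")
    n_optimal = sum(1 for codon in counted if codon in optimal_codons)
    return n_optimal / len(counted)

# Illustrative placeholder sets covering only Leu and Lys codons.
TOY_OPTIMAL = {"CTG", "AAA"}
TOY_COUNTED = {"CTG", "CTA", "CTT", "CTC", "TTA", "TTG", "AAA", "AAG"}

# Codons: ATG (ignored), CTG, CTA, AAA, AAG -> 2 optimal out of 4 counted.
toy_fop = fop("ATGCTGCTAAAAAAG", TOY_OPTIMAL, TOY_COUNTED)
```

Computing this value for each ORF in chromosomal order would give the kind of distribution along a chromosome that the abstract mentions.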

Download PDF (170K)
