Genome Informatics

On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies

Louxin Zhang

1996 Volume 7 Pages 1-12
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.1

JOURNAL FREE ACCESS

A conjecture of Mirkin, Muchnik and Smith is answered affirmatively. The conjecture connects the inconsistency function, a biologically meaningful dissimilarity measure for a gene tree and a species tree, to the mutation cost function, a combinatorial measure based on a mapping of trees. A linear-time algorithm for computing the inconsistency function is also derived from the conjecture.

Approximation Algorithms for Genome Rearrangements

sorting signed permutations by reversals and transpositions

Qian-Ping Gu, Shietung Peng, Hal Sudborough

1996 Volume 7 Pages 13-22
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.13

JOURNAL FREE ACCESS

Recently, a new approach to analyzing genome evolution was proposed, based on comparison of gene orders rather than the traditional comparison of DNA sequences (Sankoff et al., 1992). The approach is based on global rearrangements (e.g., inversions and transpositions of fragments). Analysis of genomes evolving by inversions and transpositions leads to the combinatorial problem of sorting by reversals and transpositions, i.e., sorting a permutation using reversals and transpositions of arbitrary fragments. The problem is conjectured to be NP-hard. We study sorting of signed permutations by reversals and transpositions, a problem which adequately models genome rearrangements, as the genes in DNA are oriented. We establish a lower bound and give two algorithms for the problem. Based on the lower bound, we show that the first algorithm is a 2-approximation algorithm. The time complexity of this algorithm may not be bounded by Poly(n), where n is the length of the permutation to be sorted. By setting a time limit on the first algorithm, we obtain the second algorithm, which is a 2(1+1/k)-approximation algorithm, where k ≥ 3 is any fixed integer, and runs in Poly(n) time.
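
The two elementary operations named in this abstract can be illustrated with a minimal Python sketch. It shows only what a reversal and a transposition do to a signed permutation; it is not the authors' approximation algorithm, and the function names and index conventions are assumptions made for illustration.

```python
from typing import List


def reversal(perm: List[int], i: int, j: int) -> List[int]:
    """Reverse the block perm[i..j] (inclusive) and flip the sign of each gene in it."""
    flipped = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


def transposition(perm: List[int], i: int, j: int, k: int) -> List[int]:
    """Move the block perm[i:j] so that it follows the block perm[j:k] (i < j < k)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def is_sorted(perm: List[int]) -> bool:
    """A signed permutation is sorted when it equals +1, +2, ..., +n."""
    return perm == list(range(1, len(perm) + 1))


if __name__ == "__main__":
    print(reversal([-2, -1, 3], 0, 1))          # one reversal sorts this example
    print(transposition([2, 3, 1], 0, 2, 3))    # one transposition sorts this example
```

Signs model gene orientation, which is why a reversal negates every element of the block it reverses.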

A Clustering Method for Molecular Sequences based on Pairwise Similarity

H. Matsuda, T. Ishihara, A. Hashimoto

1996 Volume 7 Pages 23-32
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.23

JOURNAL FREE ACCESS

This paper presents a method for clustering a large and mixed set of uncharacterized sequences provided by genome projects. As the measure for clustering, we use a fast approximation of sequence similarity (the FASTA score). However, when detecting similarity between two sequences that have diverged greatly during evolution, FASTA sometimes underestimates the similarity compared to the rigorous Smith-Waterman algorithm. Also, the distance derived from the similarity score may not be a metric, since the triangle inequality may not hold when the sequences have a multi-domain structure. To cope with these problems, we introduce a new graph structure, called a p-quasi complete graph, for describing a cluster of sequences with a confidence measure. We prove that a restricted version of the p-quasi complete graph problem (given a positive integer k, whether or not a graph contains a 0.5-quasi complete subgraph of size ≥ k) is NP-complete. Thus we present the outline of an approximation algorithm for clustering a set of sequences into subsets corresponding to p-quasi complete graphs. The effectiveness of our method is demonstrated by the result of clustering Escherichia coli protein sequences.

Approximate Multiple String Searching by Clustering

Fei Shi, Peter Widmayer

1996 Volume 7 Pages 33-40
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.33

JOURNAL FREE ACCESS

We are given a finite set S of text strings and a pattern P over some fixed alphabet Σ. The topic of this paper is the design of a data structure D(S) which supports approximate multiple string searching queries efficiently. For a given upper bound k ∈ Z⁺ on the allowable distance, P=p₁...p_m is said to appear approximately in a text T=t₁...t_n, m, n ∈ Z⁺, if there exist positions u, v in T such that the edit distance between P and t_u...t_v is at most k. Let N denote the sum of the lengths of all strings in S. We present an algorithm that constructs the data structure D(S) in O(N) time and space. Afterwards, an approximate multiple string search query can be answered in O(N) expected time if the allowable distance k is bounded above by O(m/log m). The method can be used to search large nucleotide and amino acid sequence databases for similar sequences.
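
The definition of an approximate occurrence given in this abstract can be made concrete with a short Python sketch. It uses the textbook dynamic-programming approach (often attributed to Sellers), in O(mn) time for a single text, and is not the D(S) data structure of the paper. The function name is an assumption. Returned positions are the 1-based end positions v for which some u exists with edit distance between P and t_u...t_v at most k.

```python
from typing import List


def approximate_end_positions(pattern: str, text: str, k: int) -> List[int]:
    """Return all 1-based positions v such that some substring of text
    ending at v is within edit distance k of pattern."""
    m = len(pattern)
    # prev[i] = edit distance between pattern[:i] and the best substring
    # ending at the previous text position (initially the empty prefix).
    prev = list(range(m + 1))
    ends = []
    for v, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character left unmatched
        if cur[m] <= k:
            ends.append(v)
        prev = cur
    return ends


if __name__ == "__main__":
    print(approximate_end_positions("ACGT", "TTACGTTT", 0))
    print(approximate_end_positions("ACGT", "ACCT", 1))
```

Running this scan over every string in S costs time proportional to m times N, which is the kind of cost a preprocessed index is meant to avoid.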

Parametric Alignment of Multiple Biological Sequences

Tetsuo Shibuya, Hiroshi Imai

1996 Volume 7 Pages 41-50
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.41

JOURNAL FREE ACCESS

The alignment of DNA or protein sequences is widely applicable and important in various fields of molecular biology. In this problem, the optimal solution obtained with fixed parameters (gap penalties, weights for weighted multiple alignment problems, and so on) is not always the biologically best alignment. Thus, it is necessary to vary the parameters and check how the optimal alignments vary. The way to vary parameters has been studied well for the problem with only two sequences [6, 7, 12, 13, 14, 15], but not for the multiple alignment problem, because of the difficulty of computing the optimal solution. This paper presents techniques for the parametric multiple alignment problem, and examines the features of alignments obtained by parametric analysis of the gap penalty and weight matrix through experiments. These experiments reveal the importance of adopting appropriate parameter values to obtain meaningful multiple alignments.

Finding Minimal Multiple Generalization over Regular Patterns with Alphabet Indexing

Michiyo Yamaguchi, Shinichi Shimozono, Takeshi Shinohara

1996 Volume 7 Pages 51-60
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.51

JOURNAL FREE ACCESS

We propose a learning algorithm that discovers a motif represented by patterns and an alphabet indexing from biosequences. From only positive examples, with the help of an alphabet indexing, the algorithm finds k regular patterns as a k-minimal multiple generalization (k-mmg for short). The computational results for transmembrane domains indicate that the combination of k-mmg and alphabet indexing works quite successfully. We also introduce a partial alphabet indexing that transforms symbols depending on their position in the sequences.

Systematization of Species-Specific Diversity of Genes in Codon Usage

Comparison of the Diversity Among Bacteria and Prediction of the Protein Production Levels in Cells

Shigehiko Kanaya, Yoshihiro Kudo, Shinya Suzuki, Toshimichi Ikemura

1996 Volume 7 Pages 61-71
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.61

JOURNAL FREE ACCESS

In the present study, we have developed a procedure for estimating the species-specific heterogeneity of codon usage among intraspecific genes, called diversity in codon usage, and for systematizing species by this species-specific diversity on the basis of principal component analysis. We tried to quantify differences in the diversity among five species, Escherichia coli (Ec), Salmonella typhimurium (St), Haemophilus influenzae (Hi), Bacillus subtilis (Bs), and Synechocystis sp. (Ss). In the five species, many of the genes involved in the translation process and energy metabolism had positive values (Z₁>0) on the first principal component (PC1). In Ss, many of the genes involved in the photosynthetic system also had positive Z₁-values. These genes are thought to be highly expressed. By the direction of PC1, the five species were roughly classified into three categories, [Ec, St, Hi], [Ss], [Bs]. The dendrogram constructed was roughly consistent with the rRNA-based phylogeny, but interesting differences were also observed between the two phylogenetic trees.

Refinement of The Prediction Methods of Signal Peptides for The Genome Analyses of Saccharomyces cerevisiae and Bacillus subtilis

Kenta Nakai

1996 Volume 7 Pages 72-81
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.72

JOURNAL FREE ACCESS

Since signal peptides play a crucial role in specifying the in vivo fate of proteins, prediction of their existence is important for the characterization of ORFs of unknown function. To make such predictions as reliable as possible, the features of signal peptides of two important model organisms, Saccharomyces cerevisiae and Bacillus subtilis, were examined and the accuracy of current prediction methods was refined using these data. Direct optimization of the threshold values of existing methods significantly raised the predictability, but the variables that were most effective for improvement were different in these two organisms. In yeast, the maximum hydrophobicity value of an 8-residue segment mainly contributed to raising the predictability to 98.5% when estimated by the cross validation procedure. In Bacillus species, the length of the uncharged segment and the charges in the N-terminal region (net charge and negative charge) were combined to give a prediction accuracy of 98.2%, although the data size was relatively small in this case.
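
The yeast variable mentioned in this abstract, the maximum hydrophobicity of an 8-residue segment, can be computed with a sliding window. The Python sketch below is illustrative only: the abstract does not say which hydrophobicity scale or averaging convention was used, so the Kyte-Doolittle scale and the per-residue window mean are assumptions, as is the function name.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydrophobicity(seq: str, window: int = 8) -> float:
    """Return the highest mean hydropathy over all windows of the given length."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    running = sum(values[:window])
    best = running
    # Slide the window one residue at a time, updating the sum incrementally.
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        best = max(best, running)
    return best / window


if __name__ == "__main__":
    # A hypothetical N-terminal region with a hydrophobic core.
    print(max_window_hydrophobicity("MKKDDLLLLLLLLDDKK"))
```

A predictor along these lines would compare the returned value with a threshold tuned on known signal peptides; the threshold itself is not given in the abstract.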

An Algorithm for Highly Specific Recognition of Protein-coding Regions

M. S. Gelfand, T. V. Astakhova, M. A. Roytberg

1996 Volume 7 Pages 82-87
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.82

JOURNAL FREE ACCESS

Since absolutely reliable recognition of protein-coding regions in eukaryote genomic DNA sequences by computational methods is unattainable, most existing algorithms try to keep some balance between underprediction and overprediction. However, in experimental practice it is often sufficient to have just a few protein-coding segments, but predicted with high specificity, that is, with (almost) no overprediction. Such predictions are then used for construction of oligonucleotide probes and PCR primers for analysis of cDNA libraries or total cellular RNA.
Here we present a combinatorial algorithm solving this problem. Unlike other prediction schemes, the algorithm uses only the simplest statistical parameters (codon usage and positional nucleotide sequences in splicing sites) and thus can be used for analysis of obscure genomes, when large learning sets are unavailable. The algorithm's structure allows one to simply tune it for various experimental settings.

Finding Genes by Hidden Markov Models with a Protein Motif Dictionary

Kiyoshi Asai, Tetsushi Yada, Katunobu Itou

1996 Volume 7 Pages 88-97
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.88

JOURNAL FREE ACCESS

A new method for combining a protein motif dictionary with a gene-finding system is proposed. The system consists of Hidden Markov Models (HMMs) and a dictionary. The HMMs represent the nucleic acid bases, the codons, and the amino acids. The ‘words’ in the dictionary are described by sequences of these HMMs and represent the noncoding regions, the codons, protein motifs, tRNA regions and signals in DNA sequences. The statistics between these regions are expressed by the “grammar”, which is a stochastic network of the ‘words’.
Using the same kind of technique as speech recognition by HMMs with a word dictionary and a grammar, the stochastic network of ‘words’ enables the motif dictionary to be used during the parsing of the DNA sequences. At the same time, the di-codon statistics, which are known to be important parameters, are included in the stochastic network. As a result, while the system parses DNA sequences and finds the coding regions, the protein motifs are automatically annotated in those regions. This helps to identify the functions of the genes and reduces the cost of homology search for each hypothetical coding region. This method is different from simply using the information from a homology search: it uses the information of the motif patterns during the parsing process, whereas searching for the motif patterns before or after finding the coding regions cannot directly affect the parsing process itself. Experimental results have shown that this method correctly finds and annotates the motifs in the coding regions of the DNA sequence of a cyanobacterium.

HAKKE: A Multi-Strategy Prediction System for Sequences

Naohiro Furukawa, Satoshi Matsumoto, Ayumi Shinohara, Takayoshi Shouda ...

1996 Volume 7 Pages 98-107
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.98

JOURNAL FREE ACCESS

We developed a machine learning system, HAKKE, which is suitable for predicting functional regions from sequences, such as protein-coding region prediction and transmembrane domain prediction. HAKKE is a hybrid system in which a pool of algorithms cooperates to make an accurate prediction. The system uses an extension of the weighted majority algorithm in order to fit the strength of each algorithm to the given training examples. In this paper, we describe the core of the system and show some experimental results on transmembrane domain and α-helix predictions.
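
For orientation, the basic weighted majority scheme that this abstract builds on can be sketched in a few lines of Python. This is the plain binary, deterministic version, not HAKKE's extension, which the abstract does not describe. The function name, the tie-breaking rule and the penalty factor `beta` are assumptions.

```python
from typing import List, Sequence, Tuple


def weighted_majority(
    predictions: Sequence[Sequence[int]],
    labels: Sequence[int],
    beta: float = 0.5,
) -> Tuple[List[int], List[float]]:
    """Run the basic weighted majority algorithm over a training sequence.

    predictions[t][a] is the 0/1 prediction of algorithm a on example t.
    Returns the combined predictions and the final weight of each algorithm.
    """
    weights = [1.0] * len(predictions[0])
    combined = []
    for preds, y in zip(predictions, labels):
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        combined.append(1 if vote_one >= vote_zero else 0)  # ties go to 1
        # Multiply the weight of every algorithm that erred by beta.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return combined, weights


if __name__ == "__main__":
    preds = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]  # three hypothetical predictors
    outputs, final_weights = weighted_majority(preds, [1, 0, 1])
    print(outputs, final_weights)
```

Algorithms that err often end up with small weights, so the combined vote gradually follows the members of the pool that fit the training examples best.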

Modelling Proteins Conformation in Solution. Part I: A Parallel GA Engine for Protein Conformational Space Mapping

Carlos A. Del Carpio, Valentin Gogonea

1996 Volume 7 Pages 108-118
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.108

JOURNAL FREE ACCESS

This article is the first of a series of papers describing the development of an automatic system for prediction of the three-dimensional conformation of proteins in solution. In this first part we discuss the implementation of the protein conformational space mapping engine. This is a procedure based on a robust parallel genetic algorithm (GA) which runs on a network of transputers. We describe aspects of the algorithm related to the major factors that influence the protein folding process and describe their implementation within the scheme of the evolutionary algorithm. Among them, we make a thorough review of the co-operativity of partial secondary structures that emerge as the evolutionary process proceeds, and of its effects on the stability of newly generated conformers as well as on the performance of the GA. We then consider hydrogen bonding and synthesize the demographic trends in known proteins suggested by Stickle et al., and also implement them as an index for assessing the goodness of the generations of protein conformers. Finally, we make an intensive analysis of the packing of the amino acid side chains and show how a hybrid algorithm can bring about a relaxation of the perturbations caused by the operations of the GA, and a genuine improvement of the overall process. In the second paper of this series we propose guidelines under which we implement the solvent effect, which in concert with the above-mentioned factors results in a system for protein 3D structure prediction in solution.

Effect of Secondary Structure Prediction on Protein Fold Recognition and Database Search

Nickolai N. Alexandrov, Victor V. Solovyev

1996 Volume 7 Pages 119-127
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.119

JOURNAL FREE ACCESS

Hydrophobic long-range interactions and local polypeptide chain propensities are the major factors directing protein folding. Incorporating both these terms in addition to the Dayhoff matrix helps us to increase the quality of protein fold recognition via sequence-structure alignment. We have shown that the results of secondary structure prediction substantially increase the sensitivity of fold recognition. To measure the performance of protein fold recognition, we have developed a comprehensive test along with a set of quality control scores based on the most populated structural families. With this test we have demonstrated improvement of the sequence alignment when the predicted secondary structure is considered, even without knowledge of the real three-dimensional structure.

Analysis of Binary Relations and Hierarchies of Enzymes in the Metabolic Pathways

Hiroyuki Ogata, Wataru Fujibuchi, Hidemasa Bono, Susumu Goto, Minoru K ...

1996 Volume 7 Pages 128-136
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.128

JOURNAL FREE ACCESS

In conjunction with a new database system that efficiently organizes the metabolic pathway data from various organisms, we are developing computational methodologies using binary relations and hierarchies of enzymes. Biological knowledge integrated in the system includes genes, gene products, chemical compounds, enzyme reactions and metabolic pathway diagrams. By automatically mapping the enzymes of a specific organism onto the pathway diagrams, it becomes possible to visualize the characteristic features of the organism-specific metabolic pathways. With the aid of the computational methodology implemented in the system, it also becomes possible to analyze and investigate the pathways in terms of their function and evolution. In this paper, we describe the outline of the system and present new biological features of metabolic pathways revealed by the system.

MetaViewer and MetaCommander: Applying WWW Tools to Genome Informatics

Yasuhiko Kitamura, Tetsuya Nozaki, Hideyuki Nakanishi, Teruhisa Miura, ...

1996 Volume 7 Pages 137-146
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.137

JOURNAL FREE ACCESS

With the advance of the Human Genome Project, a huge amount of varied genome data has been stored in a number of databases, and the WWW system is widely used to access these databases. From the viewpoint of the information supplier, the WWW is quite a useful tool for providing various types of data easily, but from the viewpoint of the information consumer, it is not good enough because of the lack of a rigid data format and the difficulty of data access. In this paper, by extending a current WWW browser, we propose two generic WWW tools, MetaViewer and MetaCommander, try to apply them to genome informatics to support researchers who search, analyze, and dispatch genome data, and discuss their potential advantages from the viewpoint of the information consumer.

FOREST, a Browser for Huge DNA Sequences

R. Gras, J. Nicolas

1996 Volume 7 Pages 147-156
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.147

JOURNAL FREE ACCESS

We present a new tool, FOREST, which aims to represent the content of a large nucleic acid sequence (e.g., > 100 KB) in a form suitable for the biologist. More precisely, FOREST builds all subsequences repeated in a sequence or a set of sequences. It not only allows one to look up the locations of the various occurrences of a given subsequence, but also points to subsequences that are interesting with respect to a given criterion. This tool is based on two key ideas. The first idea is to build a suffix-tree representation of a sequence and to associate with each node of this tree a set of synthesized attributes, computed on the set of subsequences under this node. This allows the biologist to “browse” in the sequence with a constant abstract view of what he may expect to find in the section of the tree he is currently investigating. The second idea is to summarize the distribution of the information with boolean vectors associated with the sequence. These vectors may be easily displayed in the form of a linear map of events, as is done in genetic mapping. Both representations allow various efficient operations on the sequence. They provide a powerful capacity for filtering the data, while reducing the set of elementary filtering operations to a minimum of conceptual operations. This allows the biologist to easily investigate the most prominent features of the lexical structure of their sequences.

Development of New DDBJ DNA Sequence Database with Data Annotation Tool Yamato II

T. Koike, T. Okayama, J. Ishii, T. Mizunuma, T. Tamura, Y. Tateno, H. ...

1996 Volume 7 Pages 157-165
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.157

JOURNAL FREE ACCESS

As molecular biology has made rapid progress in recent years, many changes have been required in the methodology for maintaining and utilizing DNA sequence data. For example, the annotation of sequences has become complex and extensive. Recognizing these impending requirements, DDBJ decided in 1995 to develop a new DNA sequence database system. To tolerate frequent changes of the data structures and a significant increase of the data in terms of quality and quantity, we designed a completely new database schema. In the new system, physical changes of the data structure do not affect applications such as annotation tools. We also designed a new annotation tool based on an object-oriented concept that allows us to handle DNA sequence data in computers as intuitively as in the real world. The annotation tool is named YAMATO II. The new system also takes care of the needs of DDBJ itself. In particular, data traffic and security in database access were analyzed, and reviewers of data for DDBJ who are distant from DDBJ are now able to process the data safely and comfortably in the new system. The new system also achieves more robust and effective data exchange with our partners among the international nucleotide sequence banks, EMBL and GenBank.

A Taxonomy Database System for Managing Multimedia Contents

Hajime Kitakami, Yasuma Mori, Masatoshi Arikawa

1996 Volume 7 Pages 166-167
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.166

JOURNAL FREE ACCESS

We developed a taxonomy database system for managing multimedia contents. The system is accessible to remote users through the World-Wide Web and is implemented with SQL programming and CGI (Common Gateway Interface) scripts.

Automated Identification of Three-Dimensional Motif in Proteins

Hiroaki Kato, Yoshimasa Takahashi

1996 Volume 7 Pages 168-169
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.168

JOURNAL FREE ACCESS

This paper describes an approach to automated identification of three-dimensional (3-D) motifs in proteins. Here, the structure of a protein is reduced to an abstract representation which consists of the α-helix and β-strand secondary structure elements, these being described by vectors in 3-D space rather than the point-like atoms that are used in the simple Cα approximation. The algorithms and their implementations are discussed with a couple of execution examples of the identification of 3-D motif candidates using well-known motifs.

Characterization of Enzyme Structure by Informational Complexity

Takayuki Kamei, Yasuo Yonezawa

1996 Volume 7 Pages 170-171
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.170

JOURNAL FREE ACCESS

It is well known that enzymes are very important as reaction factors in the activity of living systems. However, their properties have not yet been sufficiently studied from the viewpoint of information theory. We therefore examined the correlation between the complexity of enzyme amino acid sequences and their function using an informational measure, in order to elucidate the informational properties of sequence structure. In addition, power spectra of enzyme complexity, obtained by the Fourier Transform (FT) method, showed specific profiles. As a result, the sequences of enzyme proteins showed higher complexity than those of non-enzyme proteins. Moreover, the FT profiles showed typical patterns in the complexity of enzyme protein sequences. These results suggest a new viewpoint for protein analysis based on information science.

Java Version of Animal Genome Database

Y. Wada, H. Yasue

1996 Volume 7 Pages 172-173
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.172

JOURNAL FREE ACCESS

We have developed a new version of the Animal Genome Database for network users using Java applets. The new version includes a linkage homology map, a Java version of the clickable linkage map, and a Japanese tutorial with an audio clip. Furthermore, we have started a Mouse Genome Informatics mirror site in Japan.

A Symbolic Representation for RNA Secondary Structures; towards the Construction of RNA Secondary Structures Data Base

A. Nakaya, A. Yonezawa, K. Yamamoto

1996 Volume 7 Pages 174-175
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.174

JOURNAL FREE ACCESS

Multiple Sequence Alignment Using a Genetic Algorithm

Masamichi Isokawa, Masato Wayama, Toshio Shimizu

1996 Volume 7 Pages 176-177
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.176

JOURNAL FREE ACCESS

Discovering Functional Sites of Amino Acid Sequences Using Sorted Variable Generalization

Takashi Ishikawa, Shigeki Mitaku, Takao Terano, Makiko Suwa, Takatsugu ...

1996 Volume 7 Pages 178-179
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.178

JOURNAL FREE ACCESS

This research develops a method for discovering functional sites of amino acid sequences using an Inductive Logic Programming (ILP) method with sorted variable generalization. Functional sites provide clues to building a knowledge base for prediction of protein functions from amino acid sequences. The proposed method generates hypotheses of functional sites directly from aligned amino acid sequences using an ILP method extended with sorted variable generalization. The proposed method is shown to be useful for discovering functional sites by an example application to the case of bacteriorhodopsin-like proteins.

Improvement of the Transmembrane Helix Prediction System by Three-Stage Model

Takatsugu Hirokawa, Boon-Chieng Seah, Shigeki Mitaku

1996 Volume 7 Pages 180-181
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.180

JOURNAL FREE ACCESS

A new method to predict transmembrane helices from amino acid sequences was developed, in which the effect of the stabilization of helices by interhelix binding was taken into account. It was assumed that there are three stages of transmembrane helix conformation: the binding to the membrane surface, the formation of the transmembrane core region, and the maturation of the helix due to tertiary structure formation in the membrane. This method was applied to the amino acid sequences of membrane proteins whose numbers of transmembrane helices are given, and most transmembrane helices were correctly predicted.

Sequence Analysis of Short Tandem Repeats in the Genomes of H. influenzae and M. genitalium

Takanori Washio, Masahiko Wada, Masaru Tomita

1996 Volume 7 Pages 182-183
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.182

JOURNAL FREE ACCESS

On Correlation between G+C Contents and Intron Lengths: Longer Introns Tend to be More A+T Rich

Yasuhiro Asakawa, Masaru Tomita

1996 Volume 7 Pages 184-185
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.184

JOURNAL FREE ACCESS

An interesting correlation between G+C contents and the lengths of primate introns has been found by our computer analysis.
All sequences of primate introns were extracted from the GenBank database and classified into subgroups according to their lengths (the number of bases; increment of 100). G+C contents (%) were then calculated for each subgroup.
The results indicate that shorter introns tend to contain more G and C nucleotides, and longer introns tend to contain more A and T nucleotides.
Frequencies of each nucleotide for each subgroup are shown in figure 1.
We also computed G+C contents of exons flanking those introns for each subgroup. As we can see in figure 2, similar but weaker tendencies are observed.
The biological significance of these observations is currently under investigation. We also intend to extend our analysis to other eukaryotes.
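
The binning procedure described in this abstract can be sketched in Python as follows. The abstract gives only the bin increment (100 bases), so the exact bin boundaries, the pooling of base counts within a bin, and the function name are assumptions; the input sequences here are placeholders, not GenBank data.

```python
from collections import defaultdict
from typing import Dict, Iterable


def gc_percent_by_length_bin(sequences: Iterable[str], bin_size: int = 100) -> Dict[int, float]:
    """Group sequences by length (bins of bin_size bases) and return the
    pooled G+C percentage of each bin, keyed by the bin's lower bound."""
    counts = defaultdict(lambda: [0, 0])  # bin start -> [G+C count, A+C+G+T count]
    for seq in sequences:
        s = seq.upper()
        bin_start = (len(s) // bin_size) * bin_size  # e.g. lengths 100..199 -> 100
        gc = s.count("G") + s.count("C")
        acgt = gc + s.count("A") + s.count("T")
        counts[bin_start][0] += gc
        counts[bin_start][1] += acgt
    return {
        start: 100.0 * gc / total
        for start, (gc, total) in sorted(counts.items())
        if total > 0
    }


if __name__ == "__main__":
    toy_introns = ["GGCC" * 10, "AT" * 75, "GCAT" * 30]  # placeholder sequences
    for start, pct in gc_percent_by_length_bin(toy_introns).items():
        print(f"{start}-{start + 99} bases: {pct:.1f}% G+C")
```

Pooling counts within a bin weights long sequences more heavily than averaging per-sequence percentages would; which of the two the authors used is not stated.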

CpG Dinucleotide Distribution and DNA Methylation

Tom Shimizu, Kouichi Takahashi, Masaru Tomita

1996 Volume 7 Pages 186-187
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.186

JOURNAL FREE ACCESS

Computer Analyses of Nucleotide Patterns around Start Codons

Rintaro Saito, Hidekazu Sasaki, Yuko Osada, Masaru Tomita

1996 Volume 7 Pages 188-189
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.188

JOURNAL FREE ACCESS

Computer Analyses of Overlapping Genes in Mycoplasma Genitalium

Yuko Osada, Ryo Matsushima, Masaru Tomita

1996 Volume 7 Pages 190-191
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.190

JOURNAL FREE ACCESS

Computer Analyses of Site-Specific Variabilities in Human Alu Sequences

Michiko Muraki, Masaru Tomita

1996 Volume 7 Pages 192-193
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.192

JOURNAL FREE ACCESS

Inferring History of Genomic Duplication Using Subclassified Alu Elements

Yoshimi Toda, Masaru Tomita

1996 Volume 7 Pages 194-195
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.194

JOURNAL FREE ACCESS

As described above, Alu subfamily classification, direct repeats, and poly-A tails can be used as markers to refine sequence analysis and to infer the history of duplication events with a high degree of confidence.

Motif Extraction: Normalization of Scores

Y. Fujiwara, M. Asogawa

1996 Volume 7 Pages 196-197
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.196

JOURNAL FREE ACCESS

This paper examines a method to normalize the score of a stochastic motif represented by a hidden Markov model (HMM). The accuracy of the Z score method, which is one of the score normalization methods, is compared with that of the whole search method.
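
As a reminder of what a Z score normalization computes, the Python sketch below standardizes a raw motif score against a set of background scores. How the paper obtains its background distribution is not stated in the abstract, so the background list, the use of the population standard deviation and the function name are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(score: float, background_scores: Sequence[float]) -> float:
    """Express a raw score in standard deviations above the background mean."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (score - mu) / sigma


if __name__ == "__main__":
    # Hypothetical HMM scores for unrelated sequences.
    background = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(z_score(5.0, background))
```

Normalizing in this way makes scores from different motif models comparable on one scale, which a raw HMM score does not allow.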

Download PDF (195K)
Beta-sheet Prediction Using Inter-strand Residue Pairs and Refinement with Hopfield Neural Network

Minoru Asogawa

1996 Volume 7 Pages 198-199
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.198

JOURNAL FREE ACCESS

Download PDF (178K)
Building A Receptor Database

K. Nakata, T. Igarashi, M. Hayakawa, T. Kaminuma

1996 Volume 7 Pages 200-201
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.200

JOURNAL FREE ACCESS

We have developed a database of receptors that gathers data from information sources on the Internet. The sources of this database are a variety of genomic and biological resources on the Internet: PIR, SWISS-PROT, PDB, GenBank, EMBL, GDB, etc. The system provides detailed structural and functional information on receptors, such as ligand binding sites and DNA binding sites, picked up from the references, together with three-dimensional structures. The system was implemented on a Unix workstation (IRIS, INDIGO 2) using the object-oriented database management system ACEDB (A Caenorhabditis elegans Data Base).
ACEDB is an object-oriented database management system that was developed as part of the Caenorhabditis elegans genome research. It is a generalized genome database and can be used to create new databases without any reprogramming or, in fact, any sophisticated computer skills.
The system provides various viewing tools that effectively display different types of receptor data: DNA sequences, amino acid sequences, DNA binding sites, ligand binding sites, gene and disease information, and protein structural information. It can also display the three-dimensional structure of molecules using the freeware molecular graphics program RASMOL. Detailed information on ligands and signal transduction, picked up from references, is also included. The system also has a browser interface so that the database can be accessed via the World Wide Web. Information on the sites of action on receptors is of great biological, medical, and pharmacological interest. The database may be useful as a quick reference for ligand-membrane receptors and signal transduction in drug design. It may also be used for functional and structural analyses of receptors.

Download PDF (2166K)
Direct Comparison DNA and Amino Acid Sequences Based on a Dynamic Programming Method

Naoko Kasahara, Keiichi Nagai, Susumu Hiraoka

1996 Volume 7 Pages 202-203
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.202

JOURNAL FREE ACCESS

We have developed a method, based on dynamic programming, that enables us to directly compare DNA and amino acid sequences. This method makes it possible to find homologies between translated DNA sequences and amino acid sequences by recognizing gaps in both types of sequences. It achieves higher sensitivity and specificity than is possible with BLASTX, which has a similar function. To reduce the computation time, we performed a parallel computation on a workstation cluster using PVM (Parallel Virtual Machine) programming.
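
The abstract gives no recurrence, so the following is only an illustrative sketch of how a dynamic programming comparison of a DNA sequence against a protein sequence can allow gaps on both sides. It is not the authors' algorithm. The function name `align_dna_protein`, the match/mismatch scores, and the gap and frameshift penalties are all assumptions chosen for illustration (a real implementation would use a substitution matrix), and the PVM parallelization is not shown. The sketch assumes upper-case DNA and the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def align_dna_protein(dna, protein, match=2, mismatch=-1, gap=3, frameshift=4):
    """Best local alignment score between a DNA and a protein sequence.

    H[i][j] is the best score of an alignment ending after i nucleotides
    and j amino acids. Toy scoring, assumed for illustration only.
    """
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            candidates = [0]  # local alignment: may restart anywhere
            if i >= 3:
                if j >= 1:
                    # Codon aligned to an amino acid.
                    aa = CODON_TABLE.get(dna[i - 3:i], "X")
                    s = match if aa == protein[j - 1] else mismatch
                    candidates.append(H[i - 3][j - 1] + s)
                # Codon aligned to nothing (gap in the protein).
                candidates.append(H[i - 3][j] - gap)
            if j >= 1:
                # Amino acid aligned to nothing (gap in the DNA).
                candidates.append(H[i][j - 1] - gap)
            # Frameshift: skip one or two nucleotides.
            candidates.append(H[i - 1][j] - frameshift)
            if i >= 2:
                candidates.append(H[i - 2][j] - frameshift)
            H[i][j] = max(candidates)
            best = max(best, H[i][j])
    return best
```

Because the recurrence steps through the DNA one nucleotide at a time but consumes three nucleotides per codon, all three reading frames of the given strand are covered in a single pass; the reverse strand would need a second call on the reverse complement.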

Download PDF (204K)
SAKURA: A New Data Submission System of DDBJ to Meet Users' Needs in the Age of Mass Production of DNA Sequences

Hikaru Yamamoto, Takuro Tamura, Katsumi Isono, Takashi Gojobori, Hidea ...

1996 Volume 7 Pages 204-205
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.204

JOURNAL FREE ACCESS

Download PDF (196K)
Emergent Rhythms in an Artificial Chemical World Using ‘Genetic Switches’

Hiroaki Inayoshi, Hitoshi Iba

1996 Volume 7 Pages 206-207
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.206

JOURNAL FREE ACCESS

This paper describes a model of an artificial chemical world and its computer simulation, in which rhythms emerge. The model specifies four items of the artificial chemical world: (1) components (five kinds of particles and DNA having Genetic Switches); (2) space (2-dimensional polar grids); (3) simple reaction rules (construction and destruction of molecules, etc.); (4) simple behavioral rules (stochastic movements and stochastic collisions, etc.). The simulation demonstrates the capability of the system to exhibit emergent behavior: that is, global order of the system (regular rhythms in this case) emerges out of randomness (thorough stochastic movements and collisions) of its components.

Download PDF (2395K)
A New Software For Visualization of Large Proteins

Yutaka Ueno, Kiyoshi Asai

1996 Volume 7 Pages 208-209
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.208

JOURNAL FREE ACCESS

Studies of interactions between protein molecules sometimes require visualizing huge numbers of atoms in a single molecular graphics picture. Namba et al. [1] reported that simplification and enhancement make molecular pictures informative in their structural study of the protein and nucleic acid in the tobacco mosaic virus. Their pictures have boundary outlines to distinguish the different monomers that are symmetrically packed to form the virus, but this cannot be accomplished with currently available molecular graphics software. As novel research in structural biology increases, graphics software will need more functions to meet our biological interests. However, most software is hard to modify and cannot be expected to be improved on specific request.
To meet this demand, we have started a project to develop an extensible protein visualization program for structure analysis and prediction studies. Our goal is to provide a software platform that runs on common hardware and allows users with average programming skill to add new functions. Our first version is a structure viewer for proteins in the PDB database.
In this project, an application support library was designed together with the target program to give a clear view of the complicated programming involved. Among the many technical issues in building graphics software, the 3D graphics library and the memory management functions were redesigned for fast drawing of large numbers of atoms. An original plug-in module function and a graphical user interface toolkit were also designed. The plug-in module function was implemented with the dynamic linking system calls of the Unix system. The program can be configured with the necessary modules, chosen from the viewing and analysis functions that we will eventually develop. A special calculation function using atomic coordinate data can also be added by writing a new plug-in module. Macro languages have been used for this purpose in some systems, but they can never be as fast or as powerful as the binary code of a plug-in module. The module interface is now being revised toward a robust design.
Prototyping has been completed on Unix with the X Window System. This first version has basic protein visualization features, such as several molecular model representations and rotation, and two new features: 1) boundary outlines to distinguish different molecules; 2) amino acid sequence windows linked to the 3-dimensional viewing window of the protein, so that a selection made in one window is echoed in the other. This gives a convenient tracking view of peptide chains when navigating large proteins. Several examples of protein pictures made with this prototype will be presented in the poster, taken from a molecular interaction study of the muscle proteins Actin (45 kD) and Myosin (head subfragment S-1: 120 kD), which are known to interact to generate force. Actin forms a filament in muscle, so several Actin monomers should be drawn, with one or two Myosin molecules interacting with them in a picture. This case involves more than 4000 alpha-carbons.
Our program is written in C with Xlib and ordinary libraries and is going to be released for Unix systems. Versions for personal computers are also planned, to take advantage of their high hardware potential.

Download PDF (153K)
Automatic Discovery of Hidden Markov Representations for Functional Sites within DNA Sequences

Tetsushi Yada, Yasushi Totoki, Masato Ishikawa, Kiyoshi Asai

1996 Volume 7 Pages 210-211
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.210

JOURNAL FREE ACCESS

Download PDF (2286K)
GeneHacker: Gene-finding Program for the Prediction of Precise Protein Coding Regions

Makoto Hirosawa, Tetsushi Yada

1996 Volume 7 Pages 212-213
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.212

JOURNAL FREE ACCESS

Download PDF (208K)
CyanoBase: The Genome Database for Synechocystis sp. strain PCC6803

Yasukazu Nakamura, Nobuyuki Miyajima, Makoto Hirosawa, Takakazu Kaneko ...

1996 Volume 7 Pages 214-215
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.214

JOURNAL FREE ACCESS

Download PDF (2572K)
Management System for Sequencing Data of Human Genome

As a Part of ALIS

Mika HIRAKAWA, Kensaku IMAI, Akira OHYAMA, Fumihiko KIKUCHI

1996 Volume 7 Pages 216-217
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.216

JOURNAL FREE ACCESS

ALIS (Advanced Life science Information Systems) is dedicated to supporting and encouraging large-scale human genome research by creating and distributing databases and providing the computing environment. We report on the primary status of the ALIS project and our WWW service site (http://www-alis.tokyo.jst-c.go.jp). The primary stage of the project has three aspects: large-scale human genome sequencing, construction of an integrated human genome database, and development of supporting functions for the database.

Download PDF (187K)
Establishment and Management of Transcription Factor Database TFDB

T. Okazaki, M. Kaizawa, H. Mizushima

1996 Volume 7 Pages 218-219
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.218

JOURNAL FREE ACCESS

D. Ghosh of the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, originally maintained ‘TFD (Transcription Factor Database)’ from 1990. Since NCBI stopped maintaining it in 1993, we started a new database, TFDB (Transcription Factor Data Base), to take over some parts of that database, focusing on the DNA binding sequence data. To update the database with recent data, we developed a system that searches literature databases exhaustively and extracts related information from the abstracts of the collected articles. We also developed a mail server for searching the target sequences of transcription factors using this database.

Download PDF (161K)
Very Fast Identification of tRNA in Genomic DNA

F. Lisacek, N. El Mabrouk

1996 Volume 7 Pages 220-221
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.220

JOURNAL FREE ACCESS

Download PDF (191K)
Predicting and Learning RNA Secondary Structures

Aki HASEGAWA, Yasuo UEMURA, Satoshi KOBAYASHI, Takashi YOKOMORI

1996 Volume 7 Pages 222-223
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.222

JOURNAL FREE ACCESS

It is of great significance to develop an efficient software system for higher-level structural prediction in RNA/protein sequences. For RNA secondary structure prediction, a prediction system must be able to deal with so-called “pseudoknot” structures, one of the most typical and important constructs found in vivo, yet no effective system has been reported for predicting RNA secondary structures involving pseudoknots.
We are developing prediction systems for RNA secondary structures that can handle pseudoknots in an elegant manner; the systems under development are built on the following two approaches.

Download PDF (170K)
Modeling Proteins Conformation in Solution. Part II

A Solvent Effect Model Based on the Evaluation of Solvent-Accessible Surface Area and Generalized Born Equation

Valentin Gogonea, Camelia Baleanu-Gogonea, Carlos A. Del Carpio

1996 Volume 7 Pages 224-225
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.224

JOURNAL FREE ACCESS

This is the second of a series of articles describing our system for prediction of protein conformation in solution. Here we propose a force field for studying protein folding in solution. Our force field is made up of an internal force field (MM2) and a solvent force field which sums up the constraints that the solvent imposes on protein structure in solution, as compared with the gas phase.
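
The subtitle names the solvent-accessible surface area and the generalized Born equation, but the abstract does not state the functional forms used. The following is a minimal sketch assuming the widely used generalized Born expression introduced by Still and co-workers for the electrostatic part, plus a surface term that is linear in per-atom accessible areas. The function names, the Born radii, the surface coefficients, and the solvent dielectric constant of 78.5 are all assumptions for illustration; the MM2 internal force field and the computation of the surface areas themselves are not shown.

```python
import math

# Coulomb conversion factor in kcal*Angstrom/(mol*e^2), conventional value.
COULOMB_KCAL = 332.06


def f_gb(r, a_i, a_j):
    """Generalized Born effective distance (form of Still et al., assumed here)."""
    return math.sqrt(r * r + a_i * a_j * math.exp(-r * r / (4.0 * a_i * a_j)))


def gb_polarization_energy(coords, charges, born_radii, eps_solvent=78.5):
    """Electrostatic solvation energy in kcal/mol.

    coords are (x, y, z) tuples in Angstrom, charges are in units of e,
    born_radii are effective Born radii in Angstrom (taken as given).
    The double sum includes i == j, which yields the Born self-energy.
    """
    prefactor = -0.5 * COULOMB_KCAL * (1.0 - 1.0 / eps_solvent)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total


def sasa_energy(areas, sigmas):
    """Nonpolar term: sum of per-atom surface coefficient times accessible area."""
    return sum(sigma * area for sigma, area in zip(sigmas, areas))
```

For a single ion the double sum reduces to the Born equation, and for well-separated charges the effective distance approaches the interatomic distance, so the expression interpolates between the two limits.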

Download PDF (219K)
Sense-Antisense Homology Boxes in Proteins

Structural Motifs Encoded in the DNA?

Carlos A. Del Carpio, Valentin Gogonea, Katsuhisa Yamaguchi, Makoto Ta ...

1996 Volume 7 Pages 226-227
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.226

JOURNAL FREE ACCESS

Experimental evidence implying that complementary DNA strands encode amino acids which exhibit complementary hydrophobic characteristics has led us to inspect sense-antisense homology in several hundred proteins recorded in the PDB. We present here partial results of this analysis, which relate localized peculiar structural characteristics of proteins to the sense-antisense homology boxes found in the primary sequences. A further analysis is performed to determine whether these sense-antisense homology boxes, if present within the protein, are encoded by unique sequences of codons in the DNA. We also report on the methodology and the results obtained so far.

Download PDF (216K)
Construction of the Bacillus subtilis ORF database (BSORF DB)

A. Ogiwara, N. Ogasawara, M. Watanabe, T. Takagi

1996 Volume 7 Pages 228-229
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.228

JOURNAL FREE ACCESS

Download PDF (2581K)
Clustering and Evolutional Analysis of E. coli Proteins

Sivasundaram Suharnan, Takeshi Itoh, Hidemi Watanabe, Jun-ichi Takeda, ...

1996 Volume 7 Pages 230-231
Published: 1996
Released on J-STAGE: July 11, 2011

DOI: https://doi.org/10.11234/gi1990.7.230

JOURNAL FREE ACCESS

Download PDF (181K)
