Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Volume 4
Showing 1-50 of 67 articles in the selected issue
  • 阿久津 達也
    1993 Volume 4 Pages 1-9
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    In this paper, we consider the pattern matching problems for three dimensional protein structures. Especially, we consider the problems of substructure search and common substructure search. First, we show that the common substructure search problem amongst multiple protein structures is very difficult from a theoretical viewpoint of computational complexity. Next, we present two practical algorithms. One is named a least-squares hashing method and the other is named a dynamic matching method. In the least-squares hashing method, the hashing technique, which is well-known in computer science, is combined with a least-squares fitting technique. In the dynamic matching method, the dynamic programming technique, which is widely used for pattern matching of DNA and amino acid sequences, is combined with a least-squares fitting technique. These two methods have been applied to PDB (Protein Data Bank) data and shown to be effective.
  • MAKOTO HIROSAWA, REIKO TANAKA, MASATO ISHIKAWA
    1993 Volume 4 Pages 10-16
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    The representation of biological concepts in a knowledge base is important for enabling a machine, or a non-specialist in biology, to understand and analyze genetic information. In our previous study, we examined the representation of biological knowledge in general, and of knowledge related to protein motifs in particular, with the goal of discovering new motifs.
    In this paper, we first list the requirements for the representation of biological knowledge and then state solutions to these requirements. Finally, we show a representation of biological knowledge on motifs in the deductive object-oriented language QUIXOTΣ. The knowledge base is built on Prosite, a representative motif database.
  • 松田 秀雄
    1993 Volume 4 Pages 17-24
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    This paper proposes a prototype system for querying genomic databases based on data-parallel logic programming. Efficient access to genomic databases is crucial given the enormous increase in sequence data. By using a logic programming language, the system allows a user to perform adaptable data retrieval on integrated data objects within a single declarative framework. In addition, by utilizing data-parallel processing, it provides efficient access to a large amount of genomic data in a distributed computing environment. We present its design principles and discuss the implementation of the database system.
  • 佐藤 賢二, 古市 恵美子, 久原 哲, 高木 利久
    1993 Volume 4 Pages 25-35
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    We developed a deductive database system PACADE for analyzing three dimensional and secondary structures of protein. PACADE is equipped with a function to search for similar structures in proteins. Unlike other approaches based on calculation of the inter-atomic root mean square distance, this function is based on logic programming and source level rule rewriting techniques.
    We describe herein the result of searches for topologically similar structures and three dimensionally similar ones. A user of PACADE can select these two levels of similarities by adding/deleting prefixes.
  • Applications to Modeling RNA
    榊原 康文, Michael Brown, Rebecca C. Underwood, Saira I. Mian, David Hauss ...
    1993 Volume 4 Pages 36-45
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Stochastic context-free grammars (SCFGs) are applied to the problems of folding, aligning and modeling families of homologous RNA sequences. These models capture the common primary and secondary structure of the sequences with a context-free grammar, much like those used to define the syntax of programming languages. SCFGs generalize the hidden Markov models used in related work on protein and DNA sequences. The novel aspect of this work is that the SCFGs developed here are learned automatically from initially unaligned and unfolded training sequences. To do this, a new generalization of the forward-backward algorithm, commonly used to train hidden Markov models, is introduced. This algorithm is based on tree grammars, and is more efficient than the inside-outside algorithm, which was previously proposed to train SCFGs. This method is tested on the family of transfer RNA (tRNA) sequences. The results show that the model is able to reliably discriminate tRNA sequences from other RNA sequences of similar length, that it can reliably determine the secondary structure of new tRNA sequences, and that it can produce accurate multiple alignments of large collections of tRNA sequences. The model is also extended to handle introns present in tRNA genes.
  • 馬見塚 拓
    1993 Volume 4 Pages 46-55
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    We propose a new method for representing a local region of a protein sequence as a probabilistic network. From a large number of examples of a local region, the method produces a network that describes the dependency relationships among amino acid residues in the region. The network is constructed using a greedy-search algorithm based on the minimum description length (MDL) principle. In our experiments, we construct probabilistic networks for two α-helix regions in globin family proteins. Experimental results show that the networks provide a visual aid to understanding inter-residue dependencies in those regions, and that they capture several important features peculiar to those regions.
  • 藤原 由希子, 小長谷 明彦
    1993 Volume 4 Pages 56-64
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    In this paper, we study the application of HMMs to the problem of representing protein sequences by a stochastic motif. A stochastic (protein) motif represents the portions of protein sequences that have a certain function or structure, where conditional probabilities are used to deal with the stochastic nature of the motif. We propose the iterative duplication method for HMM network learning. HMMs are much more expressive than symbolic patterns and are better suited to representing the variety of protein sequences. As an experiment, we constructed HMMs for the leucine zipper motif using 112 protein sequences as a training set, and obtained an accuracy of 79.3 percent in the prediction of protein sequences, compared with an accuracy of 14.8 percent when using a symbolic representation. Our approach can also be used for the validation of protein databases; the automatically constructed HMM indicated that one protein sequence annotated as “leucine-zipper like sequence” in the database is quite different from other leucine-zipper sequences in terms of likelihood.
  • 萩谷 昌己
    1993 Volume 4 Pages 65-73
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    The problem of constructing contigs by the STS strategy is a simple combinatorial problem if the given hit information is correct and complete. However, hit information is often incorrect or incomplete due to failure or inability of experiments. Moreover, in addition to hit information, various sources of information are also available, such as known landmarks, other clone libraries, etc. In order to cope with incompleteness, incorrectness and additional information, we developed a deductive method for constructing contigs. Contigs are constructed by deducing an equivalence relation of clone directions and a partial order among STS markers on each equivalence class of directions. In the paper, a practical algorithm based on the method is presented and its completeness is proved. The method is also axiomatized by a set of inference rules for deducing the equivalence relation and the partial orders. We finally discuss the problem of visualizing contigs based on the information deduced by our method.
  • 篠原 歩, 宮野 悟, 有川 節夫, 下薗 真一, 内田 智之, 久原 哲
    1993 Volume 4 Pages 74-83
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    We have developed a machine learning system BONSAI which gets positive and negative examples as inputs and produces a pair of a decision tree over regular patterns and an alphabet indexing as a hypothesis. This paper proposes two applications of BONSAI when we can run multiple BONSAI systems in parallel.
    The one is to classify given examples which are coming from several different unknown classes. The process of solving the problem consists of multiply spawned BONSAI systems, each of which tries to find a decision tree, an alphabet indexing and a group of examples. It will finally partition a hodgepodge of sequences into a small number of disjoint classes together with hypotheses explaining these classes accurately.
    The other is to find a good sample of a concept. Though the main interest of applying the BONSAI system is to discover good hypotheses, it is equally interesting to find a small set of examples from which a good hypothesis is made. We present a method for solving this problem by combining a strategy in genetic algorithms with multiply running BONSAI systems.
  • MASATO ISHIKAWA, TOMOYUKI TOYA, YASUSHI TOTOKI, AKIHIKO KONAGAYA
    1993 Volume 4 Pages 84-93
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    This paper proposes a new methodology to improve the performance of multiple sequence alignment by combining a genetic algorithm with an iterative alignment algorithm. Iterative alignment algorithms usually achieve better alignments than other alignment algorithms, such as tournament-based multiple alignment. They can also incorporate parallelism to improve execution performance. However, they are sometimes trapped in local optima and produce relatively low-quality alignments because of their rapid convergence. A genetic algorithm can solve this problem by exchanging partial alignment sequences between “individuals”. Our experiments show that the combination of a genetic algorithm and an iterative alignment algorithm produces better results than iterative aligners that employ hill-climbing search strategies.
  • 荒木 志帆, 五島 正裕, 森 眞一郎, 中島 浩, 富田 眞治, 秋山 泰, 金久 實
    1993 Volume 4 Pages 94-102
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    This paper makes two proposals to speed up the Parallel Iterative Method, which is based on the iterative strategy of the Berger-Munson algorithm.
    The first proposal is to exploit finer-grained parallelism in the DP (Dynamic Programming) procedure itself. This proposal makes the processing speed proportional to the number of processors.
    The second proposal is to apply the A* algorithm, a well known heuristic search algorithm, instead of DP. A* reduces the search space using heuristics, while DP traverses the whole space blindly.
    We have implemented these two proposals on a parallel computer, the AP1000. In a test of parallelizing DP, ten 1000-character sequences are aligned by using 10 processors per one DP procedure at a speed 8.11 times faster than sequential processing. By applying the A* algorithm to 30 sets of test problems, we obtain optimal alignment by reducing the search space by 95%.
  • Naoto Ukiyama, Hiroshi Imai
    1993 Volume 4 Pages 103-108
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    This paper addresses several issues in parallel multiple alignment and reports preliminary computational results of an implementation on the CM5. We stress the use of parallelism in the diagonal direction, which is especially useful when aligning similar strings. A connection with the parallel approximate string matching algorithm by Landau and Vishkin [1] is also touched upon.
  • 後藤 修
    1993 Volume 4 Pages 109-113
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Given a multiple sequence alignment of a family of protein or nucleotide sequences, conserved or highly variable regions are valuable landmarks to get insight into the functional and structural roles of individual regions. Conserved regions can also act as anchor points in the process of further improvement of the given alignment. Two different approaches were undertaken to extract conserved regions based on the principle of either consistency or high scores. The latter approach is easily modified to extract highly variable regions by reversing the scoring scheme. Examinations on a few protein families are discussed.
  • 瀬戸 保彦, 池内 義典, 磯山 正治
    1993 Volume 4 Pages 114-119
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Motifs are essential sites and are therefore usually conserved in proteins. They play a crucial role not only in the protein world but also in genome projects. Information about motifs is usually obtained by experiments and laborious multiple sequence alignment. Based on the fact that motifs are conserved short sequences, we developed a method for extracting motifs automatically from pairwise sequence alignments. Proteins moderately similar to a probe protein are searched for among all entries in a sequence database. Motifs of the probe are then extracted from each pairwise alignment under the specified restrictions. We applied the method to 389 probe proteins from 89 superfamilies in the PIR database and evaluated the extracted motifs.
  • Gen Shibayama, Hiroshi Imai
    1993 Volume 4 Pages 120-129
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Detecting similarities of multiple genome sequences is one of the most important topics in genome informatics. For the purpose of finding such similarities, an alignment with the highest score with respect to some similarity criterion is provided as an output. However, the alignment with the best score is not necessarily the most significant alignment of the sequences from the viewpoint of biology. In this respect, providing suboptimal alignments is very useful.
    Since finding an alignment of sequences corresponds to finding a path in some directed acyclic graph, we propose a simple algorithm to enumerate all K-best alignments in order, where K may not necessarily be specified beforehand, by finding the K longest paths in the graph. We further consider finding the subgraph formed by such K longest paths. Several useful approaches to find the optimal paths in a graph are also mentioned.
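    As a rough illustration of the K-longest-paths view described in this abstract, the following Python sketch keeps the K best path scores at every node of a small directed acyclic graph. It is a simplified fixed-K variant and not the enumeration algorithm of the paper, which does not need K in advance. The function name, the toy graph and its edge weights are assumptions made for this example, and only path scores are returned, not the alignments themselves.

```python
import heapq


def k_best_path_scores(edges, order, source, sink, k):
    """Return the k largest source-to-sink path scores in a DAG.

    edges: dict mapping a node to a list of (next_node, weight) pairs.
    order: all nodes listed in topological order (assumed to be given).
    """
    best = {node: [] for node in order}
    best[source] = [0]
    for node in order:
        # Every predecessor of `node` has been processed, so best[node] is final.
        for nxt, weight in edges.get(node, []):
            candidates = best[nxt] + [score + weight for score in best[node]]
            best[nxt] = heapq.nlargest(k, candidates)
    return best[sink]


# Hypothetical toy graph; in an alignment graph the edges would be
# match, mismatch and gap moves with their scores.
toy_edges = {
    "s": [("a", 2), ("b", 1)],
    "a": [("t", 1), ("b", 1)],
    "b": [("t", 3)],
}
toy_order = ["s", "a", "b", "t"]
print(k_best_path_scores(toy_edges, toy_order, "s", "t", 3))  # [6, 4, 3]
```

    Keeping a short sorted list per node is easy to follow but recomputes everything if K changes; a lazy enumeration, as the abstract suggests, avoids that.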
  • 浅井 潔, 田中 秀俊, 伊藤 克亘, 鬼塚 健太郎
    1993 Volume 4 Pages 130-139
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    The Hidden Markov Model (HMM), a type of stochastic model (signal source), is becoming popular in molecular biology. HMMs consist of ‘hidden’ states, state-transition probabilities and output distributions. Because there are known algorithms for training HMMs as stochastic representations of the training data, they are widely used for pattern recognition, especially speech recognition.
    In the field of protein research, HMMs have been used to represent stochastic motifs of protein sequences, to model the structural patterns of proteins, to predict secondary and higher-level structures, to make multiple sequence alignments, and to classify protein sequences.
    In each case, HMM techniques are closely related to conventional methods. An important merit of HMMs is their flexibility as a model of protein sequences. A serious problem is that they need a large amount of training data. In this paper, we give a brief introduction to HMMs, review HMM-related protein research, compare this research with other methods, and discuss the usefulness and further possibilities of HMMs.
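    To make the three ingredients named in this abstract concrete (hidden states, state-transition probabilities and output distributions), here is a minimal Python sketch of the standard forward algorithm, which computes the likelihood of a sequence under an HMM. The two-state model, its state names, the two-letter alphabet (H for hydrophobic, P for polar) and all probabilities are invented for illustration and are not taken from the paper.

```python
def forward_likelihood(obs, states, start, trans, emit):
    """Likelihood of the observation sequence under an HMM (forward algorithm)."""
    if not obs:
        return 1.0
    # alpha[s] = probability of the symbols seen so far, ending in state s.
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: all names and numbers below are assumptions.
STATES = ("helix", "coil")
START = {"helix": 0.5, "coil": 0.5}
TRANS = {
    "helix": {"helix": 0.9, "coil": 0.1},
    "coil": {"helix": 0.2, "coil": 0.8},
}
EMIT = {
    "helix": {"H": 0.7, "P": 0.3},
    "coil": {"H": 0.4, "P": 0.6},
}

print(forward_likelihood("HHPH", STATES, START, TRANS, EMIT))
```

    For long sequences the probabilities underflow, so practical implementations usually work with scaled or log-space values; that refinement is omitted here.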
  • 鬼塚 健太郎, KIYOSHI ASAI, MASATO ISHIKAWA
    1993 Volume 4 Pages 140-151
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    We propose a novel scheme for protein 3D structure prediction using the Multi-level Description scheme (MLD). In this prediction scheme, a local conformation is not only determined by the primary structure at that region (i.e., primary constraints) but is also constrained by the neighboring or surrounding local conformations (i.e., geometric constraints).
    The MLD describes a protein conformation at multiple levels with different scales and degrees of abstraction. This scheme makes it easier to model the geometric constraints between neighboring local conformations by analyzing the frequency of overlapping patterns of the local conformations. The primary constraints are modeled by analyzing the relationship between the primary structure and the local conformation at that region.
    The MLD representing a real protein conformation must satisfy most of the constraints above. Thus, a protein conformation can be predicted by searching for the optimal MLD that best satisfies the constraints. This problem is formulated as a combinatorial optimization problem.
  • II. Tertiary Structure
    Nobuhiko Saitô, Motonori Ota
    1993 Volume 4 Pages 152-156
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    The packing mechanism of secondary structures has been revealed. The driving force is the hydrophobic interaction between hydrophobic residues that are nearest to each other along the chain. These residues are chosen because they can bind most quickly. In this way, local structures of the protein are determined and then grow into the whole structure. This process has usually been carried out manually; here we attempt to carry it out automatically. The formulation is applied to crambin.
  • Makiko SUWA, Takatsugu HIROKAWA, Shigeki MITAKU
    1993 Volume 4 Pages 157-166
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    A theoretical method for structure prediction of membrane proteins was developed based upon physicochemical calculations. It comprises three steps. In the first step, the polar interaction field of a transmembrane helix is characterized by a probe helix method, in which the interaction energy between a transmembrane helix and a probe helix is calculated. In the second step, a jigsaw puzzle problem is solved by using binding maps of pairs of helices. The binding energy obtained from the polar interaction field is plotted in a binding map as a function of the orientation angles of the two helices. Finally, the helix configuration determined by the analysis of binding maps is refined by minimizing the binding function of the whole system.
    In order to deal with the jigsaw puzzle problem, several principles of the folding of membrane proteins have been assumed: (1) The molecular structure is formed according to some folding pathway. (2) The dominant interaction in the hydrophobic region of the membrane is the polar interaction. (3) A transmembrane helix can be regarded as a stable rod with a charge distribution on it. Comparison of the predicted structure of bacteriorhodopsin with the experimental one revealed that the relative positions and orientations of transmembrane helices can be reconstructed by this method. Applying the method to rhodopsin, we determined a configuration of transmembrane helices that was quite similar to the experimental configuration. A mechanism for the structural change of rhodopsin caused by cis-trans isomerization of retinal is suggested from the predicted structure.
  • 大久保 善明, 金久 實
    1993 Volume 4 Pages 167-174
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    In order to predict protein structures from their primary sequences, understanding long-range interactions is one of the most critical points. We address this problem by focusing on pairs of peptide segments that are separated in the primary sequence but close in the three-dimensional structure. The method is applied to a set of structure-resolved proteins to see whether there are any significant features in the association of local structures, such as secondary structure segments. The dataset consists of 88 nonhomologous proteins selected from the Brookhaven Protein Data Bank (PDB) using the superfamily classification of the Protein Information Resource (PIR). In the method, given a definition of the distance between two segments, spatially close segment pairs are extracted for Cα segments 4 or 7 residues long. The result shows that there are no preferred distances for the association of two helical segments, but that a minimum of twenty intervening residues is required for parallel helical segments.
  • 飯田 陽一, 増田 剛
    1993 Volume 4 Pages 175-182
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    For the translation initiation signals in vertebrate mRNAs, not only the ATG initiation codon but also the sequences flanking it are required to direct the position of initiation. A consensus sequence for the signal, (GCC) GCCGCCATGG, has been proposed by Kozak, but actual initiation sequences differ from it to a greater or lesser degree. In the present report, the translation initiation signal sequences of human β-globin and β-thalassemia mRNAs were analyzed using a quantification method proposed previously. In this method, each 16-nucleotide sequence in the mRNA is characterized by its sample score, which shows the intensity of the signal. Scoring of signal sequences could explain not only the authentic initiation site but also the experimental results of various mutations around the initiation site. Further analysis demonstrated that, in addition to the signal intensity, the sequence nearest the cap site was preferred. This supports Kozak's scanning hypothesis, in which the eukaryotic small ribosomal subunit binds initially at the 5'-end of the mRNA and subsequently migrates to the signal sequence.
  • 田嶋 耕治
    1993 Volume 4 Pages 183-187
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    We propose a more sensitive algorithm for multiple sequence alignment using parallel genetic algorithms. With less computation than that needed for multi-dimensional dynamic programming approaches, we can obtain multiple alignments which have better similarity than that obtained by repeating two-dimensional dynamic programming. The parallel processing of genetic algorithms was performed on a Fujitsu parallel computer AP1000.
  • Tsuyoshi Yoshizawa, Masaki Fumoto, Tamio Yasukawa
    1993 Volume 4 Pages 188-196
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    A spin glass model for polypeptide chains with four states, a, b, c1 and c2, was introduced for the minimum-energy conformation search by an extended Hopfield algorithm, in which the energy dissipation rate is gradually reduced to simulate annealing. Inter-residue interaction energies were estimated with the molecular mechanics program AMBER using model oligopeptide chains and crystal structure data. Preliminary results obtained with BPTI are not yet satisfactory, and several measures to improve the prediction accuracy are discussed.
  • 笹川 文義, 田嶋 耕治
    1993 Volume 4 Pages 197-204
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Usually, the prediction of protein secondary structure by a neural network is based on three states (α-helix, β-sheet and coil). However, recent reports of determined protein structures describe more detailed secondary structures, such as the 3₁₀-helix, and it is expected that such detailed secondary structures should also be predicted. We discuss some problematic points in applying neural networks to the prediction of multi-state secondary structures. The prediction of globular protein secondary structures is studied with a neural network, and the application of a neural network with a modular architecture to the prediction of protein secondary structures (α-helix, β-sheet and coil) is presented. Each module is a three-layer neural network. The results from the neural network with a modular architecture and from a simple three-layer network are compared, and the overlearning effect is investigated in both ordinary and modular neural networks. The prediction accuracy of the neural network with a modular architecture is higher than that of the ordinary neural network. The 3-, 4- and 8-state classification schemes of secondary structures are considered in the ordinary three-layer neural network; the percentage of correct predictions depends on the classification scheme. Furthermore, for the 3- and 4-state classification schemes, the consistency of the outputs of the modules in the modular neural network is investigated.
  • 中田 琴子
    1993 Volume 4 Pages 205-210
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Using a neural network algorithm with the back-propagation training procedure, we analysed zinc finger DNA-binding protein sequences. The patterns used in the neural network were the amino acid sequence pattern, electric charge and polarity, amino acid group properties, amino acid ancestral group, hydrophobicity, hydrophilicity and secondary structure. For comparison, discriminant analysis was also tried. For the TFIIIA-type (Cys-X2-4-Cys-X12-15-His-X3-5-His; X is any amino acid) zinc finger DNA-binding motifs, both the neural network algorithm and the discriminant analysis reached high discrimination. For the estrogen-type (Cys-X2-4-Cys-X12-15-Cys-X2-4-Cys) zinc finger, although the result of each single perceptron algorithm is not always good, the combination of the attributes reached high discrimination.
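    The TFIIIA-type spacing pattern quoted in this abstract can be written directly as a regular expression. The Python sketch below is only a plain pattern scan over one-letter amino acid codes, useful for picking candidate sites; it is not the neural network or the discriminant analysis of the paper, and the function name and the synthetic test sequence are assumptions made for illustration.

```python
import re

# Cys-X2-4-Cys-X12-15-His-X3-5-His, with X standing for any amino acid.
TFIIIA_TYPE = re.compile(r"C.{2,4}C.{12,15}H.{3,5}H")


def find_tfiiia_type(seq):
    """Return (start, matched_text) for each non-overlapping match in seq."""
    return [(m.start(), m.group()) for m in TFIIIA_TYPE.finditer(seq)]


# Synthetic sequence built to fit the pattern; it is not a real protein.
toy = "MK" + "C" + "AA" + "C" + "G" * 12 + "H" + "AAA" + "H" + "LE"
print(find_tfiiia_type(toy))
```

    A scan like this finds every segment with the right spacing of Cys and His, which is why a classifier using further attributes is needed to separate true binding motifs from chance matches.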
  • Motif Evaluation on a 3-D Structure
    Kazuhiro Iida, Hiroshi Mamitsuka
    1993 Volume 4 Pages 211-218
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    A probabilistic logic neural network, mSDN, reveals multiple biochemical rules hidden in a protein amino-acid sequence. Two motifs are extracted from a 16-residue hemoglobin α-helix region. The motifs, each containing only 3 amino-acid residues, correctly classify new data with 96% accuracy. Evaluating the motifs on a hemoglobin 3-D structure suggests that one motif represents a local α-helix determiner, and that the other explains long-range interactions which are important for hemoglobin tertiary structure. The findings indicate that the mSDN extracts region-specific and biochemically significant motifs from an amino-acid sequence, and suggest that the network separates heterogeneous biochemical rules in a sequence into corresponding motifs. Motifs extracted by the mSDN will help us to analyze and predict protein conformations and their functions.
  • 新島 耕一, 下薗 真一
    1993 Volume 4 Pages 219-223
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Domains for classifying positive and negative patterns are derived by imposing heteroassociative output conditions on the network. Using the shape of the domain, a functional to be minimized is introduced to determine the connection weights and threshold values of the network. Techniques for minimizing the functional, which give learning algorithms for the network, are also discussed. Finally, remarks on numerical experiments are given.
  • 田中 秀俊, 鬼塚 健太郎, 浅井 潔
    1993 Volume 4 Pages 224-230
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    The Hidden Markov Model (HMM) introduces a stochastic approach to protein representation and motif abstraction. We need a stochastic classification that is seamless with HMM representation and abstraction. Successive State Splitting (SSS) classifies proteins represented by HMMs and uses no previous knowledge of the proteins. The SSS algorithm was originally developed for allophone modeling and is based on a continuous distribution of phoneme data. It makes it possible to obtain an appropriate Hidden Markov Network automatically, and the HMMs simultaneously. We map amino acids onto a continuous space according to a quantification based on PAM-250.
  • Shigehiko Kanaya, Yoshihiro Kudo
    1993 Volume 4 Pages 231-238
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    In order to systematically examine differences among species in the preferential usage of synonymous codons, principal component analysis is applied to a matrix of relative frequencies of synonymous codons. The first two principal components (PC1 and PC2) account for 66% and 8%, respectively. From the projection onto the first two components, the following conclusions can be drawn: (1) The preference for A and U (G and C) at the third position in synonymous codons contributes negatively (positively) to PC1: vertebrates and chloroplasts are clustered in narrow regions with positive and the most negative PC1, respectively. (2) PC2 is important for distinguishing between prokaryotes and eukaryotes: eukaryotes (prokaryotes) prefer the dinucleotides GA, AG, CU and CA (CG, GC and AA) at the second and third positions in codons.
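    One row of the kind of matrix this abstract describes, the relative frequency of each codon within its synonymous family, could be computed as in the Python sketch below. Only three amino acid families of the standard genetic code are included to keep the example short, and the function name and the toy coding sequence are assumptions. The principal component analysis itself is not shown.

```python
from collections import Counter

# Partial table for illustration: three synonymous families only.
SYNONYMOUS = {
    "Phe": ("UUU", "UUC"),
    "Lys": ("AAA", "AAG"),
    "Gly": ("GGU", "GGC", "GGA", "GGG"),
}


def relative_codon_frequencies(cds):
    """Frequency of each codon relative to its synonymous family in one sequence."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    codons = Counter(rna[i:i + 3] for i in range(0, usable, 3))
    freqs = {}
    for family in SYNONYMOUS.values():
        total = sum(codons[c] for c in family)
        for codon in family:
            freqs[codon] = codons[codon] / total if total else 0.0
    return freqs


# Hypothetical toy coding sequence: Phe Phe Phe Lys Gly Gly.
print(relative_codon_frequencies("TTTTTCTTTAAAGGGGGC"))
```

    Stacking one such row per species gives a species-by-codon matrix to which a principal component analysis could then be applied.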
  • 楠田 潤, 平田 誠, 豊田 敦, 高橋 一朗, 橋本 雄之
    1993 Volume 4 Pages 239-244
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    To estimate how frequently CpG islands are associated with genes distributed in the human genome, we screened sequenced human DNAs compiled in a DNA database for statistically expected CpG islands. A survey of 2605 genomic sequences (>300 bp) coding for 833 genes mapped on human chromosomes identified 1030 CpG island-linked sequences belonging to 324 genes, indicating that at least 39% of human genes coincide with CpG islands. Furthermore, we found that 19%, 36% and 45% of the CpG islands mapped on single chromosomal bands are located on G-, R- and T-bands, respectively. This result suggests that the occurrence of CpG island-linked genes increases with the global G+C% level of chromosomal bands.
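    The abstract does not state which statistical criteria were used to call a CpG island. As a hedged illustration, the Python sketch below computes the two quantities most commonly used for this purpose, the G+C fraction and the observed/expected CpG ratio, and applies the widely cited thresholds of Gardiner-Garden and Frommer (length of at least 200 bp, G+C of at least 50%, ratio of at least 0.6). These thresholds and the function names are assumptions and may differ from those of the paper.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply assumed thresholds to a whole sequence (no sliding window)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_ratio


print(cpg_stats("CGCGCGCG"))  # (1.0, 2.0)
```

    A real screen would slide a window along each genomic sequence and merge adjacent windows that pass; the single-sequence test here only shows the arithmetic.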
  • 須山 幹太, 西岡 孝明, 小田 順一
    1993 Volume 4 Pages 245-254
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    To extract ligand-related motifs from the sequences of enzymes, we have constructed the Ligand Chemical Database for Enzyme Reaction, which links chemical compounds to amino acid sequences. Among the 1,966 ligands registered, 519 chemical compounds were related to 1,488 ligand-linked sequences. Sequence fragments 10 residues long, commonly found among the ligand-linked sequences for each chemical compound, were defined as ligand-related motifs. Motifs extracted for pyridoxal phosphate were tested against the crystal structures of aspartate aminotransferase complexed with pyridoxal phosphate. Twenty-four of the 93 motifs extracted from the enzyme include residues that make chemical interactions with the bound pyridoxal phosphate. One of the motifs, K-x-x-G-L-x-x-x-R-V, actually participates in the recognition of pyridoxal phosphate in another enzyme, 1-aminocyclopropane-1-carboxylate synthase. The present approach provides ligand-related motifs and shows great potential for characterizing the unknown genes sequenced by the genome project.
  • 内山 郁夫, 荻原 淳, 大久保 善明, 金久 實
    1993 Volume 4 Pages 255-263
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    A method is described for extracting signature pentapeptides that are conserved and exclusively found in a group of homologous proteins. The BLAST algorithm is used to count the frequency of occurrences of pentapeptide patterns allowing limited substitutions, as well as to perform homology search. For those pentapeptides that appear in a given sequence we examine the frequency of occurrences of these pentapeptides and related ones in homologous sequences which are ordered according to the homology score. By comparing against the frequency in the entire database, we can extract uniquely conserved pentapeptides and at the same time perform a grouping of homologous sequences. Thus, our procedure can automatically identify, if any, pentapeptides that are strongly tied with the group. Possibility of using our pentapeptide word dictionary to infer protein function is discussed.
  • Keiichi Nagai, Tetsuo Nishikawa, Hideki Kambara, Toshihisa Takagi
    1993 Volume 4 Pages 264-269
    Published: 1993
    Released: 2011/07/11
    Journal, free access
    Alignments produced by conventional database search programs for finding local similarities in protein and DNA sequences, such as those based on the Smith-Waterman algorithm, FASTA, and BLAST, can contain subregions having high similarity, low similarity, and even no similarity. We propose a simple method for finding significant local sequence similarity regions, in which the alignment of two sequences is graphed as integrated scores calculated along the aligned sequences using the match, mismatch, and gap penalty scores. This method has been used to find local similarity subregions in alignment results obtained by BLAST or the Smith-Waterman algorithm. Potential applications for finding domain structures and characteristic sequence patterns are also shown.
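    The integrated score described in this abstract can be pictured as a running sum over the columns of an existing alignment. The Python sketch below is one possible reading of that idea: the scoring values (+1 for a match, -1 for a mismatch, -2 for a gap), the function name and the toy alignment are assumptions, since the abstract does not give the actual scores.

```python
def integrated_scores(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Cumulative score after each column of a pairwise alignment ('-' is a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    curve = []
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
        curve.append(total)
    return curve


# Hypothetical toy alignment; rising stretches of the curve mark similar subregions.
print(integrated_scores("ACGT-A", "ACCTTA"))  # [1, 2, 1, 2, 0, 1]
```

    Plotting the returned list against the column index gives a curve that climbs through well-conserved subregions and falls or stays flat elsewhere.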
  • Hiroyuki Ogata, Yutaka Akiyama, Minoru Kanehisa
    1993 Volume 4 Pages 270-274
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    We are developing a computational method for automatically organizing collections of structural knowledge of RNA into a three-dimensional (3-D) form. The goal of our method for modeling RNA structure is to find, as far as possible, conformations of RNA that satisfy the constraints from experiments and sequence analysis and, at the same time, whose local conformations are close to some representative conformations. For efficient conformational search, we used a genetic algorithm as a trial. We applied our method to modeling a single-stranded region of an RNA in order to estimate the efficiency of the method.
  • 藤渕 航, 金久 實
    1993 Volume 4 Pages 275-282
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    We constructed a dictionary of sequence motifs for transcription regulation with a heuristic method from a set of DNA sequences upstream of the transcription initiation site. The method first identifies weakly conserved blocks within a given region relative to the initiation site by searching for and merging six-base patterns. Then the most conserved portions of these blocks are extracted by calculating the information content after similar blocks are multiply aligned. The procedure was applied to primate promoters and the result was evaluated with the Transcription Factor Database (TFD). The result will give us new biological insights into the DNA signals.
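    As a rough illustration of the information-content step mentioned in this abstract, the sketch below scores each column of a set of aligned DNA blocks. It assumes a uniform background over the four bases and no small-sample correction; the abstract does not say which formula the authors used, and the function names and example sequences are hypothetical.

```python
import math
from collections import Counter


def column_information(block):
    """Information content (bits) of each column of equal-length aligned DNA strings.

    Uses 2 - H(column), i.e. a uniform background over A, C, G, T.
    This choice is an assumption made for illustration.
    """
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("all aligned sequences must have the same length")
    info = []
    for column in zip(*block):
        n = len(column)
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


def most_conserved_window(info, width):
    """Start index of the window of the given width with the highest total information."""
    starts = range(len(info) - width + 1)
    return max(starts, key=lambda s: sum(info[s:s + width]))


if __name__ == "__main__":
    aligned = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]  # made-up example block
    bits = column_information(aligned)
    print([round(b, 3) for b in bits])
    print(most_conserved_window(bits, 2))
```

    Fully conserved columns score 2 bits and columns with evenly mixed bases score 0, so sliding a window over the per-column values is one simple way to pick out the most conserved portion of a block.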
  • 横森 貴, 小林 聡
    1993 Volume 4 Pages 283-292
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    We propose a simple string similarity measure and apply it to the problem of DNA sequence analysis, more specifically, to the problem of analysing molecular evolution. This measure is based on a “local feature” that was motivated by a theoretical characterization of DNA splicing sequences.
    We demonstrate the usefulness of the proposed measure by presenting an experimental result concerning evolutionary molecular analysis. This sheds new light on other types of sequence analysis, such as protein classification and motif identification.
  • 小林 幸夫, 斎藤 信彦
    1993 Volume 4 Pages 293-299
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    A statistical mechanical method is proposed to predict the secondary structures of globular proteins. Three-state prediction, which simultaneously provides the probabilities of α-helix, β-strand and coil, is performed with a recurrence method. The probabilities of the ith residue being in an α-helix or in a β-strand are calculated with statistical weights for amino acid pairs in α-helix or in β-strand. Instead of directly calculating the interaction energies between residues, we determine the statistical weights so as to yield the correct predictions for proteins with known structures. To do this, we introduce an objective function and estimate the weights so as to minimize this function by referring to the proteins used for optimization. This method yields a prediction accuracy of 67% for the 13 proteins used for accuracy estimation. This value does not exceed the best values obtained by homology-based methods. However, we hope to improve the accuracy, since, in contrast to other methods, we can analyze the reasons for poor accuracy.
  • Khawaja Sirajuddin, Tomomasa Nagashima, Koichi Ono
    1993 Volume 4 Pages 300-305
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    The consensus sequence for the 5'-splice site has been proposed as CAG/GTGAGT, but actual splice site sequences differ from it to some extent. In this paper we analyze various mammalian globin genes using decision tree induction. We have found that the prediction rate for discriminating unknown sequences increases as the proportion of false splice site sequences with the dinucleotide GT at the 4th and 5th positions increases in the learning data set.
  • 古谷 博史
    1993 Volume 4 Pages 306-314
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    We have developed neural networks with the back-propagation learning algorithm for the prediction of splice sites in mRNA precursors. We used these networks to predict the effects of mutations on the splicing of protein-coding genes. We applied the networks to β-thalassemia genes (mutant β-globin genes), a hemophilia B gene (mutant blood coagulation factor IX gene) and a mutant c-Ha-ras oncogene. We demonstrate that these networks predict abnormal splicing patterns in these genes consistent with experiments.
  • 神村 基和, 高橋 由雅
    1993 Volume 4 Pages 315-324
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    In this paper we aim to examine in detail the data distribution within each conformational pattern class and to identify some local common structural features among the fragments in a particular cluster (or subcluster). Backbone conformational pattern clustering was carried out for the three-dimensional peptide fragments where the Φ-ψ conformational pattern of the TA (target amino acid) belongs to class A (α-helix dominant class) or class B (β-sheet dominant class) as defined in our previous work. The analysis of the class A fragments suggested that these fragments involve four representative local backbone conformational patterns, not only for typical α-helix fragments but also for fragments closely related to type I turns or the starting moieties of α-helices. On the other hand, the analysis of class B fragments showed that these have much more diversity than class A fragments with respect to their local backbone structures. The details of the methods and results of the analyses are discussed here.
  • Koji Ohnishi
    1993 Volume 4 Pages 325-331
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    The Bacillus subtilis trrnD operon has a structure of 5'[16S rRNA-23S rRNA-5S rRNA-(tRNA)16]3'. The tRNA cluster in this operon includes 16 tandemly repeated tRNA genes (denoted by “poly-tRNA structure”), in which the ordering of amino acid (aa) specificities of these tRNAs is “NSEVMD FT YWHQ GCLL”. An ancient “trrnD-peptide” possessing this aa sequence was hypothesized, and protein sequence regions similar to the trrnD-peptide were searched for in the PIR Protein Sequence Database. Residues 139-156 of the E. coli Gly-tRNA synthetase (GlyRS) α subunit were found to be most similar to this peptide.
    Further analysis revealed that not only the GlyRS gene encoding GlyRS α, but also the a gene of Synechococcus 6301 encoding the F0-ATPase a subunit, are true homologues of the BSU trrnD poly-tRNA region. These findings strongly support the recently proposed “poly-tRNA theory” (Ohnishi, 1993) on the origin of mRNA and genetic codes. Thus it has now been concluded that the trrnD poly-tRNA region is a relic of a most primitive RNA molecule capable of synthesizing a trrnD-peptide-like primitive peptide in early life. The most paradoxical problem on the origin of genetic codes seems to have been basically solved from the aspect of the poly-tRNA theory.
  • Tsukasa Sakai
    1993 Volume 4 Pages 332-338
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    Substitution odds r(i, j) for amino acid residues can be transformed to similarities s(i, j) by normalizing with the geometric average of the conservative odds r(i, i) and r(j, j). Similarities thus derived for all twenty natural amino acid residues in proteins conform to the range 0 to 1 and have complementary dissimilarities. An empirical test has confirmed that the dissimilarity satisfies all metric requirements as a distance between residues. Relative certainty, as an identity index calculated from both similarity and dissimilarity, can be used as a matching score consistent with both of them in protein sequence comparison.
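    The normalization described in this abstract can be written as s(i, j) = r(i, j) / sqrt(r(i, i) * r(j, j)). The Python sketch below applies it to a tiny table of odds; the numeric odds are invented placeholders, not values from the paper, and taking the complementary dissimilarity to be 1 - s(i, j) is an assumption about what "complementary" means here.

```python
import math


def similarity(odds, i, j):
    """s(i, j): substitution odds normalized by the geometric average of the
    conservative odds r(i, i) and r(j, j)."""
    return odds[(i, j)] / math.sqrt(odds[(i, i)] * odds[(j, j)])


def dissimilarity(odds, i, j):
    """Complementary dissimilarity, assumed here to be 1 - s(i, j)."""
    return 1.0 - similarity(odds, i, j)


if __name__ == "__main__":
    # Hypothetical substitution odds for two residues (illustrative numbers only).
    toy_odds = {
        ("L", "L"): 4.0,
        ("I", "I"): 4.0,
        ("L", "I"): 2.0,
        ("I", "L"): 2.0,
    }
    print(similarity(toy_odds, "L", "I"))
    print(dissimilarity(toy_odds, "L", "I"))
```

    By construction s(i, i) = 1 for every residue. Values stay within 0 to 1 only when r(i, j) does not exceed the geometric average of r(i, i) and r(j, j), which the abstract reports to hold for the twenty natural amino acids.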
  • Takashi Ishikawa, Takao Terano
    1993 Volume 4 Pages 339-346
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    This paper describes a computational method to predict a protein structure by analogical reasoning from known protein structures. The proposed method, Analogy by Abstraction, uses heuristics to reduce the complexity of the search for appropriate transformations that create a structure of the unknown protein from a known protein structure. We implement an algorithm for the method in the Prolog programming language, and exemplify its effectiveness by re-predicting the structure of ‘Zinc fingers’ from the amino-acid sequence.
  • K. Wada, Y. Wada, S. Tanaka, H. Doi, Y. Nakamura, K. Sugaya, T. Fukaga ...
    1993 Volume 4 Pages 347-351
    Published: 1993
    Released: 2011/07/11
    Journal Free access
  • 五斗 進, 高木 利久, 坂本 憲広
    1993 Volume 4 Pages 352-361
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    Recently, many genome databanks have been developed as a result of growing genome project activities. Each of them consists of a large amount and variety of data, and they were developed independently. Therefore, their integration and efficient management of the data are required. It is also necessary to develop a framework for easily building and testing biological hypotheses with the integrated database. We developed a deductive object-oriented database for searching an integrated database, acquiring new knowledge from it, and storing the knowledge in the database. It consists of an object-oriented database that integrates conventional genome databases such as GenBank, and a deductive language interface for genome analysis. In this paper, we present an overview of the system and examples of analyses using the database.
  • Takahiko SUZUKI, Susumu NAKASHIMA, Toshihisa TAKAGI, Satoru KUHARA, Mi ...
    1993 Volume 4 Pages 362-369
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    An integrated database system, “HyperGenome”, for genome maps and DNA sequences was developed. The system can handle two different types of data, each of which has a unique complex structure. A graphical user interface (GUI) enables ready retrieval of information obtained from genome mapping data and DNA sequence data. Mapping data are derived from the Genome Data Base (GDB) and sequence data are from GenBank.
    The following information was added to the system. 1. Mendelian Inheritance in Man (MIM) entries can be linked to a locus in our system. 2. Amino acid sequences from the Protein Identification Resource (PIR) can be displayed in conjunction with the nucleotide sequence.
  • 蓑島 伸生, 清水 信義
    1993 Volume 4 Pages 370-375
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    We developed a new database system, Locus-in, to enter raw mapping data and construct integrated maps. This system runs on Sun workstations with X-window and a graphic library, Motif. The system supports a full graphical user interface. It has the following unique functions: (1) to zoom in on a specific region of interest; (2) to generate a number of sub-windows associated with a specific region for entry and display of data (each sub-window accepts either ordered or unordered and either raw or published data); and (3) to create new breakpoints. The current version of Locus-in will be demonstrated at the workshop.
  • 陶山 明, 萩谷 昌己, 伊藤 隆司, 藤山 秋佐夫, 大山 彰, 高木 利久
    1993 Volume 4 Pages 376-384
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    ContigMaker is a software tool to aid contig map construction. It is a Motif application running on UNIX workstations with the X Window System. ContigMaker is composed of five major components: map data manager, map analyzer, map viewer, map aid, and project manager. Contig-mapping data obtained by experiments are stored in a database of the map data manager. The stored data are then subjected to analysis by the map analyzer to generate contigs. ContigMaker supports the two strategies for contig construction: the STS (sequence-tagged sites) strategy and the MOF (mapping by oligonucleotide fingerprinting) strategy. The generated contigs are assembled into a contig map according to positions of landmarks falling on the contigs. ContigMaker allows a user to extract landmark information from a public genome database such as the GDB. The contig maps constructed are graphically drawn by the map viewer. The map aid provides miscellaneous small useful tools to finish a contig-mapping task. A repeated task ContigMaker performs can be automated by a macro created by the project manager. The macro will save time and effort for contig map construction.
  • Toshiyuki Niiyama, Takeo Tokimori, Atsushi Ogiwara, Ikuo Uchiyama, Ken ...
    1993 Volume 4 Pages 385-393
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    GNOME is a sequence data management tool through which users can efficiently access e-mail servers for various molecular biological analyses on the Internet, including GenomeNet. It supports BLAST/FASTA servers for homology searches, PROSITE/MotifDic servers for motif searches, and bget/bfind servers for DB entry retrievals. One of its most notable features is that it can not only send e-mails for queries but also receive and manage e-mails for replies. In addition, its interface is very user-friendly. Therefore, it should considerably enhance efficient and thorough analysis of newly determined sequence data in both individual biological research and large-scale genome projects.
  • 秋山 泰, 森 浩禎, 久原 哲, 小笠原 直毅, 宮嶋 伸行, 古川 哲也, 佐藤 賢二, 村上 康文
    1993 Volume 4 Pages 394-401
    Published: 1993
    Released: 2011/07/11
    Journal Free access
    Genomatica is an integrated software tool designed to help with the systematic management of a large number of DNA sequence fragments obtained through a genome sequencing project.
    Its graphical user interface also allows users to look, at any magnification, into any position of the specified chromosome and to browse various kinds of collected information together (including the DNA sequence itself, related gene descriptions, bibliographic references, corresponding GenBank entries, confirmed or putative coding regions, results from homology analysis for the expected protein, RNA genes, clone information, restriction enzyme maps, comments from the administrator, and private memoranda by the user).
    We are planning to use Genomatica in the E. coli (local data compilation mainly managed by Mori), B. subtilis (by Ogasawara), and S. cerevisiae (by Murakami) genome sequencing projects.
    The Genomatica project was started in 1992 as one of the advanced genome database projects sponsored by the Human Genome Center, University of Tokyo. In June 1993, ver. 2.0, which was fully redesigned with the NCBI Vibrant library, was released. A further augmented version, Genomatica 2.1 (with several sequence analysis functions and network communication modules), will be released in Nov. 1993 and will be distributed through anonymous ftp services. The Genomatica system is currently available for the X11 window system on Unix workstations, but Macintosh and IBM-PC versions will also be announced soon.