Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Volume 3
Displaying 1-27 of 27 articles from this issue
  • 1992Volume 3 Pages 17-20
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Download PDF (77K)
  • Hiroshi MIZUSHIMA, Yoshiyuki KUCHINO
    1992Volume 3 Pages 21-24
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The National Cancer Center Research Institute has about 200 researchers, most of whom are engaged in studies of molecular biology. We established a local area network system in our institute, and it has been running since April 1992. This system is connected to more than 250 different instruments, including 53 NEC-PCs, 122 Macintosh computers (Macs), and 5 UNIX workstations. We adopted NetWare 386 v3.11 on the file server, which communicates with both PCs and Macs. The CD server contains 14 CD-ROMs, including MEDLINE and DNA databases. E-mail and NetNews services are administered by the UNIX system, which was connected to the Internet in October 1992.
    Generally, the researchers have used microcomputers and CD-ROMs to analyze their research data. A few researchers have had an opportunity to use UNIX and/or VAX systems via modem. However, these systems are not useful for many beginners. We therefore attempted to develop a genetic information analysis system using the CD server on the network. We wish to establish a system running on PCs and Macs that can automatically take in up-to-date databases collected on UNIX through the Internet.
    Download PDF (304K)
  • Hajime Kitakami, Yukiko Yamazaki, Yoshio Ugawa, Kazuo Ikeo, Naruya Sai ...
    1992Volume 3 Pages 25-28
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The DNA database has been managed in a flat-file system at the DDBJ since 1985. The flat-file system is inadequate for building and searching the DNA database, which is receiving an explosively increasing number of entries. We carried out a transformation from the flat-file system to a relational database system with GenBank staff. The schema of the relational database was designed as follows:
    (1) Decomposing the DNA data into both structuralized and non-structuralized data
    (2) Partitioning large tables into small tables without update anomaly
    (3) Making a flexible relationship among tables to represent complex data
    This schema provided the capability for building and searching the DNA database with less memory on the relational database system. However, the schema was implemented as a complex network structure with about 60 tables. It is difficult to use the SQL search language of the relational database system with this schema.
    We defined a simplified schema on top of the existing schema, using the view function of the relational database system, so that the commands are easy to use. The simplified schema was defined as LOCUS, DEFINITION, ACCESSION, KEYWORDS, SOURCE, REFERENCE, FEATURES, ORIGIN, and SEQUENCE tables, which are virtual tables that store no real data. It represents aspects of the traditional DDBJ/EMBL/GenBank data format that are familiar to biologists using the flat-file system.
    Users can easily join these virtual tables using the attribute that stores accession numbers. With the simplified schema, users are able to use the SQL search command easily and get quick responses in DNA data searches.
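The view-based simplification described in this abstract can be illustrated with a minimal sketch using Python's standard `sqlite3` module. The base tables (`entry`, `entry_text`), their columns, and the sample accession `X00001` are hypothetical and invented for illustration; the actual DDBJ schema of about 60 tables is not given in the abstract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical normalized base tables standing in for the real schema.
cur.executescript("""
CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, accession TEXT, locus_name TEXT);
CREATE TABLE entry_text (entry_id INTEGER, kind TEXT, body TEXT);

INSERT INTO entry VALUES (1, 'X00001', 'HUMEXAMPLE');
INSERT INTO entry_text VALUES (1, 'definition', 'Example human gene');
INSERT INTO entry_text VALUES (1, 'keywords', 'example; demo');

-- Virtual tables keyed by accession number; they store no data themselves.
CREATE VIEW LOCUS AS
    SELECT accession, locus_name FROM entry;
CREATE VIEW DEFINITION AS
    SELECT e.accession, t.body AS definition
    FROM entry e JOIN entry_text t ON t.entry_id = e.entry_id
    WHERE t.kind = 'definition';
CREATE VIEW KEYWORDS AS
    SELECT e.accession, t.body AS keywords
    FROM entry e JOIN entry_text t ON t.entry_id = e.entry_id
    WHERE t.kind = 'keywords';
""")


def lookup(accession):
    """Join the flat-file-style views on the accession attribute."""
    cur.execute(
        """SELECT l.locus_name, d.definition, k.keywords
           FROM LOCUS l
           JOIN DEFINITION d ON d.accession = l.accession
           JOIN KEYWORDS k ON k.accession = l.accession
           WHERE l.accession = ?""",
        (accession,),
    )
    return cur.fetchone()
```

A query such as `lookup("X00001")` then touches only the view names familiar from the flat-file format, while the joins against the underlying tables stay hidden inside the view definitions.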
    Download PDF (2490K)
  • T. Ikemura, K. Sugaya, K. Wada
    1992Volume 3 Pages 29-32
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    It has become clear that the human genome is compartmented as a mosaic of very long DNA sequences, each of which is fairly homogeneous in its G+C content, with several different levels. The aim of the present work was to characterize and visualize the G+C% distribution in different domains of the mosaic structures of the human genome. As a sensitive tool to visualize both the long-range and the short-range features of G+C% variation along the genome sequences, we proposed the analysis of the frequency distribution of the dinucleotides that are composed only of G and/or C, or of A and/or T, as well as the distribution of their differences.
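One way to picture the dinucleotide analysis described in this abstract is the following minimal sketch. It counts, per window, the fraction of overlapping dinucleotides made only of G/C and only of A/T, plus their difference. The function name, the window and step sizes, and the use of overlapping dinucleotides are assumptions for illustration; the abstract does not specify these details.

```python
GC_ONLY = {"GG", "GC", "CG", "CC"}  # dinucleotides composed only of G and/or C
AT_ONLY = {"AA", "AT", "TA", "TT"}  # dinucleotides composed only of A and/or T


def dinucleotide_profile(seq, window=10, step=10):
    """Return (start, gc_fraction, at_fraction, difference) for each window."""
    if window < 2:
        raise ValueError("window must hold at least one dinucleotide")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # Overlapping dinucleotides within the window.
        pairs = [chunk[i:i + 2] for i in range(len(chunk) - 1)]
        gc = sum(1 for p in pairs if p in GC_ONLY) / len(pairs)
        at = sum(1 for p in pairs if p in AT_ONLY) / len(pairs)
        profile.append((start, gc, at, gc - at))
    return profile
```

Plotting the fourth element of each tuple against the start position would give a rough view of where G+C-rich and A+T-rich stretches alternate along a sequence.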
    Download PDF (316K)
  • Comparison of Repressibility of Promoter-like sequences among Escherichia coli and its phages (P4 and λ)
    Shigehiko KANAYA, Yoshihiro KUDO
    1992Volume 3 Pages 33-36
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    By comparing, among Escherichia coli and 9 bacteriophages, the repressibilities of occurrences of σ70-promoter consensus-like sequences in coding regions by means of an index (Ks) derived from the χ2-statistic and principal components analysis (PCA), we have demonstrated that (1-1) the repressibility of the occurrences of the consensus-like sequences in the whole genome for E. coli is the highest of all the genomes, (1-2) the repressibilities for the two temperate phages (P4 and λ) are similar to that for E. coli, and (2) the occurrences of the consensus-like sequences for other phages (T7, M13, fd, f1, IKe, G4, and ΦX174). These results suggest that the repressibility of the consensus-like sequences in genomic coding regions is an important factor in effectively coordinating a web of gene control circuits, and that assessment of similarities of phages to E. coli by the gene frequencies based on the two parameters (Ks and χ2) and PCA has made it possible to predict whether or not phage genomes can be coordinated with the E. coli genome in transcription regulation.
    Download PDF (314K)
  • Yôichi IIDA
    1992Volume 3 Pages 37-40
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The translation initiation signal in mRNA has usually been attributed to the AUG initiation codon. However, a longer nucleotide sequence, including not only the AUG codon but also the 5'-untranslated sequence, is found to determine the position of initiation. A mathematical approach based on quantification analysis was applied to the nucleotide sequences of the translation initiation signal in 698 vertebrate mRNAs, which were compiled by Kozak. Our approach was then applied to human α-globin mRNA, where the authentic signal sequence at position (37/38) was found to possess the strongest signal. This can explain why this position was selected as the actual initiation site in the α-globin mRNA. In this mRNA, an α-thalassemia mutant has been reported in which a dinucleotide deletion (AC) took place at position (-3/-2) from the AUG codon in the 5'-untranslated region. Experimental results show that this deletion causes a significant decrease in the efficiency of translation initiation. This behaviour was well explained by our analysis.
    Download PDF (360K)
  • Koji Ohnishi
    1992Volume 3 Pages 41-44
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Prokaryotic tRNATrp and tRNAPhe and some other tRNAs seem to conserve well the primitive tRNA (proto-tRNA) commonly ancestral to all contemporary tRNAs. The 5'- and 3'-halves of the E. coli (EC) tRNATrp and the Bacillus subtilis (BSU) tRNAPhe were aligned with one another, resulting in the finding that proto-tRNA first emerged via duplication of a ca. 38-base-long semi-tRNA which had (5') CCA (3') at its 3'-terminus. The peptidyltransferase (PT) region (bases 2469-2589) of 23S rRNA was found to be a close homologue of contemporary 5S rRNA (EC), showing a 54.4% base-match and a matching probability by chance of Pnuc(62, 114)=0.21×10^-10. This coincides well with Noller et al.'s (1992) recent experimental result that large-subunit rRNA has PT activity. U1 snRNA and some possibly primitive mRNAs also showed significant levels of base sequence similarity, suggesting that some snRNAs and primitive mRNAs most probably emerged from tRNA-like or semi-tRNA-like primitive RNA(s).
    Download PDF (467K)
  • Yoshisato TAKEDA, Kenji YAMAMOTO, Masahiro YASUGI, Akinori YONEZAWA
    1992Volume 3 Pages 45-48
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    An RNA secondary structure prediction method using a highly parallel computer is reported. We focus on finding thermodynamically stable structures of a single-stranded RNA molecule. Our approach is based on a parallel combinatorial method which calculates the free energy of a molecule as the sum of the free energies of all the possible hydrogen bonds. Most of the conventional prediction methods find only the most stable structure. In contrast, our parallel algorithm finds many highly stable structures at once. The important idea in our algorithm is search-order scheduling. This scheduling helps us find highly stable RNA structures faster, and reduce unnecessary computation.
    Download PDF (495K)
  • Kenta NAKAI, Yutaka AKIYAMA, Hiroshi SAKAMOTO
    1992Volume 3 Pages 49-52
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    One of the most serious problems in predicting mature mRNA sequences from their precursor form is that there are so many false positive consensus sequence patterns of exon/intron and intron/exon boundaries. Is there any additional sequence information that is recognized by spliceosomes but has been missed by us? To investigate this, we constructed an aberrant splicing database. From that database, various interesting observations were made: (1) Most mutations worked by either destroying or creating the consensus patterns. (2) Mutations were observed much more frequently in 5' boundaries than in 3' boundaries. (3) Exon skippings were most commonly observed. (4) The selection of cryptic sites seems to be determined by the consensus score and perhaps by exon lengths. (5) Newly created consensus sequences seem to be used only if they are ‘appropriately’ located. These observations will hopefully be used as rules for constructing a more effective prediction system of exon sequences.
    Download PDF (488K)
  • Koji Hakata, Hiroshi Imai
    1992Volume 3 Pages 53-56
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Algorithms for computing an LCS between two strings have been given in many papers, but there is no efficient algorithm for computing an LCS between more than two strings. This paper proposes a method for efficiently computing the LCS between three or more strings.
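For orientation, the baseline that such a method improves on can be sketched as the textbook dynamic program for three strings, which fills a table of size proportional to the product of the three lengths. This is not the authors' algorithm, which the abstract does not describe; the function name `lcs3_length` is chosen here for illustration.

```python
def lcs3_length(a, b, c):
    """Length of a longest common subsequence of three strings (textbook DP)."""
    # table[i][j][k] holds the LCS length of a[:i], b[:j], c[:k].
    table = [
        [[0] * (len(c) + 1) for _ in range(len(b) + 1)]
        for _ in range(len(a) + 1)
    ]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            for k in range(1, len(c) + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    # All three strings agree on this symbol: extend the LCS.
                    table[i][j][k] = table[i - 1][j - 1][k - 1] + 1
                else:
                    # Otherwise drop the last symbol of one of the strings.
                    table[i][j][k] = max(
                        table[i - 1][j][k],
                        table[i][j - 1][k],
                        table[i][j][k - 1],
                    )
    return table[len(a)][len(b)][len(c)]
```

The cubic time and memory cost of this table is what makes the naive approach impractical for long sequences or for more than three strings.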
    Download PDF (446K)
  • Minoru ICCHODA, Nobuo TAKIGUCHI, Yoshiyuki KOTANI
    1992Volume 3 Pages 57-60
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The ordinary DP matching method deals only with simple strings. Here we propose a new DP matching algorithm, which is extended to regular expressions or wild cards and handles strings containing non-deterministic elements. It receives a pair of symbol string expressions and generates their matching information. It includes the calculation of a matching weight, which indicates the degree of matching of each pair of symbols, including wild cards. It also generates a new symbol string description for further matching.
    Download PDF (4827K)
  • Timothy Burcham, James Candlin, Alan Roter
    1992Volume 3 Pages 61-64
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    INHERIT™ is a sequence analysis and assembly package that utilizes the Fast Data Finder (FDF) in a number of unique ways. The FDF is a linear-systolic array designed to search flat text files very fast. INHERIT employs the FDF in database searching, resulting in search times limited only by disk-transfer rates. INHERIT also has a flexible pattern specification language that allows the expression of very complex biological patterns. These patterns can describe genetic motifs that can be used in database searches. Because of its design, the FDF can search the database for these patterns independent of pattern complexity. The FDF is used in sequence assembly by initially screening pairs of fragments for similarity and then excluding those fragments that are below the threshold of similarity from further consideration in the assembly algorithm.
    We have recently upgraded the INHERIT package to take advantage of the next generation of FDF, the FDF-3. The FDF-3 hardware has approximately 3 times the number of cells per board, with a form factor that is one-fourth the size of the previous FDF system. The FDF-3 is designed as a SCSI device, facilitating the connection of the FDF to the host computer, and is designed to search either dedicated SCSI disks or the system disk. The FDF-3 also uses a new file system that speeds up searches, is easy to maintain, and supports file protection. By taking advantage of these new FDF-3 features, the new release of the INHERIT system is easier to maintain and the search time is nominally faster. The new release of the INHERIT Assembler includes more rigorous vector and ambiguous sequence removal, integrated editing, and sequence chromatogram display.
    Download PDF (410K)
  • Makoto Hirosawa, Masato Ishikawa, Masaki Hoshida
    1992Volume 3 Pages 65-68
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We present a paradigm for knowledge-based multiple sequence alignment.
    The technology of multiple alignment of proteins is important for protein sequence analysis. So far, many alignment algorithms have been developed. However, they produce only a temporary alignment, which biologists must refine to produce a biologically meaningful alignment. We interviewed alignment experts, extracted knowledge from them, and analyzed it. The knowledge was essentially know-how for finding possible motifs in the temporary alignment, together with knowledge about motifs.
    Based on this analysis, we formulated an alignment system with an aligner and an intelligent refiner, which modifies the alignment produced by the aligner. The intelligent refiner refines the alignment using rules stored in a refinement rule base, according to the priority of the rules. Some rules consult a biological knowledge base, which contains motifs and so on.
    Download PDF (380K)
  • Satoru Miyano, Ayumi Shinohara, Setsuo Arikawa, Shinichi Shimozono, Ta ...
    1992Volume 3 Pages 69-72
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We present a machine learning system for knowledge acquisition that produces hypotheses from positive and negative examples, and report some experiments on protein data using the PIR and GenBank databases. This learning system is developed with an algorithmic learning theory for decision trees over regular patterns, which we newly devised for this research. In the experiments on transmembrane domain identification, the system discovered very simple hypotheses with very high accuracy from a small number of positive and negative examples. These hypotheses show that negative motifs, namely, motifs of negative data, play a key role in such classification. In these experiments, we classified 20 symbols of amino acid residues into 3 categories according to the hydropathy indices due to Kyte and Doolittle. We call such a transformation of symbols an indexing. We observed that the indexing by the hydropathy indices is important in making the learning algorithm efficient and accurate. This observation inspired us to discover such an indexing itself by a learning algorithm alone. We succeeded by combining the above learning algorithm with a local search technique for finding good indexings. We also report some experiments on signal peptides.
    We have implemented this learning system, called BONSAI, which shall be presented at the Computer Demonstration Session during this workshop.
    Download PDF (469K)
  • Y. Seto, Y. Ikeuchi, S. Kawakita, K. Nishikawa, M. Kanehisa
    1992Volume 3 Pages 73-76
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We propose a semi-automatic method to classify protein sequences into superfamilies. The method is a combination of a global sequence homology search and a local sequence identifier. The former is performed by a pairwise homology search method developed by Nishikawa et al. (2), which combined five different measures to improve sensitivity for detecting sequence similarity. The latter is done by applying the fragment peptide library (FRAP) compiled by Seto et al. (3). We evaluate our method by applying it to 89 superfamilies in the PIR database. Out of the 89 superfamilies, 65 are classified by the combination method. We find that 20 superfamilies are multi-superfamilies: different superfamilies belong to the same group by our criteria. We find it difficult to classify 4 superfamilies.
    Download PDF (545K)
  • Yukihiro Eguchi, Yasuhiko Seto
    1992Volume 3 Pages 77-80
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The main subject of structural-functional studies of peptides is to search for functional sites using their structural information. A very simple and powerful method for predicting the active sites in peptide hormones from their amino acid sequences is presented. The premise of the prediction method is that the oligomer most dissimilar to all the other oligomers in a peptide is the most likely to take part in forming the active site. By testing on 16 peptide hormones, it is shown that 94% of their active sites can be correctly predicted without taking into account the three-dimensional structures. When the method was applied to 11 prepro-peptides of the same hormones, the correctness was 82%.
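The premise of this abstract, that the oligomer least similar to all others marks the likely active site, can be sketched as follows. The similarity measure used here (number of identical positions), the oligomer length `k`, and the function name are assumptions made for illustration only; the abstract does not say how the authors score similarity.

```python
def most_dissimilar_oligomer(peptide, k=3):
    """Return (index, oligomer, score) for the k-mer least similar to the rest.

    Similarity between two k-mers is counted as the number of identical
    positions, which is an assumed stand-in for the authors' measure.
    """
    if len(peptide) < k or k < 1:
        raise ValueError("peptide must be at least k residues long")
    oligos = [peptide[i:i + k] for i in range(len(peptide) - k + 1)]

    def similarity(x, y):
        return sum(1 for p, q in zip(x, y) if p == q)

    best_index, best_score = 0, None
    for i, x in enumerate(oligos):
        # Total similarity of this oligomer to every other oligomer.
        score = sum(similarity(x, y) for j, y in enumerate(oligos) if j != i)
        if best_score is None or score < best_score:
            best_index, best_score = i, score
    return best_index, oligos[best_index], best_score
```

On a toy peptide such as `"AAAAWAAAA"`, the windows containing the single `W` receive the lowest total similarity, so one of them is returned. A substitution-matrix score would be a natural replacement for the identity count.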
    Download PDF (364K)
  • Yo Matsuo, Ken Nishikawa
    1992Volume 3 Pages 81-84
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    The structure of protein can be predicted if the homology to a protein of known structure is detected. However, as the similarity of two sequences gets weaker, it becomes harder to judge whether they are truly homologous or not. Here, a method is presented which discriminates between true homology and noise, using the fact that the hydrophobic nature of buried residues of homologous proteins is well conserved even if their overall sequence similarity is low. We defined, for a given protein of unknown structure, buried and exposed residues according to the sequence alignment with a protein of known structure, and then measured total hydrophobicities for the buried and exposed residues, respectively. Large hydrophobicity is expected for buried residues, and large hydrophilicity for exposed residues. In this way, the homology between two sequences is recognized.
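The buried/exposed hydrophobicity check described in this abstract can be sketched as below. The Kyte and Doolittle hydropathy scale, the `B`/`E` burial string for the template, and the function name are assumptions for illustration; the abstract does not state which hydrophobicity scale or burial definition the authors used.

```python
# Kyte-Doolittle hydropathy indices (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def burial_hydrophobicity(aligned_target, template_burial):
    """Sum hydropathy of target residues at buried and exposed template sites.

    aligned_target: target sequence in alignment coordinates ('-' for gaps).
    template_burial: same-length string, 'B' = buried, 'E' = exposed,
                     any other character = no template residue at that column.
    """
    if len(aligned_target) != len(template_burial):
        raise ValueError("alignment rows must have equal length")
    buried = exposed = 0.0
    for residue, state in zip(aligned_target, template_burial):
        if residue == "-" or state not in ("B", "E"):
            continue  # skip gaps and columns without a burial label
        value = KYTE_DOOLITTLE[residue]
        if state == "B":
            buried += value
        else:
            exposed += value
    return buried, exposed
```

Under the abstract's reasoning, a truly homologous pair should give a clearly positive buried total and a clearly negative exposed total, while a chance alignment should show no such separation.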
    Download PDF (316K)
  • Mark B Swindells
    1992Volume 3 Pages 85-88
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    An algorithm is described for automatically detecting hydrophobic cores in proteins of known structure. Three pieces of information are considered in order to achieve this goal: secondary structure, side chain accessibility, and side chain-side chain contacts. Residues are considered to contribute to a core when they adopt a regular secondary structural conformation and have buried side chains which form mainly non-polar contacts with other buried contacts. The efficacy of this method has been assessed by comparing the predictions for interleukin-1 and Erythrina trypsin inhibitor structures with those proposed by different authors on the basis of visual inspection. In these cases the automated procedure shows good agreement with the authors' definitions despite using only simple descriptives for residue interactions. This method will be useful to all those involved in protein structure analysis by providing an ability to reliably distinguish between buried residues which contribute to a hydrophobic core and those which are only locally buried.
    Download PDF (416K)
  • Fumiyoshi SASAGAWA, Koji TAJIMA
    1992Volume 3 Pages 89-92
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We study the prediction of globular protein secondary structures by a neural network and a supercomputer. The application of a neural network with a modular architecture to the prediction of protein secondary structures (α-helix, β-sheet and coil) is presented. Each module is a three-layer neural network. We compare the results from the neural network with a modular architecture with those from a simple three-layer structure. The prediction accuracy of the neural network with a modular architecture is higher than that of the ordinary neural network. The 3-, 4- and 8-state classification schemes of secondary structures are considered in the ordinary three-layer neural network. The percentage of correct predictions depends on the state classification scheme.
    Download PDF (444K)
  • Hiroshi Mamitsuka, Kenji Yamanishi
    1992Volume 3 Pages 93-96
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In this paper, we apply Mamitsuka and Yamanishi's method (for short, the MY method) to predicting protein α-helix regions for α-domain-type (α/α) proteins. The MY method provides a stochastic rule, which assigns, to any region in an amino acid sequence, a probability that it is α-helix. Further, on the basis of the minimum description length (MDL) principle, the MY method optimally categorizes the 20 types of amino acids, using their numerical attributes (e.g. molecular weight, hydrophobicity, etc.), into fewer than 20 groups. Our experimental results show that, by using a variety of proteins to obtain examples of α-helix, the MY method achieves average prediction rates of more than 80% and 70% for training and test examples respectively, and these results are significantly better than those of conventional methods, i.e. Chou and Fasman, Garnier et al., Qian and Sejnowski, etc.
    Download PDF (517K)
  • Kiyoshi ASAI, Satoru HAYAMIZU, Kentaro ONIZUKA, Ken-ichi HANDA
    1992Volume 3 Pages 97-100
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In this paper, it is shown that we can efficiently use continuous speech recognition techniques for the prediction of protein structures. We propose a general framework that treats the local structures and the global structures of proteins together by using continuous speech recognition techniques for protein structure prediction. This framework enables us to express the statistical information from the protein database and biological knowledge by stochastic models and grammar-like rules, and to summarize them by parsing techniques. The objects, the human voice and the protein, are not similar. However, they have similar hierarchies. In the case of speech, they are phonemes, words, phrases, sentences, and meanings. In the case of proteins, they are primary structures, secondary structures, super-secondary structures, tertiary structures, and functions. We introduce a structure prediction system which constructs the structures of proteins by considering such hierarchies.
    Download PDF (357K)
  • Yasuo Yonezawa
    1992Volume 3 Pages 101-104
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In previous research on the secondary structure of proteins, the extraction and estimation of secondary structure from an imperfect tertiary structure was impossible with the usual algorithms, such as the Kabsch & Sander dictionary method. In this paper, we describe an algorithm for the extraction and estimation of secondary structure from an imperfect tertiary structure, that is, from partially deficient 3D data. This algorithm is based on vector calculations over established secondary and tertiary structures. Namely, in this algorithm for secondary structure estimation, the standard vector values calculated from established secondary and tertiary structures are compared with the vector values calculated from the tertiary structure of a protein whose secondary structure is unknown.
    Download PDF (540K)
  • Kenji Satou, Satoru Kuhara, Emiko Furuichi, Kyoko Takiguchi, Toshihisa ...
    1992Volume 3 Pages 105-108
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We have developed a deductive database system PACADE for analyzing three dimensional and secondary structures of protein. We have shown that users of PACADE can easily write and check biological hypotheses using logical and declarative rules and some iterative structures can be searched by using PACADE. In this paper, we tried to provide users with a function to search the structures similar to a structure in a protein. We describe herein the outline of PACADE system, the outline of the function, and some results of the searches for super-secondary structure by using the function.
    Download PDF (2428K)
  • Seiichi Aikawa, Mayumi Tomikawa, Fumiko Matsuzawa
    1992Volume 3 Pages 109-112
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    We have developed a system which searches for similar tertiary structures in order to analyze the relationship between protein function and structural similarity. This system adopts root mean square distance to judge structural similarity. It generates an interpretation tree, which represents a set of pairings of elements between structures, and prunes it based on geometric constraints. We tried to search for structures similar to ATP/GTP binding sites in the PDB. As a result, the collected similar structures had the same function. The experimental result suggests that this method is useful for this kind of research.
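A root-mean-square measure for a fixed pairing of elements can be sketched as below. This sketch uses the distance RMSD (comparing intra-structure distances), which needs no superposition; that choice, and the function name, are assumptions, since the abstract does not say which RMS variant the system uses or how the pairings are scored during tree pruning.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """Distance RMSD between two equally sized, already paired point sets.

    coords_a[i] is assumed to be paired with coords_b[i]. The measure compares
    all intra-structure distances, so it is unaffected by rotation/translation.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two paired point sets with at least two points")
    squared = []
    for i, j in combinations(range(len(coords_a)), 2):
        da = math.dist(coords_a[i], coords_a[j])
        db = math.dist(coords_b[i], coords_b[j])
        squared.append((da - db) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

Because each term depends on only one pair of elements, a partial pairing whose distances already disagree by more than a tolerance could be discarded early, which is the kind of geometric constraint a pruned search tree can exploit.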
    Download PDF (538K)
  • Kentaro Onizuka, Kiyoshi Asai, Stephen T. C. Wong
    1992Volume 3 Pages 113-116
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    In this paper, we describe a new method to classify three-dimensional local structures of protein and to model the geometric constraints of the protein tertiary structures. These constraints would allow us to predict tertiary structure more accurately than existing techniques.
    Download PDF (9803K)
  • Simulation of folding process of lysozyme
    Kouji TABUCHI, Tamio YASUKAWA, Masaki FUMOTO
    1992Volume 3 Pages 117-120
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    Protein folding simulation is useful for the prediction of tertiary structure, as well as for detailed analysis of the folding process. In a preceding paper, we proposed a simplified simulation model in which spherical elements representing the respective residues are connected by virtual bonds 3.8 Å in length. Each element has a specific soft repulsive potential and a hydrophobic attractive potential, and undergoes random motions under these potentials and under an elastic force, originating from conformational entropy, that operates between the chain ends. In the present work, we applied this model to the folding simulation of lysozyme and obtained fairly satisfactory results for the end-to-end distance and radius of gyration, as well as for the time variations of the separation distances of Cys-Cys pairs.
    Results obtained by extended models with Lennard-Jones type inter-residue interactions and module structures are also presented and compared with intermediate conformations estimated by NMR spectroscopy.
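The two chain-level observables named in this abstract, end-to-end distance and radius of gyration, can be computed from bead coordinates as in the sketch below. It assumes one bead per residue with equal mass, and the function names are chosen for illustration; the simulation model itself is not reproduced here.

```python
import math


def end_to_end_distance(coords):
    """Distance between the first and last bead of the chain."""
    return math.dist(coords[0], coords[-1])


def radius_of_gyration(coords):
    """Root-mean-square distance of beads from their centroid (equal masses)."""
    n = len(coords)
    center = [sum(p[axis] for p in coords) / n for axis in range(3)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)


# Toy fully extended chain of three beads joined by 3.8 Å virtual bonds.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
```

Tracking both values over simulation steps gives a compact picture of how far a chain has collapsed from an extended start.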
    Download PDF (291K)
  • Eiichi SOEDA, De-Xing HOU, Tetsusi Yamagata, Haruo KISHIDA, Takanori S ...
    1992Volume 3 Pages 121-124
    Published: 1992
    Released on J-STAGE: July 11, 2011
    JOURNAL FREE ACCESS
    To construct the physical map of human chromosome 21 (HC21) with ordered sets of cosmid clones (contigs), we have introduced an HC21-specific arrayed library from LLNL and assembled the clones by the fingerprinting method of Carrano et al. (1). The combination of three commercially available machines, a Plasmid DNA Isolator (PI-100, Kurabo), a Chemical Robot (DSP-240, Seiko) and a Fluorescent Sequencer (DNA sequencer 373A, ABI), has enabled us to analyze a large number of cosmid clones. The software package (Contig Mapper 680A, ABI) has assembled the overlapping clones into contigs, which will be mapped by dot blotting analysis with Alu-PCR products of an HC21-hybrid cell panel. The dot scorer developed in collaboration with Hitachi SK recorded signals on high-density filters and assigned them to the arrayed library, which made it possible to map them according to the panel. The mapped clones will provide starting materials to be sequenced.
    Download PDF (388K)