Genome Informatics

Computational Inference of Regulatory Pathways in Microbes

An Application to Phosphorus Assimilation Pathways in Synechococcus sp. WH8102

Zhengchang Su, Phuongan Dam, Xin Chen, Victor Olman, Tao Jiang, Brian ...

2003, Volume 14, pp. 3-13
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.3

Journal, free access

We present a computational protocol for inference of regulatory and signaling pathways in a microbial cell, through literature search, mining “high-throughput” biological data of various types, and computer-assisted human inference. This protocol consists of four key components: (a) construction of template pathways for microbial organisms related to the target genome, which have been extensively studied and/or have a significant amount of relevant experimental data, (b) inference of initial pathway models for the target genome, by combining the template pathway models with target genome-specific information, (c) refinement and expansion of the initial pathway models through application of various data mining tools, including phylogenetic profile analysis, inference of protein-protein interactions, and prediction of transcription factor binding sites, and (d) validation and refinement of the pathway models using pathway-specific experimental data or other information. To demonstrate the effectiveness of this procedure, we have applied it to the construction of the phosphorus assimilation pathways in the cyanobacterium Synechococcus sp. WH8102. We present, in this paper, a model of the core components of this pathway.

A Case Study of Object-Oriented Bio-Chemistry

A Unified Specification of the Coagulation Cascade

Jacqueline Signorini, Patrick Greussay

2003, Volume 14, pp. 14-22
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.14

Journal, free access

We propose a case study in which a familiar but very complex and intrinsically woven biocomputing system, the blood clotting cascade, is specified using methods from software design known as object-oriented design (OOD). The specifications involve definition and inheritance of classes and methods, and use design techniques from the most widely used OOD language, the Unified Modeling Language (UML), as well as its Real-Time UML extension.
First, we emphasize the need for a unified methodology to specify sufficiently complex biological and biochemical processes. Then, using the blood clotting cascade as an example, we define the class diagrams which exhibit the static structure of procoagulant factors of proenzyme-enzyme conversions, and finally we give a dynamic model involving events, collaboration, synchronization and sequencing.
We thus show that OOD can be used in fields well beyond software design, and that it gives the benefit of unified and sharable descriptions and, as a side effect, automatic generation of simulation software.

MetaFluxNet, a Program Package for Metabolic Pathway Construction and Analysis, and Its Use in Large-Scale Metabolic Flux Analysis of Escherichia coli

Sang Yup Lee, Dong-Yup Lee, Soon Ho Hong, Tae Yong Kim, Hongsoek Yun, ...

2003, Volume 14, pp. 23-33
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.23

Journal, free access

We have developed MetaFluxNet, a stand-alone program package for the management of metabolic reaction information and quantitative metabolic flux analysis. It allows users to interpret and examine metabolic behavior in response to genetic and/or environmental modifications. As a result, quantitative in silico simulations of metabolic pathways can be carried out to understand the metabolic status and to design metabolic engineering strategies. The main features of the program include a well-developed model construction environment, a user-friendly interface for metabolic flux analysis (MFA), comparative MFA of strains having different genotypes under various environmental conditions, and automated pathway layout creation. The usefulness and functionality of the program are demonstrated by applying it to metabolic pathways in E. coli. First, a large-scale in silico E. coli model was constructed using MetaFluxNet, and then the effects of carbon sources on intracellular flux distributions and succinic acid production were investigated on the basis of the uptake and secretion rates of the relevant metabolites. The results indicated that, among the three carbon sources available, the most reduced substrate, sorbitol, yields efficient succinic acid production. The software can be downloaded from http://mbel.kaist.ac.kr/.

Reducing False Positives in Molecular Pattern Recognition

Xijin Ge, Shuichi Tsutsumi, Hiroyuki Aburatani, Shuichi Iwata

2003, Volume 14, pp. 34-43
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.34

Journal, free access

In the search for new cancer subtypes by gene expression profiling, it is essential to avoid misclassifying samples of unknown subtypes as known ones. In this paper, we evaluated the false positive error rates of several classification algorithms through a ‘null test’, presenting the classifiers with a large collection of independent samples that do not belong to any of the tumor types in the training dataset. The benchmark dataset is available at www2.genome.rcast.u-tokyo.ac.jp/pm/. We found that k-nearest neighbor (KNN) and support vector machine (SVM) classifiers have very high false positive error rates when few genes (<100) are used in prediction. The error rate can be partially reduced by including more genes. In contrast, the prototype matching (PM) method has a much lower false positive error rate. Such robustness can be achieved without loss of sensitivity by introducing suitable measures of prediction confidence. We also proposed a cluster-and-select technique to select genes for classification. The nonparametric Kruskal-Wallis H test is employed to select genes differentially expressed in multiple tumor types. To reduce redundancy, we then divided these genes into clusters with similar expression patterns and selected a given number of genes from each cluster. The reliability of the new algorithm is tested on three public datasets.

On Combining Multiple Microarray Studies for Improved Functional Classification by Whole-Dataset Feature Selection

See-Kiong Ng, Soon-Heng Tan, V.S. Sundararajan

2003, Volume 14, pp. 44-53
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.44

Journal, free access

As microarray technologies become routinely applied in genome laboratories for studying gene expression, it is not uncommon that experiments on identical or similar sets of genes are conducted by multiple laboratories for various functional studies of these genes. Much of such data are often available to researchers for their data analysis, either through collaborators or from online gene expression databases. It will be useful to combine data from different microarray studies to improve the microarray data mining results.
We show that the functional classification of genes from microarray data can be improved further by combining gene expression data from multiple microarray studies, even if the experimental focus or conditions of the individual studies differ. However, blindly combining all available datasets may not always improve the analysis results; it is important to be selective about which datasets to include. In our approach, we consider each dataset to be one feature, and then apply feature selection strategies to select appropriate datasets for training. With a simple hill-climbing method, we show that gene classification performance can be improved by whole-dataset feature selection.
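
The idea of treating each dataset as a single feature and hill-climbing over subsets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the greedy forward strategy, the function names, and the toy scoring function (standing in for a cross-validated classification accuracy) are all assumptions made for the example.

```python
from typing import Callable, FrozenSet, Iterable, Set


def hill_climb_datasets(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Greedy forward hill climbing over whole datasets (hypothetical helper)."""
    selected: Set[str] = set()
    remaining = set(candidates)
    best = score(frozenset(selected))
    while remaining:
        # Score every selection obtained by adding one more dataset.
        gains = {d: score(frozenset(selected | {d})) for d in remaining}
        # sorted() makes tie-breaking deterministic.
        top = max(sorted(gains), key=gains.get)
        if gains[top] <= best:
            break  # no single addition helps: local optimum reached
        selected.add(top)
        remaining.remove(top)
        best = gains[top]
    return selected


# Toy score (assumption): two datasets help classification, one adds noise.
TOY_VALUE = {"cell_cycle": 0.10, "stress": 0.05, "unrelated": -0.08}


def toy_score(subset: FrozenSet[str]) -> float:
    return 0.60 + sum(TOY_VALUE[d] for d in subset)


chosen = hill_climb_datasets(TOY_VALUE, toy_score)
```

With the toy score, the search keeps the two helpful datasets and stops before adding the noisy one, which mirrors the point that combining every available dataset is not always beneficial.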

Statistical Inference Methods for Detecting Altered Gene Associations

Sang-Heon Yoon, Je-Suk Kim, Hae-Hiang Song

2003, Volume 14, pp. 54-63
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.54

Journal, free access

The higher incidence of liver disease in the Asian population is of great concern to clinicians. To understand the gene functions involved in different stages of the disease, microarray expression data of histological progressive grades, from the dysplastic nodule in cirrhotic liver to hepatocellular carcinoma Edmonson grade III, are analyzed. The statistical procedures are divided into two parts. First, the microarray data are suitably normalized, using methods that include analysis of variance (ANOVA). There are great differences of opinion regarding the currently used normalization methods, so these methods must be compared before proceeding to the second part, the statistical analysis of gene-pair associations. Based on the assumption that the union of the significant genes from these normalization methods includes sufficiently general and well-defined differentially expressed genes, the second part searches for evidence of altered gene-gene relationships with progression of the disease. Significantly altered gene-pair associations are identified with the ratio of gene-pair correlations. The methods are illustrated with replicated microarray expression data.

Splice Site Detection with a Higher-Order Markov Model Implemented on a Neural Network

Loi Sy Ho, Jagath C. Rajapakse

2003, Volume 14, pp. 64-72
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.64

Journal, free access

The performance of ab initio gene prediction approaches depends mostly on how effectively the splice sites are detected. This paper addresses the problem of splice site detection using higher-order Markov models. The tenet of our approach is to capture the higher-order dependencies of a Markov model with a neural network that receives its inputs from low-order Markov chains. The method is able not only to capture the higher-order dependencies in the bases of the consensus sequence immediately surrounding the splice site but also to distinguish the characteristics of the coding and non-coding regions on both sides of the splice site. Our experiments indicate that the present method achieves better accuracy than techniques employing low-order Markov chains and other earlier approaches.

On Selecting Features from Splice Junctions

An Analysis Using Information Theoretic and Machine Learning Approaches

Christina L. Zheng, Virginia R. De Sa, Michael Gribskov, T. Murlidhara ...

2003, Volume 14, pp. 73-83
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.73

Journal, free access

The computational recognition of precise splice junctions is a challenge faced in the analysis of newly sequenced genomes. It is challenging because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at the splice junctions, not simply to create an artificial recognition system. For this purpose we use a combination of a neural network based calliper randomization approach and an information theoretic feature selection approach. This has been done in an effort to understand regions that harbor information content and to extract features relevant for the prediction of splice junctions. The analysis using the neural network based calliper randomization approach revealed regions important in the internal representation of the network model. The calliper approach captured both correlated and independently important features, whereas the feature selection approach captures features that are independently informative. The two methods can therefore capture features with different properties. Comparative analysis of the results from both methods helps to infer the kind of information present in each region.

An In-Silico Method for Prediction of Polyadenylation Signals in Human Sequences

Huiqing Liu, Hao Han, Jinyan Li, Limsoon Wong

2003, Volume 14, pp. 84-93
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.84

Journal, free access

This paper presents a machine learning method to predict polyadenylation signals (PASes) in human DNA and mRNA sequences by analysing features around them. This method consists of three sequential steps of feature manipulation: generation, selection and integration of features. In the first step, new features are generated using k-gram nucleotide acid or amino acid patterns. In the second step, a number of important features are selected by an entropy-based algorithm. In the third step, support vector machines are employed to recognize true PASes from a large number of candidates. Our study shows that true PASes in DNA and mRNA sequences can be characterized by different features, and also shows that both upstream and downstream sequence elements are important for recognizing PASes from DNA sequences. We tested our method on several public data sets as well as our own extracted data sets. In most cases, we achieved better validation results than those reported previously on the same data sets. The important motifs observed are highly consistent with those reported in literature.

Construction of Genetic Network Using Evolutionary Algorithm and Combined Fitness Function

Shin Ando, Hitoshi Iba

2003, Volume 14, pp. 94-103
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.94

Journal, free access

This paper proposes a method to capture the dynamics in gene expression data using the S-system formalism and to construct genetic network models. Our proposed method exploits probabilistic heuristic search and a divide-and-conquer approach to estimate the network structure. In evaluating the network structure, we attempt a primitive integration of other knowledge into the statistical criterion. The Z-score is used to identify the robust and significant parameters from the stochastic search results. We evaluated the proposed method on artificially generated data and E. coli mRNA expression data.
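
The abstract names the S-system formalism without stating it. For orientation, the standard S-system form from Biochemical Systems Theory is given below; whether the paper uses exactly this parameterization is an assumption. Here X_i is the expression level of gene i, alpha_i and beta_i are non-negative rate constants, and g_ij and h_ij are the kinetic orders describing how X_j influences the production and degradation of X_i.

```latex
\[
\frac{dX_i}{dt}
  = \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}}
  - \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n
\]
```

In this form, estimating the network structure amounts to deciding which kinetic orders g_ij and h_ij are non-zero, since a zero exponent means that gene j has no direct effect on the corresponding term for gene i.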

Layout Search of a Gene Regulatory Network for 3-D Visualization

Naoki Hosoyama, Noman Nasimul, Hitoshi Iba

2003, Volume 14, pp. 104-113
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.104

Journal, free access

In recent years, base sequences have been increasingly decoded through efforts such as the human genome project, and the estimation of genetic networks has accelerated accordingly. However, no definitive method has become available for drawing a large graph effectively. This paper proposes a method that copes with an increase in the number of nodes by laying out genes on several layered planes and then overlapping these planes. This layout involves an optimization problem which requires maximizing a fitness function. To demonstrate the effectiveness of our approach, we show some graphs using actual data on 82 genes and 552 genes. We also describe how to lay out nodes by means of stochastic searches, e.g., stochastic hill-climbing and incremental methods. The experimental results show the superiority and usefulness of the two search methods in comparison with simple random search.

Neural-Network-Based Parameter Estimation in S-System Models of Biological Networks

Jonas S. Almeida, Eberhard O. Voit

2003, Volume 14, pp. 114-123
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.114

Journal, free access

The genomic and post-genomic eras have been blessing us with overwhelming amounts of data that are of increasing quality. The challenge is that most of these data alone are mere snapshots of the functioning organism and do not reveal the organizational structure of which the particular genes and metabolites are contributors. To gain an appreciation of their roles and functions within cells and organisms, genomic and metabolic data need to be integrated in systems models that allow the testing of hypotheses, generate experimentally testable predictions, and ultimately lead to true explanations. One type of data that is particularly well suited for such integration consists of time profiles, which show gene activities, metabolite concentrations, or protein prevalences at dense series of time points. We show with a specific example how such time series can be analyzed and evaluated, if some structural information about the data is available, even if this information is incomplete. The method consists of three components. The first is a particularly suitable mathematical modeling framework, namely Biochemical Systems Theory, in which parameters are direct indicators of the organization of the underlying phenomenon, the second is the training of an artificial neural network for data smoothing and complementation, and the third is a technique for reinterpreting differential equations in a fashion that facilitates parameter estimation. A prototype webtool for these analyses is available at https://bioinformatics.musc.edu/webmetabol/.

Finding Optimal Gene Networks Using Biological Constraints

Sascha Ott, Satoru Miyano

2003, Volume 14, pp. 124-133
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.124

Journal, free access

The accurate estimation of gene networks from gene expression measurements is a major challenge in the field of Bioinformatics. Since the problem of estimating gene networks is NP-hard and exhibits a search space of super-exponential size, researchers are using heuristic algorithms for this task. However, little can be said about the accuracy of heuristic estimations. In order to overcome this problem, we present a general approach to reduce the search space to a biologically meaningful subspace and to find optimal solutions within the subspace in linear time. We show the effectiveness of this approach in application to yeast and Bacillus subtilis data.

Efficient Tree-Matching Methods for Accurate Carbohydrate Database Queries

Kiyoko F. Aoki, Atsuko Yamaguchi, Yasushi Okuno, Tatsuya Akutsu, Nobuh ...

2003, Volume 14, pp. 134-143
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.134

Journal, free access

One aspect of glycome informatics is the analysis of carbohydrate sugar chains, or glycans, whose basic structure is not a sequence but a tree. Although there has been much work on the development of sequence databases and matching algorithms for sequences (for performing queries and analyzing similarity), the more complicated tree structure of glycans does not allow a direct implementation of such a database for glycans, nor the direct application of sequence alignment algorithms for performing searches or analyzing similarity. Therefore, we have utilized a polynomial-time dynamic programming algorithm for finding the maximum common subtree of two trees to implement an accurate and efficient tool for finding and aligning maximally matching glycan trees. The recently released KEGG Glycan database for glycan structures incorporates our tree-structure alignment algorithm with various parameters to adapt to the needs of a variety of users. Because we use similarity scores as opposed to a distance metric, our methods are more readily used to display trees of higher similarity. We present the two methods developed for this purpose and illustrate their validity.

Heuristics for Chemical Compound Matching

Masahiro Hattori, Yasushi Okuno, Susumu Goto, Minoru Kanehisa

2003, Volume 14, pp. 144-153
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.144

Journal, free access

We have developed an efficient algorithm for comparing two chemical compounds, where the chemical structure is treated as a 2D graph consisting of atoms as vertices and covalent bonds as edges. Based on the concept of functional groups in chemistry, 68 atom types (vertex types) are defined for carbon, nitrogen, oxygen, and other atomic species with different environments, which has enabled detection of biochemically meaningful features. Maximal common subgraphs of two graphs can be found by searching for maximal cliques in the association graph, and we have introduced heuristics to accelerate the clique finding. Our heuristic procedure is controlled by some adjustable parameters. Here we applied our procedure to the latest KEGG/LIGAND database with different sets of parameters, and demonstrated the correlation of parameters in our algorithm with the distribution of similarity scores and/or the execution time. Finally, we showed the effectiveness of our heuristics for compound pairs along metabolic pathways.
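
The reduction from common subgraphs to cliques in an association graph can be sketched as follows. This is only an illustration of the general technique: it uses a plain Bron-Kerbosch enumeration and does not include the paper's heuristics or its 68 atom types. The function names and the two tiny labelled graphs are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Node = Tuple[int, int]  # (vertex in graph 1, vertex in graph 2)


def association_graph(
    labels1: Dict[int, str], edges1: Set[FrozenSet[int]],
    labels2: Dict[int, str], edges2: Set[FrozenSet[int]],
) -> Dict[Node, Set[Node]]:
    """Pair up same-typed atoms; connect pairs whose bonding pattern agrees."""
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    adj: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # a vertex may be matched at most once
        if (frozenset((u1, u2)) in edges1) == (frozenset((v1, v2)) in edges2):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def maximal_cliques(adj: Dict[Node, Set[Node]]) -> List[Set[Node]]:
    """Basic Bron-Kerbosch enumeration (no pivoting, no heuristics)."""
    found: List[Set[Node]] = []

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> None:
        if not p and not x:
            found.append(r)
            return
        for n in list(p):
            expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(adj), set())
    return found


# Toy molecules (assumption): C-C-O versus C-O, atoms typed by element only.
g1_labels = {0: "C", 1: "C", 2: "O"}
g1_edges = {frozenset((0, 1)), frozenset((1, 2))}
g2_labels = {0: "C", 1: "O"}
g2_edges = {frozenset((0, 1))}

cliques = maximal_cliques(
    association_graph(g1_labels, g1_edges, g2_labels, g2_edges)
)
best = max(cliques, key=len)
```

Each clique is a set of atom pairings that preserves both atom types and bonding, so the largest clique here matches the C-O fragment shared by the two toy molecules.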

Processing Sequence Annotation Data Using the Lua Programming Language

Yutaka Ueno, Masanori Arita, Toshitaka Kumagai, Kiyoshi Asai

2003, Volume 14, pp. 154-163
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.154

Journal, free access

The data processing language in a graphical software tool that manages sequence annotation data from genome databases should provide flexible functions for the tasks in molecular biology research. Among currently available languages we adopted the Lua programming language. It fulfills our requirements to perform computational tasks for sequence map layouts, i.e. the handling of data containers, symbolic reference to data, and a simple programming syntax. Upon importing a foreign file, the original data are first decomposed in the Lua language while maintaining the original data schema. The converted data are parsed by the Lua interpreter and the contents are stored in our data warehouse. Then, portions of annotations are selected and arranged into our catalog format to be depicted on the sequence map. Our sequence visualization program was successfully implemented, embedding the Lua language for processing of annotation data and layout script. The program is available at http://staff.aist.go.jp/yutaka.ueno/guppy/.

PatternHunter II: Highly Sensitive and Fast Homology Search

Ming Li, Bin Ma, Derek Kisman, John Tromp

2003, Volume 14, pp. 164-175
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.164

Journal, free access

Extending the single optimized spaced seed of PatternHunter [20] to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search technology full circle.
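
To make the notion of a spaced seed concrete, the sketch below compares one spaced seed with a consecutive seed of the same weight on an ungapped pair of sequences. The seed pattern is the weight-11 pattern commonly cited for the original PatternHunter; it is not given in the abstract, so treat it, the function names, and the toy sequences as assumptions for illustration. PatternHunter II itself uses several seeds at once, which this sketch does not attempt.

```python
from typing import List

# Weight-11, span-18 pattern commonly cited for PatternHunter (assumption here).
SPACED_SEED = "111010010100110111"
CONSECUTIVE_SEED = "1" * 11  # Blastn-style seed of the same weight


def seed_hits(a: str, b: str, seed: str) -> List[int]:
    """Offsets where a and b agree at every '1' position of the seed."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes an ungapped, equal-length pair")
    care = [k for k, c in enumerate(seed) if c == "1"]
    return [
        i
        for i in range(len(a) - len(seed) + 1)
        if all(a[i + k] == b[i + k] for k in care)
    ]


# Toy pair: 18 bases with mismatches at positions 5, 10 and 14.
x = "ACGTACGTACGTACGTAC"
y = "ACGTATGTACATACCTAC"

spaced = seed_hits(x, y, SPACED_SEED)
contiguous = seed_hits(x, y, CONSECUTIVE_SEED)
```

In this toy pair the three mismatches fall on "don't care" positions of the spaced seed, so it reports a hit at offset 0, while no run of 11 consecutive matches exists for the consecutive seed to find.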

On Half Gapped Seed

Wei Chen, Wing-kin Sung

2003, Volume 14, pp. 176-185
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.176

Journal, free access

In this paper, we propose a new type of seed for Blast-like homology search tools, called the “half seed”. This new seed is better than the “consecutive seed” used by the original Blast tools in both sensitivity and efficiency. Compared with the “gapped seed”, which was proposed together with a new Blast-like search tool, PatternHunter, the new seed offers a much wider range of choices for trading off sensitivity against efficiency. This property is especially useful when a search application needs more precise results under limited hardware resources, or vice versa.

Clone-Array Pooled Shotgun Mapping and Sequencing

Design and Analysis of Experiments

Miklós Csürös, Bingshan Li, Aleksandar Milosavljevic

2003, Volume 14, pp. 186-195
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.186

Journal, free access

This paper studies sequencing and mapping methods that rely solely on pooling and shotgun sequencing of clones. First, we scrutinize and improve the recently proposed Clone-Array Pooled Shotgun Sequencing (CAPSS) method, which delivers a BAC-linked assembly of a whole genome sequence. Secondly, we introduce a novel physical mapping method, called Clone-Array Pooled Shotgun Mapping (CAPS-MAP), which computes the physical ordering of BACs in a random library. Both CAPSS and CAPS-MAP construct subclone libraries from pooled genomic BAC clones.

Prediction and Analysis of β-Turns in Proteins by Support Vector Machine

Tho Hoan Pham, Kenji Satou, Tu Bao Ho

2003, Volume 14, pp. 196-205
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.196

Journal, free access

The tight turn has long been recognized as one of the three important features of proteins, after the α-helix and β-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are β-turns. Analysis and prediction of β-turns in particular, and tight turns in general, are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to the prediction and analysis of β-turns. We have investigated two aspects of applying SVMs to this problem. First, we developed a new SVM method, called BTSVM, which predicts the β-turns of a protein from its sequence. The prediction results on a dataset of 426 non-homologous protein chains, obtained by sevenfold cross-validation, showed that our method is superior to previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of β-turns, based on the “multivariable” classification model of a linear SVM. This model is more general than those of previous statistical methods. Our analysis results are more comprehensive and easier to use than previously published analysis results.

Multi-Class Protein Fold Classification Using a New Ensemble Machine Learning Approach

Aik Choon Tan, David Gilbert, Yves Deville

2003, Volume 14, pp. 206-217
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.206

Journal, free access

Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The amount of structural data has made traditional methods such as manual inspection of the protein structure become impossible. Machine learning has been widely applied to bioinformatics and has gained a lot of success in this research area. This work proposes a novel ensemble machine learning method that improves the coverage of the classifiers under the multi-class imbalanced sample sets by integrating knowledge induced from different base classifiers, and we illustrate this idea in classifying multi-class SCOP protein fold data. We have compared our approach with PART and show that our method improves the sensitivity of the classifier in protein fold classification. Furthermore, we have extended this method to learning over multiple data types, preserving the independence of their corresponding data sources, and show that our new approach performs at least as well as the traditional technique over a single joined data source. These experimental results are encouraging, and can be applied to other bioinformatics problems similarly characterised by multi-class imbalanced data sets held in multiple data sources.

Multi-Class Support Vector Machines for Protein Secondary Structure Prediction

Minh N. Nguyen, Jagath C. Rajapakse

2003, Volume 14, pp. 218-227
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.218

Journal, free access

The solution of binary classification problems using the Support Vector Machine (SVM) method has been well developed. Though multi-class classification is typically solved by combining several binary classifiers, several multi-class methods that consider all classes at once have recently been proposed. However, these methods require solving a much larger optimization problem and are applicable only to small datasets. Three methods based on binary classification, namely one-against-all (OAA), one-against-one (OAO), and directed acyclic graph (DAG), and two approaches that treat the multi-class problem as a single optimization problem, are implemented to predict protein secondary structure. Our experiments indicate that multi-class SVM methods are more suitable for protein secondary structure (PSS) prediction than the other methods, including binary SVMs, because of their capacity to solve the optimization problem in one step. Furthermore, we argue that it is feasible to improve the prediction accuracy further by adding a second-stage multi-class SVM that captures the contextual information among secondary structural elements. We demonstrate on two datasets that two-stage SVMs perform better than single-stage SVM techniques for PSS prediction, and report a maximum accuracy of 79.5%.
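
As a reminder of how binary classifiers are combined in the one-against-one (OAO) scheme, the sketch below takes a majority vote over all class pairs. The pairwise deciders are toy stand-ins for trained binary SVMs, and the three-state labels H, E and C (helix, strand, coil), the function names, and the tie-breaking rule are assumptions for illustration rather than details taken from the paper.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Pair = Tuple[str, str]
Decider = Callable[[Sequence[float]], str]

CLASSES = ("H", "E", "C")  # helix, strand, coil (assumed three-state labels)
INDEX = {c: i for i, c in enumerate(CLASSES)}


def one_against_one(
    classes: Sequence[str], deciders: Dict[Pair, Decider], x: Sequence[float]
) -> str:
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(deciders[pair](x) for pair in combinations(classes, 2))
    top = max(votes.values())
    # Ties are broken by the order of `classes` (an arbitrary choice).
    return next(c for c in classes if votes[c] == top)


def make_toy_decider(a: str, b: str) -> Decider:
    """Stand-in for a trained binary SVM: picks the class with the larger score."""
    def decide(x: Sequence[float]) -> str:
        return a if x[INDEX[a]] >= x[INDEX[b]] else b
    return decide


DECIDERS = {(a, b): make_toy_decider(a, b) for a, b in combinations(CLASSES, 2)}

predicted = one_against_one(CLASSES, DECIDERS, (0.2, 0.7, 0.1))
```

For k classes this scheme trains k(k-1)/2 binary classifiers, whereas one-against-all trains k and the single-optimization approaches train one larger model.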

Development of an ab initio Protein Structure Prediction System ABLE

Takashi Ishida, Takeshi Nishimura, Makoto Nozaki, Tsuyoshi Inoue, Tohr ...

2003, Volume 14, pp. 228-237
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.228

Journal, free access

An ab initio protein structure prediction system called ABLE is described. It is based on the fragment assembly method, which consists of two steps: dividing a target sequence into short overlapping subsequences (fragments) and assigning a local structure to each fragment; and generating models by assembling the local structures and selecting the models with low potential energy. One of the most important problems in conventional fragment assembly methods is the difficulty of selecting native-like structures by energy minimization alone. ABLE therefore employs a structural clustering method to select the native-like models from among the generated models. By applying the unit-vector root mean square distance (URMS) as a measure of structural similarity, we achieve more robust and effective structural clustering. When not enough clusters of good quality are obtained, ABLE runs the energy minimization procedure again, incorporating structural restraints obtained from the consensus substructures in the previously generated models. This approach is based on our observation that there is a high probability that the consensus substructures of the generated models have native-like structures. Another feature of ABLE is that, in assigning local structures to fragments, it assigns main-chain dihedral angles (φ, ψ) to the central residue of each fragment according to a probability distribution map built from candidate sequences similar to each fragment. This enables the system to generate appropriate local structures that may not already exist in a protein structure database. We applied our system to 25 small proteins and obtained near-native folds for more than half of them. We also demonstrate the performance of our structural clustering method, which can be applied to other protein structure prediction systems.

Docking Unbound Proteins with MIAX

A Novel Algorithm for Protein-Protein Soft Docking

Carlos A. Del Carpio Munoz, Tobias Peissker, Atsushi Yoshimori, Eiichi ...

2003, Volume 14, pp. 238-249
Published: 2003
Released online: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.238

Journal, free access

We propose a new methodology for “soft” docking of unbound protein molecules (reported in the isolated state). The methodology is characterized by its simplicity and the ease with which it can be embedded in any rigid-body docking process based on point complementarity. It is designed to allow limited and free, but not unrealistic, interpenetration of the side chains of protein surface amino acid residues. The central step of the technique is a filtering process similar to those used in image processing. The methodology assists in the deletion of atomic-scale details on the surface of the interacting monomers, leading to the extraction of the most characteristic flattened shape for the molecule as well as the definition of a soft layer of atoms that allows smooth interpenetration of the interacting molecules during the docking process. Although the methodology does not perform structural or conformational rearrangements in the interacting monomers, the results output by the algorithm are in fair agreement with the relative position of the monomers in experimentally reported complexes. The algorithm performs especially well in cases where the complexity of the protein surfaces is high, that is, in heterodimer complex prediction. The algorithm is intended to serve as a fast screening engine for proteins known to interact but for which no information other than the structures in the isolated state is available. Consequently, the importance of the methodology will increase in structure-function studies of the thousands of proteins derived from large-scale genome sequencing projects being carried out around the globe.

A Domain Combination Based Probabilistic Framework for Protein-Protein Interaction Prediction

Dongsoo Han, Hong-Soog Kim, Jungmin Seo, Woohyuk Jang

2003, Volume 14, pp. 250-259
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.250

Free access

In this paper, we propose a probabilistic framework to predict the interaction probability of proteins. The notions of domain combination and domain combination pair are newly introduced, and the prediction model in the framework takes the domain combination pair as the basic unit of protein interaction to overcome the limitations of conventional domain-pair-based prediction systems. The framework consists largely of a prediction preparation stage and a prediction service stage. In the prediction preparation stage, two appearance probability matrices are constructed, which hold information on the appearance frequencies of domain combination pairs in the interacting and non-interacting sets of protein pairs, respectively. Based on the appearance probability matrices, a probability equation is devised. The equation maps a protein pair to a real number in the range of 0 to 1. Two distributions, for the interacting and non-interacting sets of protein pairs, are obtained using the equation. In the prediction service stage, the interaction probability of a protein pair is predicted using the distributions and the equation. The validity of the prediction model is evaluated on the interacting set of protein pairs in yeast and an artificially generated non-interacting set of protein pairs. When 80% of the set of interacting protein pairs in DIP (Database of Interacting Proteins) is used as a learning set of interacting protein pairs, very high sensitivity (86%) and moderate specificity (56%) are achieved within our framework.
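
The preparation stage described in the abstract above, counting how often domain combination pairs appear in a set of protein pairs, might be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: a "domain combination" is assumed here to be any non-empty subset of a protein's domains, the normalization into appearance probabilities is an assumed simple relative frequency, and the function names and domain labels (SH3, SH2, PRM) are hypothetical. The paper's probability equation is not given in the abstract and is not reproduced.

```python
from collections import Counter
from itertools import combinations, product


def domain_combinations(domains):
    """All non-empty subsets of a protein's domains (assumed definition)."""
    unique = sorted(set(domains))
    return [frozenset(c)
            for r in range(1, len(unique) + 1)
            for c in combinations(unique, r)]


def domain_combination_pairs(domains_a, domains_b):
    """Unordered pairs of domain combinations, one from each protein."""
    return {frozenset((a, b))
            for a, b in product(domain_combinations(domains_a),
                                domain_combinations(domains_b))}


def appearance_counts(protein_pairs):
    """Count, over a set of protein pairs, how often each pair appears."""
    counts = Counter()
    for domains_a, domains_b in protein_pairs:
        counts.update(domain_combination_pairs(domains_a, domains_b))
    return counts


def appearance_probabilities(counts):
    """Relative frequencies; an assumed normalization, not the paper's."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Hypothetical interacting set: each protein is given as a list of domains.
interacting = [(["SH3"], ["PRM"]), (["SH3", "SH2"], ["PRM"])]
counts = appearance_counts(interacting)
probs = appearance_probabilities(counts)
key = frozenset((frozenset({"SH3"}), frozenset({"PRM"})))
```

The same counting would be run separately on the non-interacting set to obtain the second matrix. Storing the counts in a `Counter` keyed by unordered pairs is a design convenience of this sketch; a dense matrix indexed by domain combination would serve equally well.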

On the Art of Modeling; Illustrated with the Analysis of the Golf Swing Motion

Hirotugu Akaike

2003, Volume 14, pp. 263-265
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.263

Free access

The role of a model is to provide adequate knowledge to handle a particular problem. The work of modeling starts on the basis of the feel and knowledge of the object and proceeds by developing guesses about the structure of the object. In this paper characteristics of this process are demonstrated with the example of the analysis of the golf swing motion.

Predicting Nucleic Acid Hybridization and Melting Profiles

Michael Zuker

2003, Volume 14, pp. 266-268
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.266

Free access

Orthologous Sets of Functional Networks: Inference, Mining and Visualization

Charles De Lisi

2003, Volume 14, p. 269
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.269

Free access

CADLIVE System: Map-Based Dynamic Simulation of Biochemical Networks

Hiroyuki Kurata, Rei Iwasaki, Kouichi Masaki, Takayuki Tanaka, Kouji M ...

2003, Volume 14, pp. 270-271
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.270

Free access

Selection of Causal Gene Sets from Gene Expression Profiles Using GeneFis®, New Software Based on FNN

Hiroyuki Honda, Takeshi Kobayashi

2003, Volume 14, pp. 272-273
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.272

Free access

Development of an Integrated System for Genetic Network Analysis and Microarray Data Management

Ji-Hung Kim, Kyung-Shin Lee, Pan-Gyu Kim, Hwan-Gue Cho

2003, Volume 14, pp. 274-275
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.274

Free access

An Open Source Client-Server System for the Analysis of Affymetrix Microarray Data

Lars Martin Jakt, Mitsuhiro Okada, Shin-Ichi Nishikawa

2003, Volume 14, pp. 276-277
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.276

Free access

ASIAN: A Web Site for Network Inference

Katsuhisa Horimoto, Hiroyuki Toh, Sachiyo Aburatani, Nobuyoshi Sugaya, ...

2003, Volume 14, pp. 278-279
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.278

Free access

Netview: Application Software for Constructing and Visually Exploring Phylogenetic Networks

Kirill Kryukov, Naruya Saitou

2003, Volume 14, pp. 280-281
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.280

Free access

Integrated System for Inference of Gene Expression Network

Masahiko Nakatsui, Takanori Ueda, Masahiro Okamoto

2003, Volume 14, pp. 282-283
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.282

Free access

Mutation View: An Integrated Knowledge Base for Mutations and Polymorphisms in Human Disease Genes

Automatical Extraction of Disease-Associated Knowledge

Masafumi Ohtsubo, Susumu Mitsuyama, Takashi Kawamura, Nobuyoshi Shimiz ...

2003, Volume 14, pp. 284-285
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.284

Free access

BirdsAnts: Bringing Informative Rules from a Database System, Aimed at Novel Targets Search

Motoi Tobita, Ken Horiuchi, Kenji Araki, Masashi Nemoto, Tetsuo Nishik ...

2003, Volume 14, pp. 286-287
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.286

Free access

Ecell2d: Distributed E-CELL2

Takashi Yamazaki, Ariya Fujita, Iriko Kaneko, Yoshinari Fukui, Toshika ...

2003, Volume 14, pp. 288-289
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.288

Free access

Integrated Distributed Computing Environment on the G-Language GAE v. 2

Ryo Hattori, Kazuharu Arakawa, Hayataro Kouchi, Masaru Tomita

2003, Volume 14, pp. 290-291
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.290

Free access

SuperNORM: A Computer Program for the Parametric Normalization of Microarray Data

Tomokazu Konishi, Masanori Yoshida, Kenya Shibahara

2003, Volume 14, pp. 292-293
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.292

Free access

E-CELL System Version 3: A Software Platform for Integrative Computational Biology

Kouichi Takahashi, Takeshi Sakurada, Kazunari Kaizu, Tomoya Kitayama, ...

2003, Volume 14, pp. 294-295
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.294

Free access

BPE: Biopathway Executer for Large-Scale Biopathway Modeling and Simulation

Masao Nagasaki, Atsushi Doi, Kazuko Ueno, Eri Torikai, Hiroshi Matsuno ...

2003, Volume 14, pp. 296-297
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.296

Free access

Bioinformatics and Computational Biology with Biopython

Michiel J. L. De Hoon, Brad Chapman, Iddo Friedberg

2003, Volume 14, pp. 298-299
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.298

Free access

Representing Metabolic Networks by the Substrate-Product Relationships

Masanori Arita

2003, Volume 14, pp. 300-301
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.300

Free access

Comprehensive Analysis of Delay in Transcriptional Regulation Using Expression Profiles

Koji Ota, Takuji Yamada, Yoshihiro Yamanishi, Susumu Goto, Minoru Kane ...

2003, Volume 14, pp. 302-303
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.302

Free access

Prediction of Glycan Structures from Glycosyltransferase Expression Profiles

Shin Kawano, Yasushi Okuno, Kosuke Hashimoto, Harumi Yamamoto, Hiromu ...

2003, Volume 14, pp. 304-305
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.304

Free access

Statistical Analysis of the Relationship between Gene Expression and Location

Sachiyo Aburatani, Nobuyoshi Sugaya, Hiroo Murakami, Makihiko Sato, Ka ...

2003, Volume 14, pp. 306-307
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.306

Free access

Detection of Genes with Tissue-Specific Patterns Using Akaike's Information Criterion

Koji Kadota, Katsutoshi Takahashi

2003, Volume 14, pp. 308-309
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.308

Free access

Operon Prediction by DNA Microarray: An Approach with a Bayesian Network Model

Hitoshi Shimizu, Shigeyuki Oba, Shin Ishii

2003, Volume 14, pp. 310-311
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.310

Free access

Automatic Extraction of Expression-Related Features Shared by a Given Group of Genes

Takuya Oyama, Mikio Yoshida, Satoshi Kamegai, Kagehiko Kitano, Fumihit ...

2003, Volume 14, pp. 312-313
Published: 2003
Released on J-STAGE: 2011/07/11

DOI: https://doi.org/10.11234/gi1990.14.312

Free access
