Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Volume 14
Showing 1-50 of 256 articles in the selected issue
  • An Application to Phosphorus Assimilation Pathways in Synechococcus sp. WH8102
    Zhengchang Su, Phuongan Dam, Xin Chen, Victor Olman, Tao Jiang, Brian ...
    2003, Volume 14, p. 3-13
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    We present a computational protocol for inferring regulatory and signaling pathways in a microbial cell through literature search, mining of “high-throughput” biological data of various types, and computer-assisted human inference. The protocol consists of four key components: (a) construction of template pathways for microbial organisms related to the target genome, which have been extensively studied and/or have a significant amount of relevant experimental data; (b) inference of initial pathway models for the target genome by combining the template pathway models with target genome-specific information; (c) refinement and expansion of the initial pathway models through applications of various data mining tools, including phylogenetic profile analysis, inference of protein-protein interactions, and prediction of transcription factor binding sites; and (d) validation and refinement of the pathway models using pathway-specific experimental data or other information. To demonstrate the effectiveness of this procedure, we have applied it to the construction of the phosphorus assimilation pathways in the cyanobacterium Synechococcus sp. WH8102. We present, in this paper, a model of the core components of this pathway.
  • A Unified Specification of the Coagulation Cascade
    Jacqueline Signorini, Patrick Greussay
    2003, Volume 14, p. 14-22
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    We propose a case study in which a familiar but very complex and intrinsically interwoven biocomputing system, the blood clotting cascade, is specified using methods from software design known as object-oriented design (OOD). The specifications involve definition and inheritance of classes and methods and use design techniques from the most widely used OOD language, the Unified Modeling Language (UML), as well as its Real-Time UML extension.
    First, we emphasize the need for a unified methodology to specify sufficiently complex biological and biochemical processes. Then, using the blood clotting cascade as an example, we define the class diagrams that exhibit the static structure of procoagulant factors of proenzyme-enzyme conversions, and finally we give a dynamic model involving events, collaboration, synchronization and sequencing.
    We thus show that OOD can be used in fields far beyond software design, and that it gives the benefit of unified and sharable descriptions and, as a side effect, automatic generation of simulation software.
  • Sang Yup Lee, Dong-Yup Lee, Soon Ho Hong, Tae Yong Kim, Hongsoek Yun, ...
    2003, Volume 14, p. 23-33
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    We have developed MetaFluxNet, a stand-alone program package for the management of metabolic reaction information and quantitative metabolic flux analysis. It allows users to interpret and examine metabolic behavior in response to genetic and/or environmental modifications. As a result, quantitative in silico simulations of metabolic pathways can be carried out to understand the metabolic status and to design metabolic engineering strategies. The main features of the program include a well-developed model construction environment, a user-friendly interface for metabolic flux analysis (MFA), comparative MFA of strains having different genotypes under various environmental conditions, and automated pathway layout creation. The usefulness and functionality of the program are demonstrated by applying it to metabolic pathways in E. coli. First, a large-scale in silico E. coli model was constructed using MetaFluxNet, and then the effects of carbon sources on intracellular flux distributions and succinic acid production were investigated on the basis of the uptake and secretion rates of the relevant metabolites. The results indicated that, among the three carbon sources available, the most reduced substrate is sorbitol, which yields efficient succinic acid production. The software can be downloaded from http://mbel.kaist.ac.kr/.
  • Xijin Ge, Shuichi Tsutsumi, Hiroyuki Aburatani, Shuichi Iwata
    2003, Volume 14, p. 34-43
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    In the search for new cancer subtypes by gene expression profiling, it is essential to avoid misclassifying samples of unknown subtypes as known ones. In this paper, we evaluated the false positive error rates of several classification algorithms through a ‘null test’, presenting the classifiers with a large collection of independent samples that do not belong to any of the tumor types in the training dataset. The benchmark dataset is available at www2.genome.rcast.u-tokyo.ac.jp/pm/. We found that k-nearest neighbor (KNN) and support vector machine (SVM) classifiers have very high false positive error rates when fewer genes (<100) are used in prediction. The error rate can be partially reduced by including more genes. On the other hand, the prototype matching (PM) method has a much lower false positive error rate. Such robustness can be achieved without loss of sensitivity by introducing suitable measures of prediction confidence. We also proposed a cluster-and-select technique to select genes for classification. The nonparametric Kruskal-Wallis H test is employed to select genes differentially expressed in multiple tumor types. To reduce redundancy, we then divided these genes into clusters with similar expression patterns and selected a given number of genes from each cluster. The reliability of the new algorithm is tested on three public datasets.
  • See-Kiong Ng, Soon-Heng Tan, V.S. Sundararajan
    2003, Volume 14, p. 44-53
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    As microarray technologies become routinely applied in genome laboratories for studying gene expression, it is not uncommon for experiments on identical or similar sets of genes to be conducted by multiple laboratories for various functional studies of these genes. Much of this data is available to researchers for their data analysis, either through collaborators or from online gene expression databases. It would be useful to combine data from different microarray studies to improve microarray data mining results.
    We show that the functional classification of genes from microarray data can be improved further by combining gene expression data from multiple microarray studies, even if the experimental focus or conditions of each study differ. However, blindly combining all available datasets may not always improve the analysis results; it is important to be selective about the datasets included. In our approach, we consider each dataset to be one feature, and then apply feature selection strategies to select appropriate datasets for training. With a simple hill-climbing method, we show that gene classification performance can be improved by whole-dataset feature selection.
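    The whole-dataset selection idea can be sketched as a greedy hill climb. This is a minimal illustration, not the authors' procedure: the abstract does not say whether the search adds or removes datasets or how a candidate subset is scored, so the forward direction, the function name `hill_climb_select`, and the toy scoring function below are all assumptions. In practice the score would be something like cross-validated classification accuracy on the selected datasets.

```python
from typing import Callable, FrozenSet, Iterable


def hill_climb_select(
    datasets: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Greedy forward hill climbing over whole datasets.

    Starting from the empty set, repeatedly add the single dataset that
    raises the score the most; stop when no addition improves it.
    """
    remaining = set(datasets)
    selected: FrozenSet[str] = frozenset()
    best = score(selected)
    while remaining:
        # Evaluate every one-dataset extension of the current selection.
        candidates = [(score(selected | {d}), d) for d in sorted(remaining)]
        top_score, top_dataset = max(candidates)
        if top_score <= best:
            break  # local optimum: no single addition helps
        selected = selected | {top_dataset}
        remaining.remove(top_dataset)
        best = top_score
    return selected


# Toy scoring function (hypothetical): datasets "A" and "C" help, "B" hurts.
TOY_GAIN = {"A": 0.05, "B": -0.02, "C": 0.03}


def toy_score(subset: FrozenSet[str]) -> float:
    return 0.70 + sum(TOY_GAIN[d] for d in subset)


chosen = hill_climb_select(TOY_GAIN, toy_score)
```

    With the toy gains above, the climb keeps "A" and "C" and leaves out "B", which mirrors the abstract's point that adding every available dataset is not always beneficial.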
  • Sang-Heon Yoon, Je-Suk Kim, Hae-Hiang Song
    2003, Volume 14, p. 54-63
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The higher incidence of liver disease in the Asian population is of great concern to clinicians. To understand the gene functions involved in different stages of the disease, microarray expression data of histologically progressive grades, from the dysplastic nodule in cirrhotic liver to hepatocellular carcinoma Edmonson grade III, are analyzed. The statistical procedures are divided into two parts. First, microarray data are suitably normalized, including by a method based on analysis of variance (ANOVA). There are great differences of opinion regarding the currently used normalization methods, so these methods need to be compared before proceeding to the second part, the statistical analysis of gene-pair associations. Based on the assumption that the union of the significant genes from these normalization methods includes sufficiently general and well-defined differentially expressed genes, the second part searches for evidence of altered gene-gene relationships with progression of the disease. Significantly altered gene-pair associations are identified with the ratio of gene-pair correlations. The methods are illustrated with replicated microarray expression data.
  • Loi Sy Ho, Jagath C. Rajapakse
    2003, Volume 14, p. 64-72
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The performance of ab initio gene prediction approaches depends mostly on the effectiveness of splice site detection. This paper addresses the problem of splice site detection using higher-order Markov models. The tenet of our approach is to capture the higher-order dependencies of a Markov model with a neural network that receives its inputs from low-order Markov chains. The method is able not only to capture the higher-order dependencies in the bases of the consensus sequence immediately surrounding the splice site but also to distinguish the characteristics of the coding and non-coding regions on both sides of the splice site. Our experiments indicate that the present method achieves better accuracy than techniques employing low-order Markov chains and other earlier approaches.
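    As a rough illustration of the low-order Markov chains mentioned in the abstract, the sketch below trains a first-order chain on a few sequence windows and scores new windows by log-likelihood. It is a simplification under stated assumptions: the chain is homogeneous (not position-specific), add-one smoothing is used, the training windows are made up, and the neural network stage of the paper is not shown. Scores like these could serve as inputs to a downstream classifier.

```python
import math
from typing import Dict, List

ALPHABET = "ACGT"


def train_first_order(seqs: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next base | current base) with add-one smoothing."""
    counts = {a: {b: 1.0 for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1.0
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_likelihood(seq: str, trans: Dict[str, Dict[str, float]]) -> float:
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(trans[c][n]) for c, n in zip(seq, seq[1:]))


# Toy training windows (made up for illustration, not real splice sites).
model = train_first_order(["AGGTAAGT", "CAGGTGAG", "AGGTAAGA"])
score_like = log_likelihood("AGGTAAGT", model)
score_unlike = log_likelihood("CCCCCCCC", model)
```

    A window resembling the training data receives a higher log-likelihood than an unrelated one, which is the signal a later classification stage would build on.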
  • An Analysis Using Information Theoretic and Machine Learning Approaches
    Christina L. Zheng, Virginia R. De Sa, Michael Gribskov, T. Murlidhara ...
    2003, Volume 14, p. 73-83
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The computational recognition of precise splice junctions is a challenge faced in the analysis of newly sequenced genomes, because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at the splice junctions, not simply to create an artificial recognition system. For this purpose, we use a combination of a neural network based calliper randomization approach and an information theoretic feature selection approach. The aim is to understand which regions harbor information content and to extract features relevant to the prediction of splice junctions. The analysis using the neural network based calliper randomization approach revealed regions important in the internal representation of the network model. The calliper approach captured both correlated and independently important features, whereas the feature selection approach captures features that are independently informative. The two methods can therefore capture features with different properties. Comparative analysis of the results from both methods helps to infer the kind of information present in a region.
  • Huiqing Liu, Hao Han, Jinyan Li, Limsoon Wong
    2003, Volume 14, p. 84-93
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    This paper presents a machine learning method to predict polyadenylation signals (PASes) in human DNA and mRNA sequences by analysing features around them. The method consists of three sequential steps of feature manipulation: generation, selection and integration of features. In the first step, new features are generated using k-gram nucleotide or amino acid patterns. In the second step, a number of important features are selected by an entropy-based algorithm. In the third step, support vector machines are employed to recognize true PASes from a large number of candidates. Our study shows that true PASes in DNA and mRNA sequences can be characterized by different features, and also that both upstream and downstream sequence elements are important for recognizing PASes from DNA sequences. We tested our method on several public data sets as well as our own extracted data sets. In most cases, we achieved better validation results than those reported previously on the same data sets. The important motifs observed are highly consistent with those reported in the literature.
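    The feature generation step can be illustrated with a short sketch that counts k-grams in the windows upstream and downstream of a candidate signal. The function names, the window size, and the "up_"/"down_" naming scheme are illustrative assumptions rather than the paper's exact encoding, and the entropy-based selection and SVM steps are not shown.

```python
from collections import Counter
from typing import Dict


def kgram_counts(seq: str, k: int) -> Dict[str, int]:
    """Count every overlapping k-gram in a sequence."""
    return dict(Counter(seq[i:i + k] for i in range(len(seq) - k + 1)))


def window_features(seq: str, site: int, flank: int, k: int) -> Dict[str, int]:
    """k-gram counts upstream and downstream of a candidate signal position.

    Feature names are prefixed with "up_" or "down_" so that the two
    flanks stay distinguishable after they are merged into one vector.
    The downstream window starts at the site itself.
    """
    upstream = seq[max(0, site - flank):site]
    downstream = seq[site:site + flank]
    feats = {f"up_{g}": c for g, c in kgram_counts(upstream, k).items()}
    feats.update({f"down_{g}": c for g, c in kgram_counts(downstream, k).items()})
    return feats


# Toy sequence containing the canonical AATAAA hexamer starting at index 6.
toy = "GCTGCCAATAAAGTGTTTTG"
features = window_features(toy, site=6, flank=6, k=2)
```

    Keeping upstream and downstream counts as separate features reflects the abstract's observation that elements on both sides of the signal matter for recognition.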
  • Shin Ando, Hitoshi Iba
    2003, Volume 14, p. 94-103
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    This paper proposes a method to capture the dynamics in gene expression data using the S-system formalism and to construct genetic network models. Our proposed method exploits a probabilistic heuristic search and a divide-and-conquer approach to estimate the network structure. In evaluating the network structure, we attempt a primitive integration of other knowledge into the statistical criterion. The Z-score is used to identify robust and significant parameters from the stochastic search results. We evaluated the proposed method on artificially generated data and E. coli mRNA expression data.
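    For readers unfamiliar with the formalism, the standard textbook form of an S-system is given below. The abstract itself does not state the equations, so the notation is an assumption: X_i is the expression level of gene i, α_i and β_i are non-negative rate constants, and g_ij and h_ij are the kinetic orders describing how X_j influences the synthesis and degradation of X_i.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, \qquad i = 1, \dots, n
```

    Estimating the network structure then amounts to deciding which of the exponents g_ij and h_ij are non-zero.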
  • Naoki Hosoyama, Noman Nasimul, Hitoshi Iba
    2003, Volume 14, p. 104-113
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    In recent years, base sequences have been increasingly decoded through efforts such as the Human Genome Project, and the estimation of genetic networks has accelerated accordingly. However, no definitive method has become available for drawing a large graph effectively. This paper proposes a method that copes with an increase in the number of nodes by laying out genes on planes in several layers and then overlapping these planes. This layout involves an optimization problem that requires maximizing a fitness function. To demonstrate the effectiveness of our approach, we show some graphs using actual data on 82 genes and 552 genes. We also describe how to lay out nodes by means of stochastic searches, e.g., stochastic hill-climbing and incremental methods. The experimental results show the superiority and usefulness of the two search methods in comparison with a simple random search.
  • Jonas S. Almeida, Eberhard O. Voit
    2003, Volume 14, p. 114-123
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The genomic and post-genomic eras have been blessing us with overwhelming amounts of data that are of increasing quality. The challenge is that most of these data alone are mere snapshots of the functioning organism and do not reveal the organizational structure of which the particular genes and metabolites are contributors. To gain an appreciation of their roles and functions within cells and organisms, genomic and metabolic data need to be integrated in systems models that allow the testing of hypotheses, generate experimentally testable predictions, and ultimately lead to true explanations. One type of data that is particularly well suited for such integration consists of time profiles, which show gene activities, metabolite concentrations, or protein prevalences at dense series of time points. We show with a specific example how such time series can be analyzed and evaluated, if some structural information about the data is available, even if this information is incomplete. The method consists of three components. The first is a particularly suitable mathematical modeling framework, namely Biochemical Systems Theory, in which parameters are direct indicators of the organization of the underlying phenomenon, the second is the training of an artificial neural network for data smoothing and complementation, and the third is a technique for reinterpreting differential equations in a fashion that facilitates parameter estimation. A prototype webtool for these analyses is available at https://bioinformatics.musc.edu/webmetabol/.
  • Sascha Ott, Satoru Miyano
    2003, Volume 14, p. 124-133
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The accurate estimation of gene networks from gene expression measurements is a major challenge in the field of Bioinformatics. Since the problem of estimating gene networks is NP-hard and exhibits a search space of super-exponential size, researchers are using heuristic algorithms for this task. However, little can be said about the accuracy of heuristic estimations. In order to overcome this problem, we present a general approach to reduce the search space to a biologically meaningful subspace and to find optimal solutions within the subspace in linear time. We show the effectiveness of this approach in application to yeast and Bacillus subtilis data.
  • Kiyoko F. Aoki, Atsuko Yamaguchi, Yasushi Okuno, Tatsuya Akutsu, Nobuh ...
    2003, Volume 14, p. 134-143
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    One aspect of glycome informatics is the analysis of carbohydrate sugar chains, or glycans, whose basic structure is not a sequence but a tree. Although there has been much work on sequence databases and on matching algorithms for sequences (for performing queries and analyzing similarity), the more complicated tree structure of glycans does not allow a direct implementation of such a database for glycans, nor the direct application of sequence alignment algorithms for performing searches or analyzing similarity. Therefore, we have utilized a polynomial-time dynamic programming algorithm for finding the maximum common subtree of two trees to implement an accurate and efficient tool for finding and aligning maximally matching glycan trees. The recently released KEGG Glycan database for glycan structures incorporates our tree-structure alignment algorithm with various parameters to adapt to the needs of a variety of users. Because we use similarity scores as opposed to a distance metric, our methods are more readily used to display trees of higher similarity. We present the two methods developed for this purpose and illustrate their validity.
  • Masahiro Hattori, Yasushi Okuno, Susumu Goto, Minoru Kanehisa
    2003, Volume 14, p. 144-153
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    We have developed an efficient algorithm for comparing two chemical compounds, where the chemical structure is treated as a 2D graph consisting of atoms as vertices and covalent bonds as edges. Based on the concept of functional groups in chemistry, 68 atom types (vertex types) are defined for carbon, nitrogen, oxygen, and other atomic species with different environments, which has enabled detection of biochemically meaningful features. Maximal common subgraphs of two graphs can be found by searching for maximal cliques in the association graph, and we have introduced heuristics to accelerate the clique finding. Our heuristic procedure is controlled by some adjustable parameters. Here we applied our procedure to the latest KEGG/LIGAND database with different sets of parameters, and demonstrated the correlation of parameters in our algorithm with the distribution of similarity scores and/or the execution time. Finally, we showed the effectiveness of our heuristics for compound pairs along metabolic pathways.
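    The link between common subgraphs and cliques can be made concrete with a small sketch. It builds the association graph of two tiny labeled graphs and enumerates its maximal cliques with plain Bron-Kerbosch. This is only an illustration of the general technique: it uses element symbols instead of the paper's 68 atom types, it omits the authors' acceleration heuristics and parameters entirely, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Node = Tuple[int, int]  # (atom index in compound 1, atom index in compound 2)


def association_graph(
    labels1: List[str], bonds1: Set[FrozenSet[int]],
    labels2: List[str], bonds2: Set[FrozenSet[int]],
) -> Dict[Node, Set[Node]]:
    """Pair up same-typed atoms; connect pairs that agree on bonding."""
    nodes = [(i, j) for i, a in enumerate(labels1)
             for j, b in enumerate(labels2) if a == b]
    adj: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # one atom cannot be matched twice
        bonded1 = frozenset((i1, i2)) in bonds1
        bonded2 = frozenset((j1, j2)) in bonds2
        if bonded1 == bonded2:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def maximal_cliques(adj: Dict[Node, Set[Node]]) -> List[Set[Node]]:
    """Plain Bron-Kerbosch enumeration (no pivoting, no heuristics)."""
    found: List[Set[Node]] = []

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> None:
        if not p and not x:
            found.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Toy "compounds": chain C-C-O versus chain C-O (atom types are plain
# element symbols here, not the 68 atom types from the abstract).
g1_labels, g1_bonds = ["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2))}
g2_labels, g2_bonds = ["C", "O"], {frozenset((0, 1))}
cliques = maximal_cliques(association_graph(g1_labels, g1_bonds, g2_labels, g2_bonds))
largest = max(cliques, key=len)
```

    Each clique is a set of atom-to-atom matches that are mutually consistent, so the largest clique here pairs the C-O bond of the first chain with the C-O bond of the second. Exhaustive enumeration grows quickly with compound size, which is why heuristics such as those described in the abstract are needed in practice.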
  • Yutaka Ueno, Masanori Arita, Toshitaka Kumagai, Kiyoshi Asai
    2003, Volume 14, p. 154-163
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The data processing language in a graphical software tool that manages sequence annotation data from genome databases should provide flexible functions for the tasks in molecular biology research. Among currently available languages we adopted the Lua programming language. It fulfills our requirements to perform computational tasks for sequence map layouts, i.e. the handling of data containers, symbolic reference to data, and a simple programming syntax. Upon importing a foreign file, the original data are first decomposed in the Lua language while maintaining the original data schema. The converted data are parsed by the Lua interpreter and the contents are stored in our data warehouse. Then, portions of annotations are selected and arranged into our catalog format to be depicted on the sequence map. Our sequence visualization program was successfully implemented, embedding the Lua language for processing of annotation data and layout script. The program is available at http://staff.aist.go.jp/yutaka.ueno/guppy/.
  • Ming Li, Bin Ma, Derek Kisman, John Tromp
    2003, Volume 14, p. 164-175
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    Extending the single optimized spaced seed of PatternHunter [20] to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search technology full circle.
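    A spaced seed can be illustrated in a few lines. The sketch below checks at which offsets two equal-length, ungapped-aligned strings match at every "care" position of a seed. The seed string is the weight-11 seed reported for the original PatternHunter; the function name and the toy sequences are assumptions for illustration, and the indexing and multiple-seed machinery of PatternHunter II is not shown.

```python
from typing import List

# The weight-11, span-18 spaced seed reported for the original PatternHunter.
# '1' = position must match, '0' = "don't care".
SEED = "111010010100110111"


def seed_hits(a: str, b: str, seed: str = SEED) -> List[int]:
    """Offsets at which two equal-length, ungapped-aligned strings hit the seed."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    care = [i for i, c in enumerate(seed) if c == "1"]
    return [
        off for off in range(len(a) - len(seed) + 1)
        if all(a[off + i] == b[off + i] for i in care)
    ]


query = "ACGTACGTACGTACGTAC"   # 18 bases
# Same sequence with mismatches at indices 3, 8 and 14, all of which are
# "don't care" positions of the spaced seed at offset 0.
target = "ACGAACGTCCGTACTTAC"
hits_spaced = seed_hits(query, target)
hits_contiguous = seed_hits(query, target, seed="1" * 11)
```

    In this toy alignment the spaced seed produces a hit while a contiguous seed of the same weight does not, because no run of 11 consecutive matches exists. Using several seeds, as the abstract describes, would amount to reporting a hit when any one of them matches.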
  • Wei Chen, Wing-kin Sung
    2003, Volume 14, p. 176-185
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    In this paper, we proposed a new type of seed for Blast-like homology search tools called “half seed”. This new seed is better than the “consecutive seed” used by the original Blast tools in both sensitivity and efficiency. When compared with the “gapped seed”, which is proposed together with a new Blast-like searching tool, PatternHunter, this new seed offers a much wider range of choices for performing tradeoff between sensitivity and efficiency. This property is especially useful when some searching applications want to get more precise results with limitation on hardware resources, or vice versa.
  • Design and Analysis of Experiments
    Miklós Csürös, Bingshan Li, Aleksandar Milosavljevic
    2003, Volume 14, p. 186-195
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    This paper studies sequencing and mapping methods that rely solely on pooling and shotgun sequencing of clones. First, we scrutinize and improve the recently proposed Clone-Array Pooled Shotgun Sequencing (CAPSS) method, which delivers a BAC-linked assembly of a whole genome sequence. Secondly, we introduce a novel physical mapping method, called Clone-Array Pooled Shotgun Mapping (CAPS-MAP), which computes the physical ordering of BACs in a random library. Both CAPSS and CAPS-MAP construct subclone libraries from pooled genomic BAC clones.
  • Tho Hoan Pham, Kenji Satou, Tu Bao Ho
    2003, Volume 14, p. 196-205
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The tight turn has long been recognized as one of the three important features of proteins, after the α-helix and β-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are β-turns. Analysis and prediction of β-turns in particular, and tight turns in general, are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to the prediction and analysis of β-turns, and we investigate two aspects of it. First, we developed a new SVM method, called BTSVM, which predicts β-turns of a protein from its sequence. The prediction results on a dataset of 426 non-homologous protein chains by sevenfold cross-validation showed that our method is superior to previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of β-turns based on the “multivariable” classification model of a linear SVM. This model is more general than those of previous statistical methods. Our analysis results are more comprehensive and easier to use than previously published analysis results.
  • Aik Choon Tan, David Gilbert, Yves Deville
    2003, Volume 14, p. 206-217
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The amount of structural data has made traditional methods such as manual inspection of protein structures impossible. Machine learning has been widely applied to bioinformatics and has had considerable success in this research area. This work proposes a novel ensemble machine learning method that improves the coverage of classifiers on multi-class imbalanced sample sets by integrating knowledge induced from different base classifiers, and we illustrate this idea by classifying multi-class SCOP protein fold data. We have compared our approach with PART and show that our method improves the sensitivity of the classifier in protein fold classification. Furthermore, we have extended this method to learning over multiple data types, preserving the independence of their corresponding data sources, and show that our new approach performs at least as well as the traditional technique over a single joined data source. These experimental results are encouraging, and the method can be applied to other bioinformatics problems similarly characterised by multi-class imbalanced data sets held in multiple data sources.
  • Minh N. Nguyen, Jagath C. Rajapakse
    2003, Volume 14, p. 218-227
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The solution of binary classification problems using the Support Vector Machine (SVM) method has been well developed. Though multi-class classification is typically solved by combining several binary classifiers, several multi-class methods that consider all classes at once have recently been proposed. However, these methods require solving a much larger optimization problem and are applicable only to small datasets. To predict protein secondary structure, we implemented three methods based on binary classification, namely one-against-all (OAA), one-against-one (OAO), and directed acyclic graph (DAG), and two approaches that treat the multi-class problem by solving a single optimization problem. Our experiments indicate that multi-class SVM methods are more suitable for protein secondary structure (PSS) prediction than the other methods, including binary SVMs, because of their capacity to solve an optimization problem in one step. Furthermore, we argue that it is feasible to improve the prediction accuracy by adding a second-stage multi-class SVM to capture the contextual information among secondary structural elements. We demonstrate that two-stage SVMs perform better than single-stage SVM techniques for PSS prediction using two datasets and report a maximum accuracy of 79.5%.
  • Takashi Ishida, Takeshi Nishimura, Makoto Nozaki, Tsuyoshi Inoue, Tohr ...
    2003, Volume 14, p. 228-237
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    An ab initio protein structure prediction system called ABLE is described. It is based on the fragment assembly method, which consists of two steps: dividing a target sequence into overlapping subsequences (fragments) of short length and assigning a local structure to each fragment; and generating models by assembling the local structures and selecting the models with low potential energy. One of the most important problems in conventional fragment assembly methods is the difficulty of selecting native-like structures by energy minimization only. ABLE thus employs a structural clustering method to select the native-like models from among the generated models. By applying the unit-vector root mean square distance (URMS) as a measure of structure similarity, we achieve more robust, effective structural clustering. When not enough clusters of good quality are obtained, ABLE runs the energy minimization procedure again, incorporating structural restraint conditions obtained from the consensus substructures in the previously generated models. This approach is based on our observation that there is a high probability that the consensus substructures of the generated models have native-like structures. Another feature of ABLE is that, in assigning local structures to fragments, it assigns main-chain dihedral angles (φ, ψ) to the central residue of each fragment according to a probability distribution map built from candidate sequences similar to each fragment. This enables the system to generate appropriate local structures that may not already exist in a protein structure database. We applied our system to 25 small proteins and obtained near-native folds for more than half of them. We also demonstrate the performance of our structural clustering method, which can be applied to other protein structure prediction systems.
  • A Novel Algorithm for Protein-Protein Soft Docking
    Carlos A. Del Carpio Munoz, Tobias Peissker, Atsushi Yoshimori, Eiichi ...
    2003, Volume 14, p. 238-249
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    We propose a new methodology for “soft” docking of unbound protein molecules (reported in the isolated state). The methodology is characterized by its simplicity and the ease with which it can be embedded in any rigid-body docking process based on point complementarity. It is designed to allow limited, but not unrealistic, free interpenetration of the side chains of protein surface amino acid residues. The central step of the technique is a filtering process similar to those used in image processing. The methodology assists in the deletion of atomic-scale details on the surface of the interacting monomers, leading to the extraction of the most characteristic flattened shape for the molecule as well as the definition of a soft layer of atoms that allows smooth interpenetration of the interacting molecules during the docking process. Although the methodology does not perform structural or conformational rearrangements in the interacting monomers, the results output by the algorithm are in fair agreement with the relative positions of the monomers in experimentally reported complexes. The algorithm performs especially well in cases where the complexity of the protein surfaces is high, that is, in heterodimer complex prediction. The algorithm is intended to serve as a fast screening engine for proteins known to interact but for which no information other than the structures in the isolated state is available. Consequently, the importance of the methodology will increase in structure-function studies of the thousands of proteins derived from large-scale genome sequencing projects being carried out around the globe.
  • Dongsoo Han, Hong-Soog Kim, Jungmin Seo, Woohyuk Jang
    2003, Volume 14, p. 250-259
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    In this paper, we propose a probabilistic framework to predict the interaction probability of proteins. The notions of domain combination and domain combination pair are newly introduced, and the prediction model in the framework takes the domain combination pair as the basic unit of protein interactions to overcome the limitations of conventional domain pair based prediction systems. The framework consists largely of a prediction preparation stage and a prediction service stage. In the prediction preparation stage, two appearance probability matrices are constructed. The matrices hold information on the appearance frequencies of domain combination pairs in the interacting and non-interacting sets of protein pairs, respectively. Based on the appearance probability matrices, a probability equation is devised. The equation maps a protein pair to a real number in the range of 0 to 1. Two distributions, for the interacting and non-interacting sets of protein pairs, are obtained using the equation. In the prediction service stage, the interaction probability of a protein pair is predicted using the distributions and the equation. The validity of the prediction model is evaluated on the interacting set of protein pairs in yeast and an artificially generated non-interacting set of protein pairs. When 80% of the set of interacting protein pairs in DIP (Database of Interacting Proteins) is used as the learning set of interacting protein pairs, very high sensitivity (86%) and moderate specificity (56%) are achieved within our framework.
  • Hirotugu Akaike
    2003, Volume 14, p. 263-265
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
    The role of a model is to provide adequate knowledge to handle a particular problem. The work of modeling starts on the basis of the feel and knowledge of the object and proceeds by developing guesses about the structure of the object. In this paper characteristics of this process are demonstrated with the example of the analysis of the golf swing motion.
  • Michael Zuker
    2003, Volume 14, p. 266-268
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Charles De Lisi
    2003, Volume 14, p. 269
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Hiroyuki Kurata, Rei Iwasaki, Kouichi Masaki, Takayuki Tanaka, Kouji M ...
    2003, Volume 14, p. 270-271
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Hiroyuki Honda, Takeshi Kobayashi
    2003, Volume 14, p. 272-273
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Ji-Hung Kim, Kyung-Shin Lee, Pan-Gyu Kim, Hwan-Gue Cho
    2003, Volume 14, p. 274-275
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Lars Martin Jakt, Mitsuhiro Okada, Shin-Ichi Nishikawa
    2003, Volume 14, p. 276-277
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Katsuhisa Horimoto, Hiroyuki Toh, Sachiyo Aburatani, Nobuyoshi Sugaya, ...
    2003, Volume 14, p. 278-279
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Kirill Kryukov, Naruya Saitou
    2003, Volume 14, p. 280-281
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Masahiko Nakatsui, Takanori Ueda, Masahiro Okamoto
    2003, Volume 14, p. 282-283
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Automatical Extraction of Disease-Associated Knowledge
    Masafumi Ohtsubo, Susumu Mitsuyama, Takashi Kawamura, Nobuyoshi Shimiz ...
    2003, Volume 14, p. 284-285
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Motoi Tobita, Ken Horiuchi, Kenji Araki, Masashi Nemoto, Tetsuo Nishik ...
    2003, Volume 14, p. 286-287
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Takashi Yamazaki, Ariya Fujita, Iriko Kaneko, Yoshinari Fukui, Toshika ...
    2003, Volume 14, p. 288-289
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Ryo Hattori, Kazuharu Arakawa, Hayataro Kouchi, Masaru Tomita
    2003, Volume 14, p. 290-291
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Tomokazu Konishi, Masanori Yoshida, Kenya Shibahara
    2003, Volume 14, p. 292-293
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Kouichi Takahashi, Takeshi Sakurada, Kazunari Kaizu, Tomoya Kitayama, ...
    2003, Volume 14, p. 294-295
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Masao Nagasaki, Atsushi Doi, Kazuko Ueno, Eri Torikai, Hiroshi Matsuno ...
    2003, Volume 14, p. 296-297
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Michiel J. L. De Hoon, Brad Chapman, Iddo Friedberg
    2003, Volume 14, p. 298-299
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Masanori Arita
    2003, Volume 14, p. 300-301
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Koji Ota, Takuji Yamada, Yoshihiro Yamanishi, Susumu Goto, Minoru Kane ...
    2003, Volume 14, p. 302-303
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Shin Kawano, Yasushi Okuno, Kosuke Hashimoto, Harumi Yamamoto, Hiromu ...
    2003, Volume 14, p. 304-305
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Sachiyo Aburatani, Nobuyoshi Sugaya, Hiroo Murakami, Makihiko Sato, Ka ...
    2003, Volume 14, p. 306-307
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Koji Kadota, Katsutoshi Takahashi
    2003, Volume 14, p. 308-309
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Hitoshi Shimizu, Shigeyuki Oba, Shin Ishii
    2003, Volume 14, p. 310-311
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access
  • Takuya Oyama, Mikio Yoshida, Satoshi Kamegai, Kagehiko Kitano, Fumihit ...
    2003, Volume 14, p. 312-313
    Issue date: 2003
    Released online: 2011/07/11
    Journal, free access