IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
Volume 3
Showing 1-11 of the 11 articles in the selected issue
  • Tatsuya Akutsu
    Article type: Editorial
    Subject area: Editorial
    2010, Volume 3, p. 1
    Published: 2010
    Released online: 2010/02/04
    Journal: free access
  • Ryoichi Minai, Yo Matsuo
    Article type: Database/Software Papers
    Subject area: Database/Software Paper
    2010, Volume 3, pp. 2-9
    Published: 2010
    Released online: 2010/02/04
    Journal: free access
    We have developed an alignment tool for comparing protein local surfaces (AltPS). This program enables efficient exhaustive searches of entire protein surfaces, using a feature vector of 6 to 18 elements for each surface atom to describe the geometrical and physicochemical properties of its local environment, without referring to sequence or fold homology. AltPS runs on a personal computer, takes a pair of PDB coordinates as input, and outputs similarity scores between identified similar surfaces, alignments of the surface atoms, and the corresponding superposed coordinates, based on cluster analysis of similar surface regions. In this report, we present results from applying AltPS to several protein pairs with similar functions to identify similar functional sites. AltPS can be downloaded from http://d-search.atnifty.com/research.html
  • Tatsuya Yoshikawa, Shigeto Seno, Yoichi Takenaka, Hideo Matsuda
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 10-23
    Published: 2010
    Released online: 2010/03/08
    Journal: free access
    To identify protein-protein interaction pairs with high accuracy, we propose a method for predicting these interactions based on characteristics obtained from protein-protein docking evaluations. Previous studies assumed that the affinity strength required for an interaction does not depend on protein function. However, the protein affinity strength appears to differ with different docking schemes, such as rigid-body or flexible docking, and these schemes may be related to protein functions. Thus, we propose a new scoring system based on statistical analysis of affinity score distributions sampled by protein function. As a result, over all possible protein pair combinations, the newly developed method improved prediction accuracy in terms of F-measure. In particular, for bound antibody-antigen pairs, we obtained 50.0% recall (= sensitivity) with higher F-measures than previous studies. In addition, combining the two proposed scoring systems, Receptor-Focused Z-scoring and Ligand-Focused Z-scoring, achieved further improvement. These results suggest that the proposed method improves prediction accuracy (i.e., F-measure), with few false positives, by taking the biological functions of protein pairs into consideration.
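The abstract above names Receptor-Focused and Ligand-Focused Z-scoring but does not define them. The following is a minimal sketch under stated assumptions: a matrix of docking scores where higher means stronger affinity, row-wise standardization for the receptor view, column-wise standardization for the ligand view, and a hypothetical rule that keeps a pair only when both z-scores exceed a threshold. The function names, the threshold, and the toy matrix are illustrative, and the grouping by protein function described in the abstract is not modeled.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores; return zeros when there is no spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def receptor_focused_z(scores):
    """Z-score each row: one receptor against all candidate ligands."""
    return [z_scores(row) for row in scores]


def ligand_focused_z(scores):
    """Z-score each column: one ligand against all candidate receptors."""
    z_cols = [z_scores(list(col)) for col in zip(*scores)]
    return [list(row) for row in zip(*z_cols)]


def predict_pairs(scores, threshold=1.0):
    """Hypothetical combination rule: keep pairs that stand out in both views."""
    rz = receptor_focused_z(scores)
    lz = ligand_focused_z(scores)
    return [
        (r, l)
        for r in range(len(scores))
        for l in range(len(scores[0]))
        if rz[r][l] > threshold and lz[r][l] > threshold
    ]


# Toy docking-score matrix (rows: receptors, columns: ligands); values are made up.
toy_scores = [
    [9.0, 1.0, 1.0],
    [1.0, 1.5, 1.0],
    [1.0, 1.0, 2.0],
]
print(predict_pairs(toy_scores))
```

Standardizing within a row or column compares each score with the other scores of the same receptor or ligand, so a single global cutoff on raw scores is not needed.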
  • Yusuke Kitamura, Tomomi Kimiwada, Jun Maruyama, Takashi Kaburagi, Taka ...
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 24-39
    Published: 2010
    Released online: 2010/03/15
    Journal: free access
    A Monte Carlo based algorithm is proposed to predict the gene regulatory network structure of the mouse nuclear receptor superfamily, about which little is known although those genes are believed to be related to several intractable diseases. The gene expression data is regarded as sample vector trajectories from a stochastic dynamical system on a graph. The problem is formulated within a Bayesian framework where the graph prior distribution is assumed to follow a Zipf distribution. The appropriateness of a graph is evaluated by the graph posterior mean. The algorithm is implemented with the Exchange Monte Carlo method. After validation against synthetic data, an attempt is made to use the algorithm to predict the network structure of the target, the mouse nuclear receptor superfamily. Several remarks are made on the feasibility of the predicted network from a biological viewpoint.
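The Exchange Monte Carlo method mentioned above runs several replicas at different inverse temperatures and occasionally swaps their states. A minimal sketch of the standard swap step follows. It assumes each replica has an "energy" (for example a negative log posterior, which is an assumption here), and the abstract does not say which graph quantity the Zipf prior is placed on, so `zipf_log_prior` takes a generic `rank`. All names are illustrative.

```python
import math
import random


def zipf_log_prior(rank, exponent=1.0):
    """Unnormalized log-probability of a Zipf law: p(rank) ~ rank ** (-exponent)."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return -exponent * math.log(rank)


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Acceptance probability for exchanging the states of two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def try_swap(states, energies, betas, i, j, rng=random):
    """Swap replicas i and j in place with the exchange acceptance probability."""
    p = swap_probability(betas[i], betas[j], energies[i], energies[j])
    if rng.random() < p:
        states[i], states[j] = states[j], states[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

The swap is always accepted when the colder replica (larger beta) currently holds the higher-energy state, which is how good states migrate toward the low-temperature chain.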
  • Jiangning Song, Kazuhiro Takemoto, Hongbin Shen, Hao Tan, M. Michael G ...
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 40-53
    Published: 2010
    Released online: 2010/05/12
    Journal: free access
    As a fundamental biological problem, revealing the protein folding mechanism remains one of the most challenging problems in structural bioinformatics. Prediction of protein folding rates is an important step towards further understanding of the protein folding mechanism and the complex sequence-structure-function relationship. In this article, we develop a novel approach to predict protein folding rates for two-state and multi-state protein folding kinetics, which combines a variety of structural topology and complex network properties calculated from protein three-dimensional structures. To take into account the specific correlations between network properties and protein folding rates, we define two protein residue contact networks at two different scales, the Protein Contact Network (PCN) and the Long-range Interaction Network (LIN), to characterize the corresponding network features. Leave-one-out cross-validation (LOOCV) tests indicate that this integrative strategy is more powerful in predicting folding rates from 3D structures, with Pearson's Correlation Coefficients (CC) of 0.88, 0.90 and 0.90 for two-state, multi-state and combined protein folding kinetics, respectively, an improvement over other prediction work. This study provides useful insights into the network organization of interacting residues underlying the protein folding process for both two-state and multi-state folding kinetics. Moreover, our method provides a complementary approach to current folding rate prediction algorithms and can be used as a powerful tool for the characterization of foldomics protein data. The implemented webserver (termed PRORATE) is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/folding/.
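As a rough illustration of the two ingredients named above, the sketch below builds residue contact edges from 3D coordinates and computes a Pearson correlation coefficient. The distance cutoff, the sequence-separation value used to mimic a long-range network, and the function names are assumptions for illustration only. The abstract does not give the actual PCN and LIN definitions.

```python
import math


def contact_edges(coords, cutoff=8.0, min_separation=1):
    """Residue pairs (i, j) closer than `cutoff` with j - i >= min_separation."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return edges


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Toy chain: five residues on a straight line, 3.8 angstroms apart (made-up data).
chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
all_contacts = contact_edges(chain)                       # every pair within the cutoff
long_range = contact_edges(chain, min_separation=3)       # only sequence-distant pairs
print(len(all_contacts), len(long_range))
```

Raising `min_separation` removes contacts between sequence neighbors, so only contacts that bring distant parts of the chain together remain in the second network.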
  • Mari Pritchard
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 54-61
    Published: 2010
    Released online: 2010/06/17
    Journal: free access
    Gene expression analysis is commonly used to analyze millions of gene expression data points. A challenge in this process has been the development of appropriate statistical methods for high-dimensional data. We propose Sparse Learner Boosting for gene expression data analysis. Boosting is performed to minimize a loss function, although this process can cause overfitting when a large number of variables are present. Ordinary boosting utilizes all of the potential weak learners in a given data set to construct a decision rule. The fundamental idea of Sparse Learner Boosting is to reduce the complexity of the decision rule by using fewer weak learners than are usually required. This reduction prevents overfitting and improves classification performance. Numerical studies support this modification for high-dimensional data, such as that obtained from gene expression analysis. We show that the proposed modification improves the performance of ordinary boosting methods.
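The abstract does not say how Sparse Learner Boosting chooses its reduced set of weak learners. As a stand-in, the sketch below shows plain discrete AdaBoost with decision stumps, where the decision rule is kept small simply by capping the number of boosting rounds through `max_learners`. That cap, the function names, and the toy data are assumptions, and this is not the authors' selection procedure.

```python
import math


def stump_predict(row, feature, threshold, sign):
    """Decision stump: +sign if the feature exceeds the threshold, else -sign."""
    return sign if row[feature] > threshold else -sign


def best_stump(X, y, w):
    """Find the stump (feature, threshold, sign) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def fit_boosting(X, y, max_learners):
    """Discrete AdaBoost with labels in {-1, +1}, limited to max_learners stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(max_learners):
        err, f, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the weighted vote of all stumps in the model."""
    score = sum(a * stump_predict(row, f, thr, s) for a, f, thr, s in model)
    return 1 if score >= 0 else -1


# Toy data: feature 0 separates the classes, feature 1 is noise (made-up values).
X_toy = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y_toy = [-1, -1, 1, 1]
small_model = fit_boosting(X_toy, y_toy, max_learners=1)
print([predict(small_model, row) for row in X_toy])
```

With many more features than samples, each extra round gives the ensemble another chance to fit noise, which is why limiting the number of weak learners can act as a regularizer.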
  • Hiroshi Yoshida, Kinji Kimura, Naoki Yoshida, Junko Tanaka, Yoshihiro ...
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 62-69
    Published: 2010
    Released online: 2010/09/14
    Journal: free access
    We sometimes encounter an experiment whose rate constants cannot be determined from that experiment alone; such an experiment is called underdetermined. One method to overcome underdetermination is to combine the results of multiple experiments. Multiple experiments give rise to a large number of parameters and variables to analyze, and often yield a complicated solution set containing multiple solutions, a situation that cannot be known beforehand. These two difficulties, underdetermination and multiple solutions, lead to confusion as to whether rate constants can intrinsically be determined through experiment or not. In order to analyze such experiments, we use ‘prime ideal decomposition’ to decompose a solution into simpler solutions. It is, however, hard to decompose a set of polynomials with a large number of parameters and variables. Using a bio-imaging problem as an example, we propose one tip and one technique using the ‘resultant’ from a biological viewpoint.
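For readers unfamiliar with the resultant mentioned above: the resultant of two polynomials is the determinant of their Sylvester matrix, and it vanishes exactly when the polynomials share a root, which is what makes it useful for eliminating a variable. The sketch below handles only univariate polynomials with numeric coefficients, a simplification of the multi-parameter setting in the abstract. The function names are illustrative.

```python
from fractions import Fraction


def sylvester_matrix(f, g):
    """Sylvester matrix of f and g, given as coefficient lists, highest degree first."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = []
    for shift in range(n):  # n shifted copies of f
        rows.append([0] * shift + list(f) + [0] * (size - m - 1 - shift))
    for shift in range(m):  # m shifted copies of g
        rows.append([0] * shift + list(g) + [0] * (size - n - 1 - shift))
    return rows


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def resultant(f, g):
    """Resultant of two univariate polynomials."""
    return determinant(sylvester_matrix(f, g))


# x^2 - 1 and x - 1 share the root x = 1; x^2 - 1 and x - 2 share none.
print(resultant([1, 0, -1], [1, -1]), resultant([1, 0, -1], [1, -2]))
```

Exact rational arithmetic is used so that a zero resultant is detected reliably instead of being blurred by floating-point rounding.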
  • Yuta Ishikawa, Ichiro Takeuchi
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 70-81
    Published: 2010
    Released online: 2010/10/13
    Journal: free access
    Array CGH is a useful technology for detecting copy number aberrations on a genome-wide scale. We study the problem of detecting differentially aberrant genomic regions in two or more groups of CGH arrays and estimating the statistical significance of those regions. An important property of array CGH data is that there are spatial correlations among probes, and this must be taken into consideration when developing a computational algorithm for array CGH data analysis. In this paper we first discuss three difficult issues underlying this problem, and then introduce a nearest-neighbor multivariate test to alleviate these difficulties. Our proposed approach has three advantages. First, it can incorporate the spatial correlation among probes. Second, genomic regions of different sizes can be analyzed on common ground. Finally, the computational cost can be considerably reduced with a simple trick. We demonstrate the effectiveness of our approach through an application to a previously published array CGH data set on 75 malignant lymphoma patients.
  • Kazunori Miyanishi, Tomonobu Ozaki, Takenao Ohkawa
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 82-90
    Published: 2010
    Released online: 2010/10/13
    Journal: free access
    A protein expresses various functions by interacting with chemical compounds. Protein function is clarified by protein structure analysis, and the knowledge obtained is reported in a large number of documents. Extracting this function information and constructing a database from it is useful in application fields such as drug discovery and the understanding of life phenomena. However, it is impractical to extract the function information manually from so many documents, which strongly motivates the study of automatic extraction. Extraction of protein function information can be treated as a classification problem: deciding whether or not each sentence in the target document contains function information. Typically, such a classification problem is addressed by learning a classifier from training data given in advance. However, accuracy is not high when the training data is not large enough. In such a case, we attempt to improve classification accuracy by extending the training data. Sentences that are effective for achieving high accuracy are selected from reference data separate from the training data set and added to the training data. To select such sentences, we introduce the reliability of temporary labels assigned to sentences in the reference data. Sentences whose temporary labels have low reliability are presented to users, assigned true labels through the users' feedback, and added to the training data. In addition, a classifier is learned from the training data together with the sentences whose temporary labels have high reliability. By iterating this process, we attempt to improve the accuracy steadily. In the experiment, the accuracy was higher than that of the related approach when the number of feedback iterations and the number of sentences returned through users' feedback were small. Thus, it is confirmed that the proposed method appropriately extends the training data based on users' feedback. In addition, this result contributes to reducing the users' load.
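One iteration of the feedback loop described above can be sketched as follows. The abstract does not say how reliability is computed, so each reference sentence is assumed to arrive with a temporary label and a reliability value already attached. The threshold, the query budget `max_queries`, the `ask_user` callback, and the function name are all hypothetical.

```python
def extend_training_data(training, candidates, ask_user, threshold=0.8, max_queries=2):
    """Return the training data extended by one feedback iteration.

    training:   list of (sentence, label) pairs
    candidates: list of (sentence, temporary_label, reliability) triples
    ask_user:   callable returning the true label of a sentence (user feedback)
    """
    # Low-reliability sentences go to the user, least reliable first.
    low = sorted((c for c in candidates if c[2] < threshold), key=lambda c: c[2])
    # High-reliability sentences keep their temporary labels.
    high = [c for c in candidates if c[2] >= threshold]

    extended = list(training)
    for sentence, _temporary_label, _reliability in low[:max_queries]:
        extended.append((sentence, ask_user(sentence)))
    for sentence, temporary_label, _reliability in high:
        extended.append((sentence, temporary_label))
    return extended


# Toy example with made-up sentences, labels, and reliabilities.
reference = [("s1", 1, 0.95), ("s2", 0, 0.40), ("s3", 1, 0.55)]
true_labels = {"s2": 1, "s3": 0}
result = extend_training_data([("s0", 1)], reference, true_labels.__getitem__)
print(result)
```

Capping the number of queries per iteration reflects the stated goal of keeping the users' load small. A classifier would be retrained on the returned list before the next iteration.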
  • Po-Ting Lai, Richard Tzong-Han Tsai
    Article type: InCoB 2010
    Subject area: Original Paper
    2010, Volume 3, pp. 91-94
    Published: 2010
    Released online: 2010/11/11
    Journal: free access
    Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers, also referred to as the interactor normalization task (INT). Our previous INT system won first place in the BioCreAtIvE II.5 INT challenge by exploiting the different characteristics of individual paper sections to guide gene normalization (GN) and using a support-vector-machine (SVM)-based ranking procedure. The best AUC achieved by our original system was 0.435 in the BioCreAtIvE II.5 INT offline challenge. After employing the proposed re-ranking algorithm, we have been able to improve our system's AUC to 0.447. In this paper, we present a new relational re-ranking algorithm that considers the associations among identifiers to further improve INT ranking results.
  • Koji Yahara, Ying Jiang, Takashi Yanagawa
    Article type: Original Papers
    Subject area: Original Paper
    2010, Volume 3, pp. 95-107
    Published: 2010
    Released online: 2010/12/09
    Journal: free access
    Type III secretion systems (T3SSs) deliver bacterial proteins, or “effectors”, into eukaryotic host cells, inducing physiological responses in the hosts. Effector proteins have been considered virulence factors of pathogenic bacteria, but T3SSs have now been found in symbiotic bacteria as well. Whether any physicochemical difference exists between the two types of effectors remains unknown. In this work, we combined computational statistical and machine learning methods to identify features that could be responsible for the difference. For the computational statistical method, we used the generalized Bayesian information criterion and kernel logistic regression, and for the machine learning method, we used a support vector machine. The results clearly showed that differences in amino acid composition exist between pathogenic and symbiotic effector proteins. All identified discriminating features were those of amino acid composition and average residue weight, and their classification performance was nearly identical to that obtained using all physicochemical features, with sensitivity and specificity of over 80%. Further analysis of the seven discriminating features by graphical modeling revealed three dominant features among them. Moreover, amino acid regions that were distinctive for the seven features were explored by sliding window analysis. This study provides a methodological basis and important insights into the functional differences between pathogenic and symbiotic T3SS effectors.
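Two of the quantities mentioned above, amino acid composition and sliding window analysis, are simple to compute from a sequence. The sketch below is a generic illustration. The window length, the function names, and the toy sequences are assumptions, and the abstract does not specify the window statistic the authors used.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def sliding_window_fraction(sequence, residue, window=5):
    """Fraction of `residue` in each window of the given length along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return [seq[i:i + window].count(residue) / window
            for i in range(len(seq) - window + 1)]


# Toy sequences (made up) to show the two views of the same information.
print(composition("AAGG")["A"])
print(sliding_window_fraction("SSSAAA", "S", window=3))
```

Composition summarizes a whole protein in 20 numbers, while the windowed profile shows where along the chain a given residue is enriched.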