IPSJ Transactions on Bioinformatics
Online ISSN : 1882-6679
ISSN-L : 1882-6679
Volume 3
Displaying 1-11 of 11 articles from this issue
  • Tatsuya Akutsu
    Article type: Editorial
    Subject area: Editorial
    2010 Volume 3 Pages 1
    Published: 2010
    Released on J-STAGE: February 04, 2010
    JOURNAL FREE ACCESS
    Download PDF (31K)
  • Ryoichi Minai, Yo Matsuo
    Article type: Database/Software Papers
    Subject area: Database/Software Paper
    2010 Volume 3 Pages 2-9
    Published: 2010
    Released on J-STAGE: February 04, 2010
    JOURNAL FREE ACCESS
    We have developed an alignment tool for comparing protein local surfaces (AltPS). This program enables efficient exhaustive searches of entire protein surfaces, using a feature vector of 6 to 18 elements for each surface atom to describe the geometrical and physicochemical properties of its local environment, without referring to sequence or fold homology. AltPS runs on a personal computer, takes a pair of PDB coordinates as input, and outputs similarity scores between identified similar surfaces, alignments of the surface atoms, and the corresponding superposed coordinates, based on cluster analysis of similar surface regions. In this report, we present some results on the application of AltPS to several protein pairs with similar functions to identify similar functional sites. AltPS can be downloaded from http://d-search.atnifty.com/research.html
    Download PDF (1739K)
  • Tatsuya Yoshikawa, Shigeto Seno, Yoichi Takenaka, Hideo Matsuda
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 10-23
    Published: 2010
    Released on J-STAGE: March 08, 2010
    JOURNAL FREE ACCESS
    To identify protein-protein interaction pairs with high accuracy, we propose a method for predicting these interactions based on characteristics obtained from protein-protein docking evaluations. Previous studies assumed that the protein affinity strength required for an interaction did not depend on protein functions. However, the protein affinity strength appears to differ with different docking schemes, such as rigid-body or flexible docking, and these schemes may be related to protein functions. Thus, we propose a new scoring system that is based on statistical analysis of affinity score distributions sampled by their protein functions. As a result, across all possible protein pair combinations, the newly developed method improved prediction accuracy as measured by F-measure. In particular, for bound antibody-antigen pairs, we obtained 50.0% recall (=sensitivity) with higher F-measures compared with previous studies. In addition, by combining two proposed scoring systems, Receptor-Focused Z-scoring and Ligand-Focused Z-scoring, further improvement was achieved. This result suggested that the proposed prediction method improved the prediction accuracy (i.e., F-measure), with few false positives, by taking biological functions of protein pairs into consideration.
    Download PDF (663K)
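The Z-scoring idea in the abstract above can be illustrated with a minimal sketch. It assumes a small matrix of docking scores (rows are receptors, columns are ligands), standardizes each row for a receptor-focused score and each column for a ligand-focused score, and calls a pair interacting when both Z-scores exceed a cutoff. The function names, the sample scores, the cutoff, and the rule for combining the two scores are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def receptor_focused(scores):
    """Z-score each row: one receptor's scores against all ligands."""
    return [z_scores(row) for row in scores]


def ligand_focused(scores):
    """Z-score each column: one ligand's scores against all receptors."""
    z_cols = [z_scores(list(col)) for col in zip(*scores)]
    return [list(row) for row in zip(*z_cols)]


def f_measure(predicted, actual):
    """Harmonic mean of precision and recall over sets of (receptor, ligand) pairs."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical docking scores: rows are receptors, columns are ligands.
scores = [
    [9.0, 1.0, 2.0],
    [1.0, 8.0, 3.0],
    [2.0, 2.0, 7.0],
]
rz = receptor_focused(scores)
lz = ligand_focused(scores)

threshold = 1.0  # assumed cutoff, not taken from the paper
# Assumed combination rule: both Z-scores must exceed the cutoff.
predicted = {
    (i, j)
    for i in range(len(scores))
    for j in range(len(scores[0]))
    if rz[i][j] > threshold and lz[i][j] > threshold
}
actual = {(0, 0), (1, 1), (2, 2)}  # hypothetical true interacting pairs
print(f_measure(predicted, actual))
```

Standardizing per receptor or per ligand means a pair is judged against the score distribution of its own partner set rather than against one global threshold, which is the general motivation the abstract gives for function-dependent scoring.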
  • Yusuke Kitamura, Tomomi Kimiwada, Jun Maruyama, Takashi Kaburagi, Taka ...
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 24-39
    Published: 2010
    Released on J-STAGE: March 15, 2010
    JOURNAL FREE ACCESS
    A Monte Carlo based algorithm is proposed to predict the gene regulatory network structure of the mouse nuclear receptor superfamily, about which little is known although those genes are believed to be related to several difficult diseases. The gene expression data are regarded as sample vector trajectories from a stochastic dynamical system on a graph. The problem is formulated within a Bayesian framework where the graph prior distribution is assumed to follow a Zipf distribution. Appropriateness of a graph is evaluated by the graph posterior mean. The algorithm is implemented with the Exchange Monte Carlo method. After validation against synthesized data, an attempt is made to use the algorithm for predicting the network structure of the target, the mouse nuclear receptor superfamily. Several remarks are made on the feasibility of the predicted network from a biological viewpoint.
    Download PDF (1302K)
  • Jiangning Song, Kazuhiro Takemoto, Hongbin Shen, Hao Tan, M. Michael G ...
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 40-53
    Published: 2010
    Released on J-STAGE: May 12, 2010
    JOURNAL FREE ACCESS
    As a fundamental biological problem, revealing the protein folding mechanism remains one of the most challenging problems in structural bioinformatics. Prediction of protein folding rate is an important step towards our further understanding of the protein folding mechanism and the complex sequence-structure-function relationship. In this article, we develop a novel approach to predict protein folding rates for two-state and multi-state protein folding kinetics, which combines a variety of structural topology and complex network properties that are calculated from protein three-dimensional structures. To take into account the specific correlations between network properties and protein folding rates, we define two different protein residue contact networks based on two different scales, the Protein Contact Network (PCN) and the Long-range Interaction Network (LIN), to characterize the corresponding network features. The leave-one-out cross-validation (LOOCV) tests indicate that this integrative strategy is more powerful in predicting the folding rates from 3D structures, with Pearson's Correlation Coefficients (CC) of 0.88, 0.90 and 0.90 for two-state, multi-state and combined protein folding kinetics, respectively, which is an improvement over other prediction work. This study provides useful insights which shed light on the network organization of interacting residues underlying the protein folding process for both two-state and multi-state folding kinetics. Moreover, our method also provides a complementary approach to the current folding rate prediction algorithms and can be used as a powerful tool for the characterization of the foldomics protein data. The implemented webserver (termed PRORATE) is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/folding/.
    Download PDF (1152K)
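The evaluation protocol named in the abstract above, leave-one-out cross-validation scored by Pearson's correlation coefficient, can be sketched as follows. The sketch assumes a single hypothetical structural feature and a one-variable least-squares line as the predictor; the paper combines many topology and network properties, so this only illustrates the protocol, not the PRORATE model. All data values and function names are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def loocv_predictions(xs, ys):
    """Predict each sample from a model trained on all other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


# Hypothetical data: one structural feature and a log folding rate per protein.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_rate = [5.1, 4.2, 2.9, 2.2, 0.8, 0.1]

preds = loocv_predictions(feature, log_rate)
cc = pearson(preds, log_rate)
print(round(cc, 3))
```

Because each prediction comes from a model that never saw the held-out protein, the resulting correlation is a less optimistic estimate than one computed on the training fit itself.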
  • Mari Pritchard
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 54-61
    Published: 2010
    Released on J-STAGE: June 17, 2010
    JOURNAL FREE ACCESS
    Gene expression analysis is commonly used to analyze millions of gene expression data points. A challenge in this process has been the development of appropriate statistical methods for high-dimensional data. We propose Sparse Learner Boosting for gene expression data analysis. Boosting is performed to minimize the loss function, although this process can cause overfitting when a large number of variables are present. Ordinary boosting utilizes all of the potential weak learners in a given data set and constructs a decision rule. The fundamental idea of Sparse Learner Boosting is to reduce the complexity of the decision rule by using fewer weak learners than is usually required. This reduction prevents overfitting and improves performance during classification. Numerical studies support this modification for high-dimensional data, such as that obtained from gene expression analysis. We show that the proposed modification improves the performance of ordinary boosting methods.
    Download PDF (251K)
  • Hiroshi Yoshida, Kinji Kimura, Naoki Yoshida, Junko Tanaka, Yoshihiro ...
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 62-69
    Published: 2010
    Released on J-STAGE: September 14, 2010
    JOURNAL FREE ACCESS
    We sometimes encounter an experiment whose rate constants cannot be determined from that experiment alone; such an experiment is called underdetermined. One method of overcoming underdetermination is to combine the results of multiple experiments. Multiple experiments give rise to a large number of parameters and variables to analyze, and the combined system usually has a complicated solution set consisting of multiple solutions, a situation that is not known to us beforehand. These two difficulties, underdetermination and multiple solutions, lead to confusion as to whether rate constants can intrinsically be determined through experiment or not. In order to analyze such experiments, we use ‘prime ideal decomposition’ to decompose a solution into simpler solutions. It is, however, hard to decompose a set of polynomials with a large number of parameters and variables. Using a bio-imaging problem as an example, we propose one tip and one technique using the ‘resultant’ from a biological viewpoint.
    Download PDF (402K)
  • Yuta Ishikawa, Ichiro Takeuchi
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 70-81
    Published: 2010
    Released on J-STAGE: October 13, 2010
    JOURNAL FREE ACCESS
    Array CGH is a useful technology for detecting copy number aberrations on a genome-wide scale. We study the problem of detecting differentially aberrant genomic regions in two or more groups of CGH arrays and estimating the statistical significance of those regions. An important property of array CGH data is that there are spatial correlations among probes, and we need to take this fact into consideration when we develop a computational algorithm for array CGH data analysis. In this paper we first discuss three difficult issues underlying this problem, and then introduce a nearest-neighbor multivariate test in order to alleviate these difficulties. Our proposed approach has three advantages. First, it can incorporate the spatial correlation among probes. Second, genomic regions of different sizes can be analyzed on common ground. Finally, the computational cost can be considerably reduced with the use of a simple trick. We demonstrate the effectiveness of our approach through an application to a previously published array CGH data set on 75 malignant lymphoma patients.
    Download PDF (813K)
  • Kazunori Miyanishi, Tomonobu Ozaki, Takenao Ohkawa
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 82-90
    Published: 2010
    Released on J-STAGE: October 13, 2010
    JOURNAL FREE ACCESS
    A protein expresses various functions by interacting with chemical compounds. Protein function is clarified by protein structure analysis, and the obtained knowledge has been stated in a number of documents. Extracting the function information and constructing a database are useful for various application fields such as drug discovery and the understanding of life phenomena. However, it is impractical to extract the function information manually from a large number of documents to construct the database, which strongly motivates the study of automatic extraction of the function information. Extraction of protein function information is considered as a classification problem: for each sentence in the target document, we determine whether or not it includes function information. Typically, when addressing such a classification problem, a classifier is learned using previously given training data. However, the accuracy is not high when the training data is not large enough. In such a case, we attempt to improve the accuracy of classification by extending the training data. Sentences that are effective for achieving high accuracy are selected from reference data separate from the training data set and added to the training data. In order to select such effective sentences, we introduce the reliability of temporary labels assigned to sentences in the reference data. Sentences with low-reliability temporary labels are presented to users, assigned true labels as users' feedback, and added to the training data. Additionally, a classifier is learned from the training data together with the sentences that have high-reliability temporary labels. By iterating this process, we attempt to improve the accuracy steadily. In the experiment, compared with the related approach, the accuracy is higher when the number of feedback iterations and the number of sentences labeled through users' feedback are small. Thus, it is confirmed that the training data is appropriately extended based on users' feedback by the proposed method. In addition, this result serves the purpose of reducing the users' load.
    Download PDF (720K)
  • Po-Ting Lai, Richard Tzong-Han Tsai
    Article type: InCoB 2010
    Subject area: Original Paper
    2010 Volume 3 Pages 91-94
    Published: 2010
    Released on J-STAGE: November 11, 2010
    JOURNAL FREE ACCESS
    Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers, also referred to as the interactor normalization task (INT). Our previous INT system won first place in the BioCreAtIvE II.5 INT challenge by exploiting the different characteristics of individual paper sections to guide gene normalization (GN) and using a support-vector-machine (SVM)-based ranking procedure. The best AUC achieved by our original system was 0.435 in the BioCreAtIvE II.5 INT offline challenge. After employing the proposed re-ranking algorithm, we have been able to improve our system's AUC to 0.447. In this paper, we present a new relational re-ranking algorithm that considers the associations among identifiers to further improve INT ranking results.
    Download PDF (174K)
  • Koji Yahara, Ying Jiang, Takashi Yanagawa
    Article type: Original Papers
    Subject area: Original Paper
    2010 Volume 3 Pages 95-107
    Published: 2010
    Released on J-STAGE: December 09, 2010
    JOURNAL FREE ACCESS
    Type III secretion systems (T3SS) deliver bacterial proteins, or “effectors”, into eukaryotic host cells, inducing physiological responses in the hosts. Effector proteins have been considered virulence factors of pathogenic bacteria, but T3SSs have now been found in symbiotic bacteria as well. Whether any physicochemical difference exists between the two types of effectors remains unknown. In this work, we combined computational statistical and machine learning methods to identify features that could be responsible for the difference. For the computational statistical method, we used the generalized Bayesian information criterion and kernel logistic regression; for the machine learning method, we used a support vector machine. It was clearly shown that differences in amino acid composition exist between pathogenic and symbiotic effector proteins. All identified discriminating features were those of amino acid composition and average residue weight, and their classification performance could be nearly identical to that using all physicochemical features, with sensitivity and specificity of over 80%. Further analysis of the seven discriminating features by graphical modeling revealed three dominant features among them. Moreover, amino acid regions that were distinctive for the seven features were explored by sliding window analysis. This study provides a methodological basis and important insights into the functional differences between pathogenic and symbiotic T3SS effectors.
    Download PDF (505K)
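The sliding window analysis mentioned in the abstract above can be illustrated with a short sketch that computes the fraction of chosen residues in each window along a protein sequence. The window width, the example sequence, and the function name are hypothetical; the abstract does not state the window size or the exact per-window statistic used in the paper.

```python
def window_composition(sequence, residues, window=5):
    """Fraction of positions in each sliding window occupied by the given residues."""
    targets = set(residues)
    fractions = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        fractions.append(sum(aa in targets for aa in segment) / window)
    return fractions


# Hypothetical sequence; track serine content with a window of 4 residues.
seq = "MKKLLSSSSA"
profile = window_composition(seq, "S", window=4)
print(profile)
```

Peaks in such a profile mark regions where the composition feature is locally enriched, which is one way to localize where a whole-sequence feature comes from.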