Genome Informatics
Online ISSN : 2185-842X
Print ISSN : 0919-9454
ISSN-L : 0919-9454
Finding Functional Features of Proteins using Machine Learning Techniques
Takashi Ishikawa, Shigeki Mitaku, Takao Terano, Takatsugu Hirokawa, Makiko Suwa, Boon Chieng Seah

1994 Volume 5 Pages 168-169

Abstract
Protein function prediction from amino-acid sequences is one of the major tasks in genome informatics. To predict the functions of a given amino-acid sequence, we can use similarities among functions and structural features of amino-acid sequences, i.e., motifs and homology. The difficulties of previous function prediction methods arise from the facts that few motifs are known so far and that proteins with similar sequences may not have similar functions. The main objective of our research is to facilitate the discovery of functional features of proteins using machine learning techniques.
Our hypothesis for protein function prediction is that a protein's function arises from the physical structure of the protein. Since the structures of proteins are built by physico-chemical interactions among amino acids, there might exist some features of amino-acid sequences that reflect these physico-chemical interactions. We call these features ‘functional features’. We know from the tertiary structure of bacteriorhodopsin, and from the localization of polar amino acids in that structure, that electric interactions exist among its alpha-helices. If the amino-acid localization of bacteriorhodopsin is closely related to the function of the protein, we can use this functional feature to predict protein function.
To create rules to predict protein functions, we use three machine learning techniques (Fig. 1). The first technique is analogical reasoning, used to make assumptions about functional features. For example, if polar amino acids are localized in some proteins, then, by analogical reasoning from the fact about bacteriorhodopsin, the localization might imply a relation between the functional features and the functions of those proteins. The second technique is inductive reasoning, used to generalize the hypothesis made by analogical reasoning. The goal of inductive reasoning for protein function prediction is to decide which localization pattern is most useful for classifying protein functions. The third technique is deductive reasoning, used to refine the localization pattern into classification rules. In the deductive reasoning, knowledge about protein functions and structures is used to make logical descriptions of classification rules.
We have carried out some experiments to implement our idea of finding functional features of proteins using machine learning techniques. First, we simulated the analogical reasoning process to create a hypothesis about functional features of bacteriorhodopsin using the ABA framework proposed by the authors [1]. At the current stage of our research, this analogical reasoning process is executed by hand simulation, but it will be executed on a computer in the next stage. Next, we analyzed the relation between the functional features and the protein functions of seven-helix membrane proteins using a cluster analysis method. From this analysis, we found that amino-acid interval frequencies for polar amino acids are closely related to some function classes of the classified proteins. The feature of amino-acid interval frequencies is thought to be a representation of the abstract functional feature ‘localization of amino-acids’. Based on the result of this cluster analysis, we can use the functional features for the inductive reasoning in the next step.
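The abstract does not define how the interval frequencies are computed, so the following is only a minimal sketch under stated assumptions: an "interval" is taken to be the sequence distance between any two polar residues that are at most `max_interval` positions apart, the set `POLAR` is one common choice of polar residues, and the function names and the Euclidean distance (as a possible input to a cluster analysis) are illustrative rather than the authors' method.

```python
from collections import Counter

# Assumption: one common choice of polar residues (one-letter codes).
POLAR = frozenset("STNQYCDEKRH")


def interval_frequencies(sequence, residues=POLAR, max_interval=7):
    """Return normalized frequencies of intervals 1..max_interval
    between pairs of selected residues in the sequence."""
    positions = [i for i, aa in enumerate(sequence.upper()) if aa in residues]
    counts = Counter()
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            gap = j - i
            if gap > max_interval:
                break  # positions are sorted, so later gaps are larger
            counts[gap] += 1
    total = sum(counts.values())
    # A sequence with no qualifying pairs yields an all-zero vector.
    return [counts[k] / total if total else 0.0
            for k in range(1, max_interval + 1)]


def euclidean(u, v):
    """Distance between two feature vectors, usable by a clustering method."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy sequences, not real protein data.
    seq_a = "MKTLLSAAGEELLKKAVS"
    seq_b = "MALLSTTAGDDLLRKAVT"
    fa = interval_frequencies(seq_a)
    fb = interval_frequencies(seq_b)
    print(fa)
    print(fb)
    print(euclidean(fa, fb))
```

Feature vectors of this kind could be compared pairwise and passed to any standard clustering procedure; the choice of interval definition and distance measure would need to follow the authors' actual method, which the abstract does not specify.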
In the preliminary experiments described above, we found new functional features for classifying protein functions from amino-acid sequences. Specifically, these features can discriminate between different functions of proteins that have similar amino-acid sequences according to homology analysis. Furthermore, the features can recognize proteins with the same function that do not have similar sequences. From these results, we conclude that our idea is useful for predicting protein functions. In the next stage of the research, we plan to refine the classification rules and to integrate the three machine learning techniques.
© Japanese Society for Bioinformatics