2010 Volume 7 Issue 6 Pages 397-402
The first step in predicting membrane protein types is deciding how to represent a protein sample. In this work, each protein is represented by a high-dimensional feature vector built with the DC (Dipeptide Composition) method, which records the frequency of each of the 400 possible ordered pairs of adjacent amino acids. Such high dimensionality may increase computing time and classifier complexity. Therefore, the linear dimensionality reduction algorithm NPE (Neighborhood Preserving Embedding) is introduced to extract the essential features from the high-dimensional DC space. A K-NN (K-nearest neighbor) classifier is then applied to the reduced low-dimensional features to identify the types of membrane proteins. The experimental results are very encouraging.
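The three-stage pipeline described in the abstract (DC encoding, NPE reduction, K-NN classification) can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`dipeptide_composition`, `npe_fit`, `npe_transform`, `knn_predict`), the neighbor counts, the regularization constant `reg`, the output dimension, the rule of skipping pairs that contain nonstandard residues, and the PCA projection applied before the NPE eigenproblem (one common way to keep the constraint matrix invertible when features outnumber samples) are all hypothetical choices. The toy sequences in the demo are synthetic and are not the paper's data.

```python
import itertools
from collections import Counter

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides (20 x 20), each mapped to a fixed feature index.
INDEX = {"".join(p): i for i, p in enumerate(itertools.product(AMINO_ACIDS, repeat=2))}


def dipeptide_composition(seq):
    """Return the 400-dim DC vector: relative frequency of each adjacent residue pair."""
    v = np.zeros(len(INDEX))
    # Assumption: pairs containing nonstandard residues are skipped.
    valid = [seq[i:i + 2] for i in range(len(seq) - 1) if seq[i:i + 2] in INDEX]
    for pair in valid:
        v[INDEX[pair]] += 1
    return v / max(len(valid), 1)


def npe_fit(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Fit a linear NPE projection on the rows of X (samples x features)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    n = Xc.shape[0]
    # Step 1: K nearest neighbors of every sample (Euclidean, self excluded).
    d2 = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    # Step 2: weights that best reconstruct each sample from its neighbors (rows sum to 1).
    W = np.zeros((n, n))
    for i in range(n):
        Z = Xc[nbrs[i]] - Xc[i]
        C = Z @ Z.T
        C += np.eye(n_neighbors) * reg * max(np.trace(C), 1e-12)  # regularized local Gram matrix
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()
    # Step 3: M = (I - W)^T (I - W).
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    # Step 4 (assumption): restrict to the principal subspace so Xc^T Xc is invertible,
    # then minimize a^T Xc^T M Xc a subject to a^T Xc^T Xc a = 1.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    r = int(np.sum(S > 1e-10 * S[0]))
    U, S, Vt = U[:, :r], S[:r], Vt[:r]
    _, evecs = np.linalg.eigh(U.T @ M @ U)  # eigenvalues in ascending order
    A = Vt.T @ (evecs[:, :n_components] / S[:, None])  # smallest eigenvalues first
    return mean, A


def npe_transform(X, mean, A):
    """Project samples into the low-dimensional NPE space."""
    return (X - mean) @ A


def knn_predict(train_Y, train_labels, query_Y, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d2 = ((query_Y[:, None, :] - train_Y[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return [Counter(train_labels[j] for j in row).most_common(1)[0][0] for row in nearest]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: two classes drawn from different residue pools.
    seqs, labels = [], []
    for label, pool in enumerate(["AILMFVW", "DEKRNQH"]):
        for _ in range(20):
            seqs.append("".join(rng.choice(list(pool), size=60)))
            labels.append(label)
    X = np.array([dipeptide_composition(s) for s in seqs])
    labels = np.array(labels)
    test = rng.permutation(len(seqs))[:10]
    train = np.setdiff1d(np.arange(len(seqs)), test)
    mean, A = npe_fit(X[train], n_neighbors=8, n_components=2)
    pred = knn_predict(npe_transform(X[train], mean, A), labels[train],
                       npe_transform(X[test], mean, A), k=3)
    print("toy accuracy:", np.mean(np.array(pred) == labels[test]))
```

With samples stored as rows, NPE seeks projections `a` that minimize `a^T Xc^T M Xc a` under `a^T Xc^T Xc a = 1`, where `M = (I - W)^T (I - W)` encodes how well each sample is rebuilt from its neighbors. After the SVD step the constraint matrix becomes the identity in whitened coordinates, so an ordinary symmetric eigenproblem suffices, and the embedded training set satisfies `Y^T Y = I`.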