We investigated multi-category classification of the pharmacological activity classes of chemical compounds from their chemical structures using a multi-class support vector machine (SVM). As input to the SVM, each chemical structure was represented by a multidimensional pattern vector obtained by the Topological Fragment Spectra (TFS) method. We adopted the "one against the rest" method, which combines two-class SVMs to solve the multi-class classification problem. For the computational trial, we used 98,634 compounds belonging to 100 different activity classes. The data set was divided into three groups: a training set of 59,180 compounds, a validation set of 29,590 compounds, and a test set of 9,864 compounds. The SVM model was trained using the training set and the validation set. On the test set, which covers all 100 classes, the best model correctly classified 80.8% of the drugs into their own activity classes, including the multi-labeled compounds. The resulting classifier would be useful in drug discovery and also helpful in estimating the risk of adverse effects.
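The "one against the rest" scheme mentioned in the abstract can be illustrated with a minimal sketch: one two-class SVM is trained per class (that class versus all others), and a new pattern vector is assigned to the class whose SVM returns the largest decision value. Everything below is an assumption made for illustration and is not taken from the original work: the toy 3-dimensional vectors stand in for TFS pattern vectors, the linear SVM is fitted by plain subgradient descent on the hinge loss, and the function names and parameter values (`lam`, `lr`, `epochs`) are hypothetical.

```python
# Illustrative sketch only: toy data and a hand-rolled linear SVM.
# None of the names or parameter values come from the original work.

def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear soft-margin SVM (labels in {-1, +1}) by full-batch
    subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:            # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(model, x):
    """Signed decision value of one two-class SVM for pattern vector x."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_vs_rest(X, labels, n_classes):
    """Train one two-class SVM per class: class k (+1) against the rest (-1)."""
    models = []
    for k in range(n_classes):
        y = [1 if c == k else -1 for c in labels]
        models.append(train_binary_svm(X, y))
    return models


def predict(models, x):
    """Assign x to the class whose SVM gives the largest decision value."""
    scores = [decision(m, x) for m in models]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy pattern vectors (hypothetical stand-ins for TFS vectors), three classes.
X = [
    [3, 0, 1], [4, 1, 0], [5, 0, 0],   # class 0
    [1, 3, 0], [0, 4, 1], [0, 5, 0],   # class 1
    [0, 1, 3], [1, 0, 4], [0, 0, 5],   # class 2
]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_rest(X, labels, n_classes=3)
```

Calling `predict(models, x)` on a new vector returns a single class index. For multi-labeled compounds, one possible variant (again an assumption, not the authors' stated procedure) would report every class whose decision value is positive instead of only the maximum.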