Abstract
In hit-finding, biological assays should yield as many hits and as much structure-activity relationship information as possible. For this purpose, we developed an active learning method that captures many aspects of chemical information. To evaluate the method, we used structure-activity data for compounds that bind to G-protein coupled receptors (GPCRs). As positives, we used 1,551 compounds that bind to biogenic amine receptors, selected from the Pharmaprojects database; as negatives, we used 255,440 compounds selected from the Pharmaprojects database and the Available Chemicals Directory reagent database. We show that active learning is efficient for both hit-finding and prediction. With our method, assaying 11% of the whole chemical library is enough to find 90% of the positives, and assaying 32% is enough to find 99%. In contrast, finding 90% (or 99%) of the positives required assaying 90% (or 99%) of the library with random screening, and 22% (or 61%) with similarity-based screening. Using the active learning method, we developed rapid and cost-effective systems for hit-finding.
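To make the reported percentages concrete, the short sketch below converts them into approximate assay counts for the library described above. It is only an illustration: the function and variable names are hypothetical, the library size is taken as the sum of the stated positives and negatives, and the counts are approximate because the percentages in the abstract are rounded.

```python
# Library composition as stated in the abstract.
N_POSITIVES = 1_551
N_NEGATIVES = 255_440
# Assumption: the "whole chemical library" is positives plus negatives.
LIBRARY_SIZE = N_POSITIVES + N_NEGATIVES

# Percentage of the library that must be assayed to recover a given
# percentage of the positives, as reported in the abstract.
# Outer key: screening method; inner key: percentage of positives found.
PERCENT_ASSAYED = {
    "active learning": {90: 11, 99: 32},
    "similarity-based": {90: 22, 99: 61},
    "random": {90: 90, 99: 99},
}


def assays_needed(percent_of_library: int) -> int:
    """Approximate number of assays for a given percentage of the library."""
    return round(LIBRARY_SIZE * percent_of_library / 100)


def savings_factor(method: str, percent_found: int) -> float:
    """Ratio of assays needed by `method` to assays needed by active learning."""
    baseline = PERCENT_ASSAYED["active learning"][percent_found]
    return PERCENT_ASSAYED[method][percent_found] / baseline


if __name__ == "__main__":
    for percent_found in (90, 99):
        for method, table in PERCENT_ASSAYED.items():
            count = assays_needed(table[percent_found])
            print(f"{method:>16}: {percent_found}% of positives "
                  f"in about {count:,} assays")
```

Under these assumptions, finding 90% of the positives takes roughly 28,000 assays with active learning, compared with roughly 57,000 for similarity-based screening and roughly 231,000 for random screening.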