This poster presents a technique to rank-order molecules. Rank-ordering molecules makes it possible to perform classification QSAR. The technique requires a single parameter, the kernel bandwidth, to be optimized during model training. In essence, Kernel Density Estimates (KDE) are revisited and used to rank-order molecules. To make KDE compute-efficient, two modifications are proposed: i) using “vanishing kernels”, i.e. kernel functions with bounded support, and ii) using the Tanimoto distance between chemical fingerprints in a radial basis function, so that the kernel works in one dimension. We call this modified construction “Vanishing Ranking Kernels”. Applying this construction to two real-world datasets, one from toxicology and one from High Throughput Screening (HTS) experiments, we show that Ranking Kernels can compete in performance with a state-of-the-art deep-learning implementation for molecules. Ranking Kernels are conceptually simple: they require only a single parameter to be optimized and hence are fast to train. Once trained, they also define a Boolean Applicability Domain (AD) at no extra cost. In our experiments, this AD allowed candidate molecules to be screened at least 69% faster than without an AD.
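The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions: fingerprints are represented as sets of "on" bit indices, the vanishing kernel is an Epanechnikov-style profile `1 - (d/h)^2` that is exactly zero for Tanimoto distances `d >= h`, and a molecule's ranking score is the share of kernel mass contributed by known actives. The names `vanishing_kernel`, `rank_score` and `screen` are hypothetical. Because the kernel has bounded support, a query with no training molecule closer than the bandwidth `h` receives no kernel mass at all; this yields the Boolean applicability domain.

```python
from typing import FrozenSet, List, Optional, Sequence

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto distance 1 - |A & B| / |A | B|; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def vanishing_kernel(d: float, h: float) -> float:
    """Hypothetical Epanechnikov-style kernel on a 1-D distance: exactly 0 for d >= h."""
    u = d / h
    return 1.0 - u * u if u < 1.0 else 0.0


def rank_score(query: Fingerprint,
               actives: Sequence[Fingerprint],
               inactives: Sequence[Fingerprint],
               h: float) -> Optional[float]:
    """Share of kernel mass coming from actives (assumed scoring rule).

    Returns None when the query lies outside the Boolean applicability
    domain, i.e. no training molecule is within Tanimoto distance h.
    """
    k_act = sum(vanishing_kernel(tanimoto_distance(query, fp), h) for fp in actives)
    k_inact = sum(vanishing_kernel(tanimoto_distance(query, fp), h) for fp in inactives)
    total = k_act + k_inact
    if total == 0.0:
        return None  # outside the applicability domain
    return k_act / total


def screen(candidates: Sequence[Fingerprint],
           actives: Sequence[Fingerprint],
           inactives: Sequence[Fingerprint],
           h: float) -> List[int]:
    """Indices of in-domain candidates, ordered from most to least likely active."""
    scored = []
    for idx, fp in enumerate(candidates):
        s = rank_score(fp, actives, inactives, h)
        if s is not None:  # drop out-of-domain candidates
            scored.append((s, idx))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [idx for _, idx in scored]


if __name__ == "__main__":
    actives = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5})]
    inactives = [frozenset({10, 11, 12}), frozenset({10, 11, 13})]
    candidates = [frozenset({20, 21}), frozenset({1, 2, 10, 11}), frozenset({1, 2, 3, 6})]
    print(screen(candidates, actives, inactives, h=0.8))
```

In this sketch the bandwidth `h` is the only free parameter. It would be tuned on held-out data, for example by maximizing a ranking metric such as ROC AUC over a small grid of values. Running the file prints `[2, 1]`: candidate 0 shares no bits with any training molecule and falls outside the domain.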