Abstract
We discuss applicability domains (ADs) based on ensemble learning in classification and regression analyses. In regression analysis, the AD can be set appropriately, although attention must be paid to bias in the predicted values. In classification analysis, however, the AD based on ensemble learning is too wide; we therefore propose an AD based on both ensemble learning and data density. First, a data-density threshold is set, and predictions for new samples whose density falls below it are regarded as unreliable. Then, only for new samples whose density exceeds the threshold, the reliability of the prediction is evaluated using ensemble learning. By analyzing data from numerical simulations, we demonstrate that ADs based solely on ensemble learning are too wide in classification analysis. Finally, using quantitative structure-property relationship (QSPR) and quantitative structure-activity relationship (QSAR) data, we validate our discussion of ADs in classification and regression analyses and confirm that the proposed method sets appropriate ADs.
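The two-step AD described above (density threshold first, then ensemble agreement) can be sketched as follows. This is a minimal illustration, not the authors' implementation, and several choices are assumptions not stated in the abstract: data density is a Gaussian kernel estimate with bandwidth `h`; the threshold is the density above which 95% of training samples lie (leave-one-out); the ensemble is bagged nearest-centroid classifiers; and reliability is the fraction of submodels agreeing with the majority vote, compared against a hypothetical cutoff `vote_thr`. All function names are hypothetical.

```python
import math
import random


def gaussian_density(x, X, h):
    """Assumed density measure: mean Gaussian kernel value between x and rows of X."""
    total = 0.0
    for xi in X:
        sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-sq / (2.0 * h * h))
    return total / len(X)


def density_threshold(X, h, coverage=0.95):
    """Assumed rule: threshold below which at most (1 - coverage) of training
    samples fall, using leave-one-out densities of the training set."""
    dens = sorted(gaussian_density(X[i], X[:i] + X[i + 1:], h) for i in range(len(X)))
    idx = int(math.floor((1.0 - coverage) * len(dens)))
    return dens[idx]


def fit_centroids(X, y):
    """Nearest-centroid classifier (stand-in submodel): one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_centroids(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def fit_bagging(X, y, n_models=50, seed=0):
    """Assumed ensemble: submodels trained on bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_with_ad(models, X_train, x, h, dens_thr, vote_thr=0.8):
    """Return (label, vote_fraction, inside_ad) for a new sample x."""
    votes = {}
    for m in models:
        c = predict_centroids(m, x)
        votes[c] = votes.get(c, 0) + 1
    label, count = max(votes.items(), key=lambda kv: kv[1])
    frac = count / len(models)
    # Step 1: too sparse a region -> unreliable, however unanimous the votes are.
    if gaussian_density(x, X_train, h) < dens_thr:
        return label, frac, False
    # Step 2: only above the density threshold is ensemble agreement consulted.
    return label, frac, frac >= vote_thr


if __name__ == "__main__":
    # Toy data: two well-separated 3x3 grids of points, one per class.
    X = [[i * 0.5, j * 0.5] for i in range(3) for j in range(3)]
    X += [[5 + i * 0.5, 5 + j * 0.5] for i in range(3) for j in range(3)]
    y = [0] * 9 + [1] * 9
    h = 1.0
    thr = density_threshold(X, h)
    models = fit_bagging(X, y)
    for q in ([0.1, 0.1], [20.0, 20.0]):
        print(q, predict_with_ad(models, X, q, h, thr))
```

In this toy run, the far-away query `[20.0, 20.0]` receives a near-unanimous ensemble vote even though no training data lie nearby, which illustrates why an AD based on ensemble agreement alone can be too wide in classification; the density step marks it as outside the AD.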