Linked data entity resolution is the task of detecting instances that reside in different repositories but describe the same topic. The quality of the resolution result depends on how appropriate the configuration is, including the selected matching properties and the similarity measures. Because these configuration details currently have to be set differently for each domain and repository, a general approach that works for any repository is needed. In this paper, we present cLink, a system that performs entity resolution effectively on any input by using a learning algorithm to find the optimal configuration. Experiments show that cLink achieves high performance even when given only a small amount of training data. cLink also outperforms recent systems, including those that use supervised learning.
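To make the notion of a learned configuration concrete, the following is a minimal sketch and not cLink's actual algorithm, which the abstract does not specify. It assumes a configuration is a single (property of repository A, property of repository B, similarity measure, threshold) tuple and picks the one with the highest F1 score on a few labeled pairs by exhaustive search. All names (`MEASURES`, `f1_score`, `learn_configuration`) and the toy data are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def exact(a, b):
    """1.0 if the two strings are equal ignoring case and outer whitespace."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def edit_ratio(a, b):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical pool of candidate similarity measures.
MEASURES = {"exact": exact, "edit_ratio": edit_ratio, "token_jaccard": token_jaccard}


def f1_score(config, labeled_pairs):
    """F1 of one configuration on (instance_a, instance_b, is_match) triples."""
    prop_a, prop_b, measure_name, threshold = config
    sim = MEASURES[measure_name]
    tp = fp = fn = 0
    for a, b, is_match in labeled_pairs:
        predicted = sim(a.get(prop_a, ""), b.get(prop_b, "")) >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def learn_configuration(labeled_pairs, props_a, props_b, thresholds=(0.5, 0.7, 0.9)):
    """Exhaustively search the candidate configurations and keep the best F1."""
    candidates = product(props_a, props_b, MEASURES, thresholds)
    return max(candidates, key=lambda c: f1_score(c, labeled_pairs))


# Toy labeled pairs from two repositories with different property names.
TRAIN = [
    ({"name": "Tokyo Tower", "country": "Japan"},
     {"label": "tokyo tower", "nation": "Japan"}, True),
    ({"name": "Kyoto Station", "country": "Japan"},
     {"label": "kyoto station", "nation": "Japan"}, True),
    ({"name": "Osaka Castle", "country": "Japan"},
     {"label": "Nagoya Castle", "nation": "Japan"}, False),
]

best = learn_configuration(TRAIN, ["name", "country"], ["label", "nation"])
print(best, f1_score(best, TRAIN))
```

In this toy setting the search has to discover on its own which properties correspond across the two repositories, which is the kind of decision the abstract says is otherwise set by hand per domain. A real system would likely combine several properties and use a smarter search than brute force.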
© 2016 The Institute of Electronics, Information and Communication Engineers