2019 Volume E102.D Issue 2 Pages 392-395
Web person search often returns web pages related to several distinct namesakes. This paper proposes a new web page model for template-free person data extraction and uses a Dirichlet Process Mixture model to solve name disambiguation. The results show that our method works best on web pages with complex structure.
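
A Dirichlet Process Mixture model suits name disambiguation because the number of namesakes does not have to be fixed in advance. The sketch below illustrates only the clustering prior behind such a model, the Chinese Restaurant Process, and is not the paper's method: the abstract does not describe the likelihood over extracted person data or the inference procedure, so both are omitted. The function name `crp_partition`, the concentration parameter `alpha`, and the `seed` argument are assumptions made for illustration.

```python
import random


def crp_partition(n_pages, alpha, seed=0):
    """Sample cluster labels for n_pages pages from a Chinese Restaurant Process.

    Hypothetical helper: it models only the prior over partitions, not the
    page features or the posterior inference used in the paper.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster (namesake) index assigned to page i
    counts = []  # counts[k] = number of pages already in cluster k
    for _ in range(n_pages):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster for a new namesake
        else:
            counts[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_partition(n_pages=20, alpha=1.0)
    print("cluster labels:", labels)
    print("number of clusters:", len(set(labels)))
```

Larger values of `alpha` tend to open more clusters. In a full mixture model, each weight would also be multiplied by how well the page's extracted data fits the corresponding cluster.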