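Organizer: The Japanese Society for Artificial Intelligence
Conference: 2012 Annual Conference of the Japanese Society for Artificial Intelligence (26th)
Edition: 26
Venue: Yamaguchi Prefectural Education Hall and other sites, Yamaguchi City, Yamaguchi Prefecture
Dates: 2012/06/12 - 2012/06/15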
Address information is closely linked to people's daily lives. People often need to look up the addresses of shopping malls, schools, and organizations, and then use a map-marking service to confirm the actual location. MapMarker is a service that extracts English postal addresses from general web pages and marks them, together with their associated information, on a map. This paper extends the idea to extracting Chinese postal addresses from the Web and improves the extraction of associated information for each address using hierarchical clustering. We show how to prepare the training data and perform full-address extraction with both the BIEO and IO tagging schemes, and we compare results with and without Yahoo Chinese word segmentation. The results show that Chinese postal addresses can be extracted with an F-measure of 0.97 using BIEO tagging without word segmentation, because incorrect segmentation can lead to worse labeling of address tokens. In addition, the associated information for each address is identified by clustering the addresses into address blocks, which improves the F-measure of associated-information extraction from 0.90 to 0.92.
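To make the BIEO/IO distinction concrete, here is a minimal sketch of character-level tagging, which corresponds to the no-segmentation setting above. It assumes B/I/E/O mean begin/inside/end/outside of an address span and that IO marks every address character as I. The helper names (`bieo_tags`, `decode_bieo`, `io_tags`, `decode_io`, `f_measure`) and the example addresses are hypothetical and do not come from the paper. No model is trained here. The sketch only shows how gold spans map to tags and back, and how span-level F-measure is scored. It illustrates one plausible reason BIEO can outperform IO: when two addresses are adjacent, IO cannot mark where one ends and the next begins.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical: [start, end) character offsets of one address


def bieo_tags(length: int, spans: List[Span]) -> List[str]:
    """Tag each character B (first), I (inside), E (last) or O (outside an address)."""
    tags = ["O"] * length
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end - 1):
            tags[i] = "I"
        if end - start > 1:  # a one-character span keeps only its B tag
            tags[end - 1] = "E"
    return tags


def decode_bieo(tags: List[str]) -> List[Span]:
    """Recover spans from BIEO tags; a new B or an O closes any open span."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # previous span ended without an E
            start = i
        elif tag == "E" and start is not None:
            spans.append((start, i + 1))
            start = None
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def io_tags(length: int, spans: List[Span]) -> List[str]:
    """Tag every address character I and everything else O."""
    tags = ["O"] * length
    for start, end in spans:
        for i in range(start, end):
            tags[i] = "I"
    return tags


def decode_io(tags: List[str]) -> List[Span]:
    """Recover spans as maximal runs of I; adjacent addresses merge into one run."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "I" and start is None:
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f_measure(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match span F-measure: harmonic mean of precision and recall."""
    tp = len(set(gold) & set(pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical page text: a label followed by two addresses with no separator.
    prefix, a1, a2 = "地址：", "北京市海淀区中关村大街1号", "上海市黄浦区南京东路2号"
    text = prefix + a1 + a2
    gold = [(len(prefix), len(prefix) + len(a1)),
            (len(prefix) + len(a1), len(text))]
    bieo_pred = decode_bieo(bieo_tags(len(text), gold))
    io_pred = decode_io(io_tags(len(text), gold))
    print("BIEO:", bieo_pred, f_measure(gold, bieo_pred))
    print("IO:  ", io_pred, f_measure(gold, io_pred))
```

Even with perfect tags, the IO round trip merges the two adjacent addresses into a single span, while BIEO keeps them apart. Real errors in the paper's setting come from the tagger, not from decoding, so this shows only a structural limitation of IO and says nothing about the reported 0.97 score.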